Python Programming

Find the frequency of each word in a text file using NLTK

A frequency distribution records the number of times each outcome of an experiment has occurred. For example, a frequency distribution could be used to record the frequency of each word type in a document. Formally, a frequency distribution can be defined as a function mapping from each sample to the number of times that sample occurred as an outcome.
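To make the "function from samples to counts" idea concrete, here is a minimal sketch using only the standard library. It uses collections.Counter, which nltk.FreqDist subclasses, so lookups behave the same way: each sample maps to its count, and a sample that never occurred maps to 0. The token list is made up for illustration.

```python
from collections import Counter

# Hypothetical word tokens: each token is one "outcome".
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# The frequency distribution maps each sample to how often it occurred.
freq = Counter(tokens)

print(freq["the"])  # 2
print(freq["cat"])  # 1
print(freq["dog"])  # 0: a sample that never occurred maps to zero
```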

Frequency distributions are generally constructed by running a number of experiments and incrementing the count for a sample every time it is the outcome of an experiment.
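The construction described above amounts to a counting loop. The sketch below spells it out with a plain dictionary; the list of outcomes is hypothetical. Passing the same list to nltk.FreqDist would give the same counts, since FreqDist does this incrementing for you.

```python
# Hypothetical outcomes of a series of "experiments" (words read one by one).
outcomes = "to be or not to be".split()

counts = {}
for sample in outcomes:
    # Every time a sample is observed, increment its count (starting from 0).
    counts[sample] = counts.get(sample, 0) + 1

print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```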


Frequency of long words (more than three characters)

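import nltk
from nltk.corpus import webtext
from nltk.probability import FreqDist

# Download the webtext corpus (skipped if it is already up to date).
nltk.download('webtext')

# testing.txt is not one of the files shipped with the webtext corpus;
# copy it into the webtext folder of your nltk_data directory first.
wt_words = webtext.words('testing.txt')
data_analysis = FreqDist(wt_words)

# Keep only the words that are longer than 3 characters.
filter_words = {word: count for word, count in data_analysis.items() if len(word) > 3}

# Print each remaining word with its count, in sorted order.
for key in sorted(filter_words):
    print("%s: %s" % (key, filter_words[key]))

# Build a frequency distribution from the filtered words and plot
# the 25 most frequent ones (plotting requires matplotlib).
data_analysis = FreqDist(filter_words)
data_analysis.plot(25, cumulative=False)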
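If you only need the counts and not NLTK's corpus machinery, the same count-then-filter steps can be sketched with the standard library. This is a minimal sketch, not the method used above: the helper names long_word_frequencies and read_text are hypothetical, and the regular-expression tokenizer (\w+) is an assumption that will not split text exactly like webtext.words(), so counts can differ slightly.

```python
import re
from collections import Counter


def long_word_frequencies(text, min_len=4):
    """Count the words in `text` that have at least `min_len` characters.

    min_len=4 matches the len(word) > 3 filter in the NLTK version.
    Tokenizing with the regex \\w+ is an assumption, not NLTK's tokenizer.
    """
    words = re.findall(r"\w+", text)
    counts = Counter(words)
    return {word: n for word, n in counts.items() if len(word) >= min_len}


def read_text(path):
    # Hypothetical helper: assumes the file is UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        return f.read()


sample = "Python is easy. Python has many applications, and applications need data."
freqs = long_word_frequencies(sample)
for word in sorted(freqs):
    print(f"{word}: {freqs[word]}")

# Rough counterpart of plot(25): the 25 most frequent long words, highest first.
top = Counter(freqs).most_common(25)
```

To run it on a real file, something like long_word_frequencies(read_text("testing.txt")) should work, assuming the file is UTF-8 encoded.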



C:\examples\nltk>python 6.py
[nltk_data] Downloading package webtext to
[nltk_data]     C:\Users\amit\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\webtext.zip.
1989: 1
Accessing: 1
Analysis: 1
Anyone: 1
Chapter: 1
Coding: 1
Data: 1
Dataset: 1
December: 1
Given: 1
Guido: 3
However: 2
Internet: 1
Loading: 1
Machine: 1
Many: 1
Performing: 1
Python: 4
Rossum: 1
Running: 1
Some: 1
They: 1
Underestimating: 1
Working: 1
ability: 2
access: 1
accessing: 1
achieve: 1
across: 1
afford: 1
algorithm: 3
algorithms: 1
also: 2
always: 3
analysis: 1
application: 3
applications: 6
area: 1
around: 1
available: 1
been: 1
best: 3
bestâ: 1
better: 2
boasts: 1
book: 3
both: 1
bound: 1
business: 1
..........
