Python - Text Processing - Tutorialspoint on Python Frequency Distribution

Counting how often each word occurs in a body of text is a common step in text processing. One way to do this is to tokenize the text with the word_tokenize() function, collect the tokens in a list, and count the occurrences of each token, as shown in the program below.

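from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg

# One-time setup: nltk.download("gutenberg") and nltk.download("punkt")
# (recent NLTK versions may ask for "punkt_tab" instead)
sample = gutenberg.raw("blake-poems.txt")

# Split the raw text into tokens and keep only the first 50
tokens = word_tokenize(sample)
wlist = tokens[:50]

# For each of the 50 tokens, count how often it occurs among them (case-sensitive)
wordfreq = [wlist.count(w) for w in wlist]
# zip() returns a lazy iterator in Python 3, so convert it to a list before printing
print("Pairs\n" + str(list(zip(wlist, wordfreq))))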

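When we run the above program, we get the following output. The count is case-sensitive, so a word such as "of" can appear with different counts depending on how it is capitalized in the source text (the tokens are shown in lowercase here) −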

[('[', 1), ('poems', 1), ('by', 1), ('william', 1), ('blake', 1), ('1789', 1), (']', 1), ('songs', 2), ('of', 3), ('innocence', 2), ('and', 1), ('of', 3), ('experience', 1), ('and', 1), ('the', 1), ('book', 1), ('of', 2), ('thel', 1), ('songs', 2), ('of', 3), ('innocence', 2), ('introduction', 1), ('piping', 2), ('down', 1), ('the', 1), ('valleys', 1), ('wild', 1), (',', 3), ('piping', 2), ('songs', 1), ('of', 2), ('pleasant', 1), ('glee', 1), (',', 3), ('on', 1), ('a', 2), ('cloud', 1), ('i', 1), ('saw', 1), ('a', 2), ('child', 1), (',', 3), ('and', 1), ('he', 1), ('laughing', 1), ('said', 1), ('to', 1), ('me', 1), (':', 1), ('``', 1)]
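
Calling count() once per token rescans the whole list each time, which becomes slow for longer texts. As a rough alternative, the sketch below tallies tokens in a single pass with collections.Counter from the standard library. The sample string is made up for illustration, and a plain whitespace split stands in for word_tokenize() so the snippet runs without NLTK data.

```python
from collections import Counter

# Hypothetical sample text (an assumption for this sketch, not the Gutenberg corpus);
# a whitespace split stands in for word_tokenize()
sample = "piping down the valleys wild , piping songs of pleasant glee ,"
tokens = sample.split()

# Counter tallies every token in one pass over the list
freq = Counter(tokens)

# The three most frequent tokens together with their counts
top = freq.most_common(3)
print(top)
```

Unlike the list of pairs above, a Counter stores each distinct token only once, so the result has no repeated entries.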

Conditional Frequency Distribution

A conditional frequency distribution is used when we want to count words separately for each condition, for example for each category (genre) of text in which the words occur.

import nltk
from nltk.corpus import brown

# One-time setup: nltk.download("brown")
# Build (genre, word) pairs: the genre is the condition, the word is the sample
cfd = nltk.ConditionalFreqDist(
          (genre, word)
          for genre in brown.categories()
          for word in brown.words(categories=genre))
categories = ['hobbies', 'romance', 'humor']
searchwords = ['may', 'might', 'must', 'will']
# Print a table of counts restricted to the chosen genres and words
cfd.tabulate(conditions=categories, samples=searchwords)

When we run the above program, we get the following output −

          may might  must  will 
hobbies   131    22    83   264 
romance    11    51    45    43 
  humor     8     8     9    13 
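
To make the idea behind the table concrete, here is a minimal sketch of a conditional count built with the standard library alone: one Counter per condition. It is not how NLTK implements ConditionalFreqDist, only an illustration of the same structure. The (condition, word) pairs are invented for the example and do not come from the Brown corpus.

```python
from collections import Counter, defaultdict

# Hypothetical (condition, word) pairs; in the program above the condition is a genre
pairs = [
    ("romance", "might"), ("romance", "will"), ("romance", "might"),
    ("humor", "will"), ("humor", "must"),
]

# Keep one Counter per condition and tally each word under its own condition
cfd = defaultdict(Counter)
for condition, word in pairs:
    cfd[condition][word] += 1

print(cfd["romance"]["might"])  # 2
print(cfd["humor"]["may"])      # 0, a Counter returns 0 for words it has not seen
```

Each row of the tabulated output corresponds to one condition, and each cell to the count of one word under that condition.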