Generate the N-grams for the given sentence
An essential concept in text mining is the n-gram: a contiguous sequence of n items taken from a larger text or sentence. The items can be words, letters, or syllables. A 1-gram, also called a unigram, is a single word from the sentence. A bigram (2-gram) is a sequence of 2 consecutive words, a trigram (3-gram) is a sequence of 3 consecutive words, and so on.
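As a minimal sketch of the definition above, independent of any library, n-grams can be produced by sliding a window of size n over a sequence of items. The helper name `sliding_ngrams` and the plain whitespace split are assumptions made for illustration; they are not part of NLTK or TextBlob.

```python
def sliding_ngrams(items, n):
    """Return every contiguous run of n items as a tuple (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # range() is empty when n exceeds len(items), so the result is an empty list.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Word-level n-grams: naive whitespace split (an assumption; punctuation is not handled).
words = "a class is a blueprint".split()
word_bigrams = sliding_ngrams(words, 2)
print(word_bigrams)

# Character-level n-grams: a string is already a sequence of letters.
char_trigrams = ["".join(g) for g in sliding_ngrams("class", 3)]
print(char_trigrams)
```

The same window works for any item type, which is why the items of an n-gram can be words, letters, or syllables.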
N-gram using NLTK
import nltk
from nltk.util import ngrams

# word_tokenize needs NLTK's tokenizer data, e.g. nltk.download('punkt')
# (newer NLTK releases may ask for 'punkt_tab' instead).

# Function to generate n-grams from sentences.
def extract_ngrams(data, num):
    # word_tokenize keeps punctuation, so the final '.' becomes its own token.
    n_grams = ngrams(nltk.word_tokenize(data), num)
    return [' '.join(grams) for grams in n_grams]

data = 'A class is a blueprint for the object.'

print("1-gram: ", extract_ngrams(data, 1))
print("2-gram: ", extract_ngrams(data, 2))
print("3-gram: ", extract_ngrams(data, 3))
print("4-gram: ", extract_ngrams(data, 4))
N-gram using TextBlob
from textblob import TextBlob

# Function to generate n-grams from sentences.
def extract_ngrams(data, num):
    n_grams = TextBlob(data).ngrams(num)
    return [' '.join(grams) for grams in n_grams]

data = 'A class is a blueprint for the object.'

print("1-gram: ", extract_ngrams(data, 1))
print("2-gram: ", extract_ngrams(data, 2))
print("3-gram: ", extract_ngrams(data, 3))
print("4-gram: ", extract_ngrams(data, 4))
1-gram: ['A', 'class', 'is', 'a', 'blueprint', 'for', 'the', 'object']
2-gram: ['A class', 'class is', 'is a', 'a blueprint', 'blueprint for', 'for the', 'the object']
3-gram: ['A class is', 'class is a', 'is a blueprint', 'a blueprint for', 'blueprint for the', 'for the object']
4-gram: ['A class is a', 'class is a blueprint', 'is a blueprint for', 'a blueprint for the', 'blueprint for the object']
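The list lengths in the output above (8, 7, 6, and 5) follow a simple rule: a sequence of k tokens yields k - n + 1 n-grams, or none when n is larger than k. The following sketch checks that rule against the token list shown in the 1-gram output; the function name `ngram_count` is hypothetical.

```python
def ngram_count(num_tokens, n):
    """Number of contiguous n-grams in a sequence of num_tokens items."""
    return max(num_tokens - n + 1, 0)


# Tokens copied from the 1-gram output above (punctuation already removed).
tokens = ['A', 'class', 'is', 'a', 'blueprint', 'for', 'the', 'object']

# Map each n from 1 to 4 to the expected number of n-grams.
counts = {n: ngram_count(len(tokens), n) for n in range(1, 5)}
print(counts)
```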
2019-05-03T08:21:05+05:30
Amit Arora