Stemming a list of sentences, words, or phrases using NLTK
Stemming is the process of reducing a word to its root form. For example, "jumping", "jumps" and "jumped" are all stemmed to "jump". Stemming standardizes words to a base stem regardless of their inflections, which helps when classifying or clustering text. Search engines use these techniques extensively to return better, more accurate results irrespective of the word form. A stem is not always a dictionary word: the examples below reduce "dancing" to "danc". The nltk package provides several stemmer implementations.
NLTK Stemming using Porter stemmer
from nltk.stem import PorterStemmer

st = PorterStemmer()

text = ['Where did he learn to dance like that?',
        'His eyes were dancing with humor.',
        'She shook her head and danced away',
        'Alex was an excellent dancer.']

# Stem each whitespace-separated word, then rebuild the sentence
output = []
for sentence in text:
    output.append(" ".join([st.stem(word) for word in sentence.split()]))

for item in output:
    print(item)

print("-" * 50)
print(st.stem('jumping'), st.stem('jumps'), st.stem('jumped'))

Output:

where did he learn to danc like that?
hi eye were danc with humor.
she shook her head and danc away
alex wa an excel dancer.
--------------------------------------------------
jump jump jump
NLTK Stemming using Lancaster stemmer
from nltk.stem import LancasterStemmer

st = LancasterStemmer()

text = ['Where did he learn to dance like that?',
        'His eyes were dancing with humor.',
        'She shook her head and danced away',
        'Alex was an excellent dancer.']

# Stem each whitespace-separated word, then rebuild the sentence
output = []
for sentence in text:
    output.append(" ".join([st.stem(word) for word in sentence.split()]))

for item in output:
    print(item)

print("-" * 50)
print(st.stem('jumping'), st.stem('jumps'), st.stem('jumped'))

Output:

wher did he learn to dant lik that?
his ey wer dant with humor.
she shook her head and dant away
alex was an excel dancer.
--------------------------------------------------
jump jump jump
NLTK Stemming using Snowball stemmer
from nltk.stem import SnowballStemmer

# Snowball supports several languages, so the language must be named
st = SnowballStemmer("english")

text = ['Where did he learn to dance like that?',
        'His eyes were dancing with humor.',
        'She shook her head and danced away',
        'Alex was an excellent dancer.']

# Stem each whitespace-separated word, then rebuild the sentence
output = []
for sentence in text:
    output.append(" ".join([st.stem(word) for word in sentence.split()]))

for item in output:
    print(item)

print("-" * 50)
print(st.stem('jumping'), st.stem('jumps'), st.stem('jumped'))

Output:

where did he learn to danc like that?
his eye were danc with humor.
she shook her head and danc away
alex was an excel dancer.
--------------------------------------------------
jump jump jump
NLTK Stemming using RegexpStemmer
from nltk.stem import RegexpStemmer

# Strip the listed suffixes, but only from words of at least 4 characters
st = RegexpStemmer('ing$|s$|ed$|er$', min=4)

text = ['Where did he learn to dance like that?',
        'His eyes were dancing with humor.',
        'She shook her head and danced away',
        'Alex was an excellent dancer.']

# Stem each whitespace-separated word, then rebuild the sentence
output = []
for sentence in text:
    output.append(" ".join([st.stem(word) for word in sentence.split()]))

for item in output:
    print(item)

print("-" * 50)
print(st.stem('jumping'), st.stem('jumps'), st.stem('jumped'))

Output:

Where did he learn to dance like that?
His eye were danc with humor.
She shook her head and danc away
Alex was an excellent dancer.
--------------------------------------------------
jump jump jump
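In the RegexpStemmer output, "dancer." keeps its suffix because the trailing period means the word no longer ends in "er", and "His" is left alone because it is shorter than min=4. The sketch below imitates that behaviour with Python's re module alone so the mechanics are visible. The helpers regex_stem and tokenize are hypothetical names made up for this illustration, not part of NLTK, and the code only approximates what RegexpStemmer does. Splitting punctuation off before stemming is one possible way around the issue.

```python
import re

# Same pattern and minimum length as the RegexpStemmer example.
SUFFIXES = re.compile(r"ing$|s$|ed$|er$")
MIN_LENGTH = 4


def regex_stem(word):
    """Strip a matching suffix; words shorter than MIN_LENGTH are returned as-is."""
    if len(word) < MIN_LENGTH:
        return word
    return SUFFIXES.sub("", word)


def tokenize(sentence):
    """Split a sentence into words and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


sentence = "Alex was an excellent dancer."

# Whitespace splitting keeps the period glued to the last word.
naive = [regex_stem(w) for w in sentence.split()]
# Tokenizing first lets the "er$" rule see the bare word.
tokenized = [regex_stem(w) for w in tokenize(sentence)]

print(naive)      # last item is still 'dancer.'
print(tokenized)  # 'dancer' becomes 'danc', followed by a separate '.'
```

The same idea applies to the other stemmers: handing them clean tokens rather than raw whitespace-separated chunks generally gives more consistent stems.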
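As the outputs show, each stemmer produces different results. Lancaster is the most aggressive here ("dant", "lik", "wher"), Porter truncates even "his" to "hi", and RegexpStemmer applies only the suffix rules you give it and leaves case untouched. Choose your stemmer based on your problem. If needed, you can even build your own stemmer with your own rules.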
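As a rough illustration of what a home-made stemmer might look like, the sketch below defines a small class with an ordered list of suffix rules. The class name SimpleSuffixStemmer, its rule list, and the min_stem parameter are all invented for this example and are not part of NLTK. The class only mirrors the stem(word) method that the NLTK stemmers expose.

```python
class SimpleSuffixStemmer:
    """Toy rule-based stemmer (hypothetical, not an NLTK class)."""

    # (suffix, replacement) pairs, checked in order; longer suffixes first.
    RULES = [
        ("ies", "y"),
        ("ing", ""),
        ("ed", ""),
        ("s", ""),
    ]

    def __init__(self, min_stem=3):
        # Minimum number of characters that must remain once the suffix is cut.
        self.min_stem = min_stem

    def stem(self, word):
        word = word.lower()
        for suffix, replacement in self.RULES:
            if word.endswith(suffix):
                stem = word[: len(word) - len(suffix)]
                if len(stem) >= self.min_stem:
                    return stem + replacement
        # No rule applied, so the lowercased word is its own stem.
        return word


st = SimpleSuffixStemmer()
for word in ["jumping", "jumps", "jumped", "parties", "his"]:
    print(word, "->", st.stem(word))
```

Because the class exposes stem(), it can stand in for st in the loops used throughout this article. A stemmer meant for real use would need far more rules and exceptions than this sketch has.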
2019-04-26T10:30:40+05:30
Amit Arora