Generation, Implementation and Appraisal of an N-gram based Stemming Algorithm
B. P. Pande, Pawan Tamta, H. S. Dhami

TL;DR
This paper introduces a language-independent N-gram based stemming algorithm and compares it with Porter's Stemmer. The results indicate that the N-gram stemmer is not inferior to Porter's linguistic stemmer.
Contribution
The paper presents a new N-gram stemming technique that addresses a weakness of single N-gram tokenization, namely stems that begin with intermediate rather than initial characters, and evaluates it against Porter's Stemmer.
Findings
The N-gram stemmer performs comparably to Porter's Stemmer
The method is language-independent
Results show no significant performance difference between the two stemmers
Abstract
A language-independent stemmer has long been sought. The single N-gram tokenization technique works well; however, it often generates stems that start with intermediate characters rather than initial ones. We present a novel technique that takes the concept of N-gram stemming one step further and compare our method with an established algorithm in the field, Porter's Stemmer. Results indicate that our N-gram stemmer is not inferior to Porter's linguistic stemmer.
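To make the "intermediate characters" problem concrete, here is a minimal Python sketch of plain character N-gram tokenization. It is not the algorithm proposed in the paper, which this summary does not describe; the function names `char_ngrams` and `split_by_position`, and the choice to keep words shorter than N whole, are assumptions made only for illustration.

```python
def char_ngrams(word: str, n: int) -> list[str]:
    """Return every contiguous character n-gram of `word`, left to right."""
    if n <= 0:
        raise ValueError("n must be positive")
    if len(word) < n:
        # Assumption for this sketch: words shorter than n are kept whole.
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def split_by_position(word: str, n: int) -> tuple[str, list[str]]:
    """Separate the word-initial n-gram from the n-grams that start mid-word."""
    grams = char_ngrams(word, n)
    return grams[0], grams[1:]


if __name__ == "__main__":
    initial, intermediate = split_by_position("stemming", 4)
    print(initial)       # stem
    print(intermediate)  # ['temm', 'emmi', 'mmin', 'ming']
```

Only one of the five 4-grams of "stemming" begins with the word's first character. A method that selects a single N-gram as the stem without regard to position can therefore return a fragment such as "ming", which is the behaviour the abstract describes.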
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression
