Generation, Implementation and Appraisal of an N-gram based Stemming Algorithm

B. P. Pande; Pawan Tamta; H. S. Dhami

arXiv:1312.4824 · cs.IR · January 17, 2014 · 5 cites

TL;DR

This paper introduces a novel, language-independent N-gram based stemming algorithm and shows that it performs no worse than Porter's Stemmer.

Contribution

The paper presents a new N-gram stemming technique that extends single N-gram tokenization so that stems begin with a word's initial characters rather than intermediate ones, and validates it against Porter's Stemmer.

Findings

01

N-gram stemmer performs comparably to Porter's Stemmer

02

The method is language-independent

03

Results show no significant performance difference between the two stemmers

Abstract

A language-independent stemmer has long been sought. The single N-gram tokenization technique works well; however, it often generates stems that start with intermediate characters rather than initial ones. We present a novel technique that takes the concept of N-gram stemming one step further and compare our method with an established algorithm in the field, Porter's Stemmer. Results indicate that our N-gram stemmer is not inferior to Porter's linguistic stemmer.
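The problem the abstract describes, stems that start with intermediate characters, is easiest to see on the baseline the paper builds on. The sketch below assumes the common formulation of single N-gram stemming, in which each word is replaced by its rarest character N-gram (lowest document frequency, i.e. highest IDF). The helper names, the toy corpus and n = 4 are hypothetical choices for illustration, and this is not the authors' improved technique.

```python
from collections import Counter
from typing import Iterable, List, Set


def char_ngrams(word: str, n: int) -> List[str]:
    """Return all overlapping character n-grams of `word`.

    Words no longer than n are kept whole as their single n-gram.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def document_frequencies(docs: Iterable[Iterable[str]], n: int) -> Counter:
    """Count, for each n-gram, how many documents contain it at least once."""
    df: Counter = Counter()
    for doc in docs:
        grams: Set[str] = set()
        for word in doc:
            grams.update(char_ngrams(word, n))
        df.update(grams)  # each distinct n-gram counts once per document
    return df


def single_ngram_stem(word: str, df: Counter, n: int) -> str:
    """Hypothetical baseline: pick the word's rarest n-gram (highest IDF).

    min() returns the first minimal element, so ties go to the
    earliest n-gram in the word.
    """
    return min(char_ngrams(word, n), key=lambda g: df[g])


if __name__ == "__main__":
    # Toy corpus: each inner list is one "document" of tokens.
    docs = [
        ["connect", "connection"],
        ["connected", "connecting"],
        ["reconnect"],
        ["contact"],
        ["conduct"],
    ]
    df = document_frequencies(docs, 4)
    for w in ["connect", "connection"]:
        stem = single_ngram_stem(w, df, 4)
        print(w, "->", stem, "| starts at initial character:", w.startswith(stem))
```

On this corpus "connect" maps to "conn", but "connection" maps to "ctio", because "ctio" occurs in only one document. The two related words therefore receive different stems, and the second stem begins mid-word, which is the shortcoming the paper sets out to address.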

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression