Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization
Juan-Manuel Torres-Moreno

TL;DR
This paper introduces Ultra-stemming, a normalization technique that reduces each word to its initial letters. It reports that summary content is preserved and that automatic text summarization performance often improves on trilingual corpora.
Contribution
It proposes Ultra-stemming, a new normalization method that enhances summarization by reducing dimensionality more effectively than traditional stemming or lemmatization.
Findings
Ultra-stemming preserves summary content effectively.
Performance improvements observed across multiple summarization systems.
Results confirmed on trilingual corpora using Fresa evaluation.
Abstract
In Automatic Text Summarization, preprocessing is an important phase to reduce the space of textual representation. Classically, stemming and lemmatization have been widely used for normalizing words. However, even with normalization on large texts, the curse of dimensionality can disturb the performance of summarizers. This paper describes a new method for normalization of words to further reduce the space of representation. We propose to reduce each word to its initial letters, as a form of Ultra-stemming. The results show that Ultra-stemming not only preserves the content of summaries produced by this representation, but can often dramatically improve the performance of the systems. Summaries on trilingual corpora were evaluated automatically with Fresa. Results confirm an increase in performance, regardless of the summarizer system used.
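As a rough illustration of the idea in the abstract, the sketch below truncates each token to its first n letters and compares the resulting vocabulary size with the unnormalized one. The function names (`ultra_stem`, `tokenize`, `vocabulary`), the regex tokenizer, and the prefix length parameter `n` are assumptions made for this example; they are not taken from the paper, which may tokenize and truncate differently.

```python
import re
from typing import Optional, Set, List


def ultra_stem(word: str, n: int = 1) -> str:
    """Reduce a word to its first n letters, lowercased (hypothetical helper)."""
    return word.lower()[:n]


def tokenize(text: str) -> List[str]:
    # Letters-only tokenizer; a simplifying assumption for this sketch.
    return re.findall(r"[^\W\d_]+", text.lower())


def vocabulary(text: str, n: Optional[int] = None) -> Set[str]:
    """Return the set of distinct terms, ultra-stemmed to n letters if n is given."""
    tokens = tokenize(text)
    if n is None:
        return set(tokens)
    return {ultra_stem(t, n) for t in tokens}


text = "Summarizers summarize documents; a summary documents the summarized content."

print(len(vocabulary(text)))       # distinct surface forms
print(len(vocabulary(text, n=1)))  # distinct first letters
```

On this toy sentence the distinct terms shrink from eight surface forms to five one-letter classes, which is the kind of reduction in representation space the abstract describes. Whether such aggressive truncation helps a given summarizer is an empirical question that this sketch does not address.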
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
