Accuracy of the Uzbek stop words detection: a case study on "School corpus"
Khabibulla Madatov, Shukurla Bekchanov, Jernej Vičič

TL;DR
This study evaluates the accuracy of automatically generated Uzbek stop words lists within the 'School corpus', highlighting the challenges and potential methods for assessing stop words in agglutinative languages like Uzbek.
Contribution
It introduces a method to evaluate stop words quality in Uzbek, demonstrating its applicability and analyzing the distribution of stop words in texts.
Findings
Stop words lists achieved acceptable accuracy.
The method can be adapted for similar agglutinative languages.
Numerical analysis helps identify key parts of sentences with stop words.
Abstract
Stop words are very important for information retrieval and text analysis tasks in natural language processing. This work presents a method for evaluating the quality of stop word lists produced by automatic creation techniques. Although the proposed method was tested on an automatically generated list of stop words for the Uzbek language, it can, with some modifications, be applied to similar languages, either from the same family or with an agglutinative nature. Because Uzbek is an agglutinative language, automatic detection of stop words is a more complex process than in inflected languages. Moreover, we integrated our previous work on stop words detection in the example of the "School corpus" by investigating how to automatically analyse the detection of stop…
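The abstract does not spell out how list quality is scored, so the following is only a minimal sketch of one common way to assess an automatically generated stop word list: compare it against a manually curated reference list and report precision, recall, and F1. The function name `evaluate_stop_word_list`, the tiny word lists, and the choice of metrics are assumptions made for illustration and are not taken from the paper.

```python
def evaluate_stop_word_list(candidate, reference):
    """Compare a candidate stop word list with a reference list.

    Hypothetical helper: the paper's actual evaluation procedure
    may differ from this set-overlap scoring.
    """
    # Normalise case and remove duplicates before comparing.
    candidate_set = {w.lower() for w in candidate}
    reference_set = {w.lower() for w in reference}

    # Words that appear in both lists count as correct detections.
    true_positives = len(candidate_set & reference_set)

    precision = true_positives / len(candidate_set) if candidate_set else 0.0
    recall = true_positives / len(reference_set) if reference_set else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative data only: a few Uzbek function words plus one content
# word ("maktab", school) wrongly included in the candidate list.
candidate = ["va", "bu", "bilan", "maktab"]
reference = ["va", "bu", "bilan", "uchun", "ham"]

scores = evaluate_stop_word_list(candidate, reference)
print(scores)
```

In this toy case three of the four candidate words are in the reference list, and three of the five reference words were found, so precision is 0.75 and recall is 0.6.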
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Spam and Phishing Detection
