KSW: Khmer Stop Word based Dictionary for Keyword Extraction
Nimol Thuon, Wangrui Zhang, Sada Thuon

TL;DR
KSW introduces a Khmer-specific stop word dictionary and preprocessing method that significantly improves keyword extraction accuracy for Khmer language texts, addressing resource limitations and enhancing information retrieval.
Contribution
The paper develops a tailored Khmer stop word dictionary and preprocessing approach, advancing keyword extraction methods for low-resource languages.
Findings
Improved keyword extraction accuracy over previous methods
Effective stop word removal enhances relevance of extracted keywords
Resources are publicly available for further research
Abstract
This paper introduces KSW, a Khmer-specific approach to keyword extraction that leverages a specialized stop word dictionary. Due to the limited availability of natural language processing resources for the Khmer language, effective keyword extraction has been a significant challenge. KSW addresses this by developing a tailored stop word dictionary and implementing a preprocessing methodology to remove stop words, thereby enhancing the extraction of meaningful keywords. Our experiments demonstrate that KSW achieves substantial improvements in accuracy and relevance compared to previous methods, highlighting its potential to advance Khmer text processing and information retrieval. The KSW resources, including the stop word dictionary, are available on GitHub: https://github.com/back-kh/KSWv2-Khmer-Stop-Word-based-Dictionary-for-Keyword-Extraction.git
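The stop word removal step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stop word set below is a small hypothetical placeholder (the real dictionary is in the KSW repository), the input is assumed to be already segmented into words (Khmer script does not separate words with spaces, so a segmenter would be needed first), and ranking by raw frequency is a stand-in for whatever scoring the paper actually uses. The name `extract_keywords` is likewise illustrative.

```python
from collections import Counter

# Hypothetical placeholder entries for illustration only; these are common
# Khmer function words, NOT the actual contents of the KSW dictionary.
STOP_WORDS = {"និង", "នៃ", "ជា", "ដែល", "នៅ"}


def extract_keywords(tokens, stop_words=STOP_WORDS, top_k=3):
    """Drop stop words, then rank the remaining tokens by frequency.

    Assumes `tokens` is a list of already-segmented Khmer words.
    Frequency ranking is an assumed stand-in, not the paper's scoring method.
    """
    content = [t for t in tokens if t and t not in stop_words]
    return [word for word, _ in Counter(content).most_common(top_k)]


# Pre-segmented toy input (assumed output of some Khmer word segmenter).
tokens = ["ភាសា", "ខ្មែរ", "ជា", "ភាសា", "នៃ", "ប្រទេស", "កម្ពុជា", "និង", "ភាសា", "ខ្មែរ"]
keywords = extract_keywords(tokens, top_k=2)
print(keywords)
```

Without the filtering step, high-frequency function words would compete with content words for the top ranks, which is the problem a stop word dictionary is meant to address.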
Taxonomy
Topics: Advanced Text Analysis Techniques · Handwritten Text Recognition Techniques · Mathematics, Computing, and Information Processing
