Khmer Word Search: Challenges, Solutions, and Semantic-Aware Search
Rina Buoy, Nguonly Taing, Sovisal Chenda

TL;DR
This paper addresses the unique challenges of Khmer word search by proposing normalization, spellcheckers, and a semantic model trained on a large corpus to improve search accuracy and semantic understanding.
Contribution
It introduces a comprehensive set of solutions including normalization, spellchecking, and a semantic model specifically tailored for Khmer language search.
Findings
Improved Khmer search accuracy through normalization techniques
Effective spellcheckers for grapheme and phoneme errors
Semantic model captures meaningful word similarities
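This summary does not describe how the paper's spellcheckers work. As a rough illustration of what correcting a grapheme-level error can involve, the sketch below ranks dictionary words by Levenshtein edit distance to the query. It is a generic technique, not the authors' method. The names `edit_distance` and `suggest`, the toy lexicon, and the distance threshold are all assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, counted in Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def suggest(query, lexicon, max_distance=1):
    """Return lexicon words within max_distance edits of query, closest first.

    Hypothetical helper: a real spellchecker would likely index the lexicon
    rather than scan it, and might weight visually similar characters.
    """
    scored = sorted((edit_distance(query, word), word) for word in lexicon)
    return [word for distance, word in scored if distance <= max_distance]


# Toy lexicon (assumption for illustration only).
lexicon = ["\u1780\u17b6\u179a", "\u1781\u17d2\u1798\u17c2\u179a"]
# The query differs from the first entry by one substituted consonant.
candidates = suggest("\u1780\u17b6\u179b", lexicon)
```

Because distance is counted per code point, a subscript consonant (coeng plus consonant) costs two edits here. Whether that is desirable depends on the application.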
Abstract
Search is a key function of digital platforms and applications such as electronic dictionaries, search engines, and e-commerce platforms. While search is trivial in some languages, Khmer word search is challenging because of the language's complex writing system. Characters can be entered in multiple orders, and words can have several spelling realizations, both of which constrain Khmer word search. Additionally, spelling mistakes are common because robust spellcheckers are not widely available across input device platforms. These challenges hinder the use of the Khmer language in search-embedded applications. Moreover, because no WordNet-like lexical database exists for Khmer, it has not been possible to establish the semantic relations between words that would enable semantic search. In this paper, we propose a set of robust solutions to the above challenges associated with Khmer…
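The abstract is truncated before the normalization rules are given, so the following is only a minimal sketch of the general idea behind character-order normalization. It makes two simplifying assumptions: zero-width characters are dropped, and within a run of combining characters every subscript (COENG U+17D2 plus a consonant) is moved ahead of dependent vowels and signs. The function name `normalize_khmer` and this particular canonical order are hypothetical, not the authors' rules. For matching purposes any deterministic order works, provided the same function is applied to both the indexed words and the query.

```python
import re

# Zero-width space, non-joiner, and joiner, which are invisible in rendered text.
ZERO_WIDTH = {0x200B: None, 0x200C: None, 0x200D: None}
COENG = "\u17d2"

# One unit is either a subscript (COENG + consonant) or a single
# dependent vowel / sign in the range U+17B6..U+17D1.
_UNIT = re.compile("\u17d2[\u1780-\u17a2]|[\u17b6-\u17d1]")
# A tail is a maximal run of such units following a base character.
_TAIL = re.compile("(?:\u17d2[\u1780-\u17a2]|[\u17b6-\u17d1])+")


def _reorder(match):
    units = _UNIT.findall(match.group(0))
    # Stable sort: subscripts first, everything else keeps its relative order.
    return "".join(sorted(units, key=lambda u: 0 if u[0] == COENG else 1))


def normalize_khmer(text):
    """Simplified canonical form for search matching (assumed rules)."""
    return _TAIL.sub(_reorder, text.translate(ZERO_WIDTH))


# Two encodings of the same base + subscript + vowel, typed in different orders.
vowel_first = "\u1780\u17b6\u17d2\u179a"
subscript_first = "\u1780\u17d2\u179a\u17b6"
same = normalize_khmer(vowel_first) == normalize_khmer(subscript_first)
```

With this sketch, both encodings reduce to the subscript-first form, so an exact-match lookup would treat them as the same word.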
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
