Khmer Word Search: Challenges, Solutions, and Semantic-Aware Search

Rina Buoy; Nguonly Taing; Sovisal Chenda

arXiv:2112.08918 · cs.CL · December 17, 2021 · 1 citation

Open Access

TL;DR

This paper addresses the unique challenges of Khmer word search by proposing normalization, spellcheckers, and a semantic model trained on a large corpus to improve search accuracy and semantic understanding.

Contribution

It introduces a comprehensive set of solutions including normalization, spellchecking, and a semantic model specifically tailored for Khmer language search.

Findings

01. Improved Khmer search accuracy through normalization techniques

02. Effective spellcheckers for grapheme and phoneme errors

03. Semantic model captures meaningful word similarities

Abstract

Search is a key function of digital platforms and applications such as electronic dictionaries, search engines, and e-commerce platforms. While search is trivial to implement for some languages, Khmer word search is challenging because of the complex Khmer writing system. A word can be typed with its characters in more than one order and can have several spelling realizations, both of which constrain Khmer word search. Additionally, spelling mistakes are common since robust spellcheckers are not commonly available across input device platforms. These challenges hinder the use of the Khmer language in search-embedded applications. Moreover, because no WordNet-like lexical database exists for Khmer, it is impossible to establish the semantic relations between words that would enable semantic search. In this paper, we propose a set of robust solutions to the above challenges associated with Khmer…
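
To make the character-order problem concrete, here is a minimal Python sketch of one way to canonicalize a Khmer string before indexing or lookup. It is an illustration under stated assumptions, not the normalization procedure from the paper: the function name `normalize_khmer`, the helper names, and the chosen ordering (subscript consonants, then register shifters, then dependent vowels, then other signs) are all hypothetical choices made for this example. The code point ranges come from the Unicode Khmer block.

```python
COENG = "\u17D2"  # KHMER SIGN COENG, introduces a subscript consonant


def _is_consonant(ch):
    # Khmer consonants KA..QA occupy U+1780..U+17A2.
    return "\u1780" <= ch <= "\u17A2"


def _is_mark(ch):
    # Dependent vowels and signs (U+17B6..U+17D3) plus ATTHACAN (U+17DD).
    return "\u17B6" <= ch <= "\u17D3" or ch == "\u17DD"


def _priority(unit):
    # Assumed canonical order for this sketch only.
    first = unit[0]
    if first == COENG:
        return 0  # subscript consonant (coeng + consonant)
    if first in "\u17C9\u17CA":
        return 1  # register shifters MUUSIKATOAN and TRIISAP
    if "\u17B6" <= first <= "\u17C5":
        return 2  # dependent vowels
    return 3      # remaining signs


def normalize_khmer(text):
    """Reorder the marks that follow each base consonant into one fixed order."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        out.append(ch)
        i += 1
        if not _is_consonant(ch):
            continue
        # Collect the units attached to this base consonant.
        units = []
        while i < n:
            if text[i] == COENG and i + 1 < n and _is_consonant(text[i + 1]):
                units.append(text[i:i + 2])
                i += 2
            elif _is_mark(text[i]) and text[i] != COENG:
                units.append(text[i])
                i += 1
            else:
                break
        units.sort(key=_priority)  # stable sort keeps ties in typed order
        out.extend(units)
    return "".join(out)


# The word "Khmer": KHA, COENG + MO, vowel AE, RO.
canonical = "\u1781\u17D2\u1798\u17C2\u179A"
# Same five characters with the vowel typed before the subscript.
variant = "\u1781\u17C2\u17D2\u1798\u179A"
```

With this sketch, `normalize_khmer(variant)` and `normalize_khmer(canonical)` return the same string, so a dictionary keyed on normalized forms would match either input order. A production normalizer would need more rules than this (for example, the relative order of multiple subscripts), which the sketch deliberately leaves in typed order.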

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies