Domain specificity and data efficiency in typo tolerant spell checkers: the case of search in online marketplaces
Dayananda Ubrangala, Juhi Sharma, Ravi Prasad Kondapalli, Kiran R, Amit Agarwala, Laurent Boué

TL;DR
This paper introduces a data augmentation approach to improve typo correction in domain-specific search queries for online marketplaces, demonstrating effective real-time spelling correction with limited annotated data.
Contribution
It presents a novel data augmentation method and a domain-specific neural embedding approach for typo-tolerant spell checking in online marketplace search.
Findings
Effective typo correction in a real-time API for Microsoft AppSource
Synthetic data improves domain-specific spell checking accuracy
Approach reduces reliance on large annotated datasets
Abstract
Typographical errors are a major source of frustration for visitors of online marketplaces. Because of the domain-specific nature of these marketplaces and the very short queries users tend to search for, traditional spell checking solutions do not perform well in correcting typos. We present a data augmentation method to address the lack of annotated typo data and train a recurrent neural network to learn context-limited domain-specific embeddings. Those embeddings are deployed in a real-time inferencing API for the Microsoft AppSource marketplace to find the closest match between a misspelled user query and the available product names. Our data-efficient solution shows that controlled, high-quality synthetic data may be a powerful tool, especially considering the current climate of large language models, which rely on prohibitively huge and often uncontrolled datasets.
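The abstract does not spell out how the synthetic typo data is produced, so the following is only a minimal sketch of one common way to build such data: applying a single random character-level edit (delete, insert, substitute, transpose) to each known product name. The function names `make_typo` and `augment`, the choice of edit operations, and the sample product names are all assumptions for illustration, not the authors' method.

```python
import random
import string


def make_typo(word, rng):
    """Apply one random character-level edit to `word` (hypothetical helper).

    The four edit types are an assumption; the paper may use a different
    noise model. Substitution or transposition can occasionally return the
    original string unchanged (e.g. when swapping two identical letters).
    """
    if len(word) < 2:
        return word
    op = rng.choice(["delete", "insert", "substitute", "transpose"])
    i = rng.randrange(len(word) - 1)  # i + 1 is always a valid index
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "insert":
        return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]
    if op == "substitute":
        return word[:i] + rng.choice(string.ascii_lowercase) + word[i + 1:]
    # transpose: swap the characters at positions i and i + 1
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def augment(product_names, typos_per_name=3, seed=0):
    """Return (noisy_query, correct_name) training pairs."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    pairs = []
    for name in product_names:
        for _ in range(typos_per_name):
            pairs.append((make_typo(name, rng), name))
    return pairs


if __name__ == "__main__":
    # Sample names are illustrative only.
    for noisy, clean in augment(["power bi", "dynamics"]):
        print(f"{noisy!r} -> {clean!r}")
```

Pairs of this kind could serve as labeled training data for a model that maps a misspelled query to its intended product name; because the product catalog itself is the source, the data stays within the marketplace's own vocabulary.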
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling
