Retrieval Augmented Spelling Correction for E-Commerce Applications
Xuan Guo, Rohit Patki, Dante Everaert, Christopher Potts

TL;DR
This paper introduces a retrieval augmented generation approach for e-commerce spelling correction, effectively distinguishing new brand names from misspellings by integrating catalog retrieval with a fine-tuned language model.
Contribution
It presents a novel retrieval augmented spelling correction method that enhances accuracy by incorporating product catalog data into a language model's context.
Findings
The RAG framework improves spelling correction accuracy over a stand-alone LLM
Retrieving product names from the catalog improves model performance
Further fine-tuning the LLM to use the retrieved context yields additional gains
Abstract
The rapid introduction of new brand names into everyday language poses a unique challenge for e-commerce spelling correction services, which must distinguish genuine misspellings from novel brand names that use unconventional spelling. We seek to address this challenge via Retrieval Augmented Generation (RAG). In this approach, product names are retrieved from a catalog and incorporated into the context used by a large language model (LLM) that has been fine-tuned to do contextual spelling correction. Through quantitative evaluation and qualitative error analyses, we find improvements in spelling correction utilizing the RAG framework beyond a stand-alone LLM. We also demonstrate the value of additional fine-tuning of the LLM to incorporate retrieved context.
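The retrieve-then-correct flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the toy catalog, the invented brand names, the character-trigram Jaccard retriever, and the prompt layout are hypothetical and are not taken from the paper, which does not specify these details on this page. The sketch stops at building the prompt; the call to a fine-tuned LLM is left out.

```python
from typing import List, Set

# Hypothetical toy catalog with invented brand names (not from the paper).
CATALOG = [
    "Snugglz baby blanket",
    "Lumio desk lamp",
    "Trailo hiking boots",
]


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def retrieve(query: str, catalog: List[str], k: int = 2) -> List[str]:
    """Rank catalog entries by trigram Jaccard similarity to the query.

    This simple lexical retriever is an assumption standing in for
    whatever retrieval component a real system would use.
    """
    q = char_ngrams(query)

    def score(name: str) -> float:
        c = char_ngrams(name)
        return len(q & c) / len(q | c)

    return sorted(catalog, key=score, reverse=True)[:k]


def build_prompt(query: str, retrieved: List[str]) -> str:
    """Place retrieved product names in the context ahead of the query."""
    context = "\n".join(f"- {name}" for name in retrieved)
    return (
        "Catalog products:\n"
        f"{context}\n"
        f"Query: {query}\n"
        "Corrected query:"
    )


if __name__ == "__main__":
    query = "snugglz blanket"
    print(build_prompt(query, retrieve(query, CATALOG, k=1)))
```

With the matching catalog entry in the context, a model has evidence that an unconventional spelling such as the invented "snugglz" is a brand name and should not be rewritten as a dictionary word.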
Taxonomy
Topics: Natural Language Processing Techniques · Web Data Mining and Analysis
Methods: Linear Layer · Multi-Head Attention · Dense Connections · WordPiece · Residual Connection · Linear Warmup With Linear Decay · Dropout · Layer Normalization · Adam
