Vernacular Search Query Translation with Unsupervised Domain Adaptation
Mandar Kulkarni, Nikesh Garera

TL;DR
This paper introduces an unsupervised domain adaptation method for translating vernacular search queries, demonstrated on Hindi-English queries. Without requiring a parallel corpus, it improves over the baseline by more than 20 BLEU points.
Contribution
The paper presents an unsupervised approach for adapting an open-domain translation model to e-commerce search queries using only monolingual queries from the two languages, supporting cross-lingual retrieval without parallel data.
Findings
More than 20 BLEU points of improvement over the baseline without any parallel corpus
Fine-tuning with a small labeled set yields a gain of more than 27 BLEU points
Effective cross-lingual translation of vernacular-language queries
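To make the "BLEU points" in the findings concrete, here is a minimal sketch of a corpus-level BLEU score on a 0-100 scale. It assumes whitespace tokenization, a single reference per query, and no smoothing; the function names and the example queries are hypothetical, and the paper's actual evaluation setup is not described in this summary.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (0-100), one reference per hypothesis.

    Assumptions: whitespace tokenization, no smoothing. Returns 0.0 if any
    n-gram order has no matches at all.
    """
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: a hypothesis n-gram is credited at most as
            # often as it appears in the reference.
            matched[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            total[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Brevity penalty discourages translations shorter than the reference.
    brevity = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


# Hypothetical translated queries, for illustration only.
references = ["red cotton saree for women"]
print(corpus_bleu(["red cotton saree for women"], references))
print(corpus_bleu(["red cotton saree for ladies"], references))
```

A 20-point gain is a difference of 20 on this 0-100 scale. Published results are normally computed with a standard tool such as sacreBLEU rather than a hand-rolled function, so scores from this sketch are not directly comparable to the numbers above.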
Abstract
With the democratization of e-commerce platforms, an increasingly diversified user base is opting to shop online. To provide a comfortable and reliable shopping experience, it's important to enable users to interact with the platform in the language of their choice. Accurate query translation is essential for Cross-Lingual Information Retrieval (CLIR) with vernacular queries. Due to internet-scale operations, e-commerce platforms get millions of search queries every day. However, creating a parallel training set to train an in-domain translation model is cumbersome. This paper proposes an unsupervised domain adaptation approach to translate search queries without using any parallel corpus. We use an open-domain translation model (trained on a public corpus) and adapt it to the query data using only the monolingual queries from two languages. In addition, fine-tuning with a small…
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Image and Video Retrieval Techniques · Web Data Mining and Analysis
Methods: Balanced Selection
