MiLQ: Benchmarking IR Models for Bilingual Web Search with Mixed Language Queries
Jonghwi Kim, Deokhyung Kang, Seonjeong Hwang, Yunsu Kim, Jungseul Ok, Gary Lee

TL;DR
This paper introduces MiLQ, a benchmark for evaluating IR models on mixed-language web search queries. Multilingual IR models perform only moderately on it, and code-switched training data shows potential for making them more robust.
Contribution
It provides the first public benchmark for mixed-language queries and analyzes the effectiveness of multilingual IR models and code-switching strategies.
Findings
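Multilingual IR models perform moderately on MiLQ and inconsistently across native, English, and mixed-language queries.
Code-switched training data shows potential for making IR models more robust to such queries.
Intentionally mixing English into queries helps bilinguals searching English documents, which the analysis attributes to better token matching than with native-language queries.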
Abstract
Although bilingual speakers frequently use mixed-language queries in web searches, Information Retrieval (IR) research on them remains scarce. To address this, we introduce MiLQ (Mixed-Language Query test set), the first public benchmark of mixed-language queries, qualified as realistic and relatively preferred. Experiments show that multilingual IR models perform moderately on MiLQ and inconsistently across native, English, and mixed-language queries; they also suggest that code-switched training data has potential for building robust IR models that handle such queries. Meanwhile, intentional English mixing in queries proves an effective strategy for bilinguals searching English documents, which our analysis attributes to enhanced token matching compared to native queries.
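The token-matching explanation can be illustrated with a toy lexical-overlap measure. This is only a sketch: the function `token_overlap`, the whitespace tokenization, and the example query and document are hypothetical and are not taken from MiLQ or from the paper's analysis, which may well operate on a model's subword tokens rather than on whole words.

```python
def token_overlap(query: str, document: str) -> float:
    """Fraction of query tokens that also occur in the document.

    Assumption: lowercase whitespace tokenization, exact string match.
    """
    query_tokens = query.lower().split()
    doc_tokens = set(document.lower().split())
    if not query_tokens:
        return 0.0
    matched = sum(1 for tok in query_tokens if tok in doc_tokens)
    return matched / len(query_tokens)


# Hypothetical toy data for illustration only (not from MiLQ).
document = "transformer models use self-attention to process sequences"
native_query = "트랜스포머 모델 어텐션"           # native-language (Korean) query
mixed_query = "transformer 모델 self-attention"  # same intent with English terms mixed in

native_score = token_overlap(native_query, document)
mixed_score = token_overlap(mixed_query, document)

print(f"native query overlap: {native_score:.2f}")
print(f"mixed query overlap:  {mixed_score:.2f}")
```

In this toy case the native query shares no surface tokens with the English document, while the mixed query matches on the two English terms. That is the kind of lexical advantage the abstract describes, although real retrievers score relevance in more elaborate ways than this.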
Taxonomy
Topics: Information Retrieval and Search Behavior · Natural Language Processing Techniques · Web Data Mining and Analysis
