MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocol
Youngjoon Jang, Seongtae Hong, Hyeonseok Moon, Heuiseok Lim

TL;DR
MLAIRE is a new protocol for evaluating multilingual information retrieval that separately measures semantic accuracy and language preference, addressing limitations of existing language-agnostic metrics.
Contribution
It introduces controlled pools with parallel passages and novel language-aware metrics to disentangle semantic retrieval from language preference in evaluation.
Findings
Standard metrics obscure differences between semantic accuracy and language preference.
An evaluation of 31 retrievers shows diverse behaviors in semantic relevance and language preference.
Semantic strength does not always correlate with language preference in retrieval performance.
Abstract
Multilingual Information Retrieval is increasingly important in real-world search settings, where users issue queries over mixed-language corpora. Existing evaluations mainly reward language-agnostic semantic relevance, treating relevant passages equally regardless of language. Yet retrieval utility also depends on the language of the retrieved passages: users may prefer results they can read and verify in the query language, and query-passage language mismatch can complicate downstream grounding and answer verification in Retrieval-Augmented Generation systems. To evaluate this language-aware dimension, we introduce MLAIRE, a Multilingual Language-Aware Information Retrieval Evaluation protocol that disentangles cross-lingual semantic retrieval from query-language preference. MLAIRE constructs controlled pools with parallel passages across languages, enabling measurement of semantic…
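To make the distinction between language-agnostic relevance and query-language preference concrete, here is a minimal Python sketch over a pool of parallel passages. It is not MLAIRE's metric definitions, which the truncated abstract above does not spell out; the `Passage` type, the three scoring functions, and the toy ranking are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    content_id: str  # shared by all parallel translations of one passage (assumed field)
    lang: str        # language code of this particular translation (assumed field)


def semantic_hit_at_k(ranking, relevant_id, k):
    """Language-agnostic: 1.0 if any translation of the relevant passage is in the top k."""
    return float(any(p.content_id == relevant_id for p in ranking[:k]))


def query_language_hit_at_k(ranking, relevant_id, query_lang, k):
    """Language-aware: 1.0 only if the query-language translation is in the top k."""
    return float(
        any(p.content_id == relevant_id and p.lang == query_lang for p in ranking[:k])
    )


def query_language_ranked_first(ranking, relevant_id, query_lang):
    """1.0 if the highest-ranked translation of the relevant passage is in the query language."""
    for p in ranking:
        if p.content_id == relevant_id:
            return float(p.lang == query_lang)
    return 0.0  # the relevant passage was not retrieved in any language


# Toy ranking for a Korean query whose relevant passage is "d1".
ranking = [
    Passage("d1", "en"),
    Passage("d2", "ko"),
    Passage("d1", "ko"),
    Passage("d2", "en"),
]

sem_at_1 = semantic_hit_at_k(ranking, "d1", k=1)
lang_at_1 = query_language_hit_at_k(ranking, "d1", "ko", k=1)
lang_at_3 = query_language_hit_at_k(ranking, "d1", "ko", k=3)
ko_first = query_language_ranked_first(ranking, "d1", "ko")
```

In this toy case the retriever finds the right content at rank 1 but in English, so a language-agnostic score gives it full credit while both language-aware scores at rank 1 give it none. That gap is the kind of behavior a language-agnostic metric alone cannot reveal.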
