TL;DR
This paper introduces XLQA, a benchmark for evaluating multilingual open-domain question answering systems with a focus on locale-sensitive questions, revealing significant gaps in current models' cultural and regional understanding.
Contribution
The paper presents XLQA, a new benchmark of locale-sensitive questions with a systematic evaluation framework for multilingual ODQA, addressing cultural and regional variations that multilingual evaluations often overlook.
Findings
State-of-the-art multilingual LLMs struggle with locale-sensitive questions.
Disparities in training data affect models' locale awareness.
The benchmark reveals performance gaps between English and other languages.
Abstract
Large Language Models (LLMs) have shown significant progress in open-domain question answering (ODQA), yet most evaluations focus on English and assume locale-invariant answers across languages. This assumption neglects the cultural and regional variations that affect both question understanding and the answers themselves, leading to biased evaluation in multilingual benchmarks. To address these limitations, we introduce XLQA, a novel benchmark explicitly designed for locale-sensitive multilingual ODQA. XLQA contains 3,000 English seed questions expanded to eight languages, with careful filtering for semantic consistency and human-verified annotations distinguishing locale-invariant and locale-sensitive cases. Our evaluation of five state-of-the-art multilingual LLMs reveals notable failures on locale-sensitive questions, exposing gaps between English and other languages due to a lack of locale-grounding…
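To make the evaluation setup concrete, here is a minimal sketch of how accuracy might be broken down by language and by the locale-invariant versus locale-sensitive annotation. It is not the paper's evaluation code. The record fields (`language`, `locale_sensitive`, `gold_answers`, `prediction`), the exact-match metric, and the four toy records are all assumptions invented for illustration.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace (a simple, assumed normalization)."""
    return " ".join(text.lower().split())


def exact_match(prediction, gold_answers):
    """True if the prediction equals any gold answer after normalization."""
    pred = normalize(prediction)
    return any(pred == normalize(gold) for gold in gold_answers)


def accuracy_by_group(records):
    """Exact-match accuracy keyed by (language, locale_sensitive)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        key = (rec["language"], rec["locale_sensitive"])
        totals[key] += 1
        hits[key] += int(exact_match(rec["prediction"], rec["gold_answers"]))
    return {key: hits[key] / totals[key] for key in totals}


# Toy records, made up for this sketch; they are not drawn from XLQA.
records = [
    {"language": "en", "locale_sensitive": False,
     "gold_answers": ["Paris"], "prediction": "paris"},
    {"language": "en", "locale_sensitive": True,
     "gold_answers": ["Washington, D.C."], "prediction": "Washington, D.C."},
    {"language": "ko", "locale_sensitive": False,
     "gold_answers": ["파리"], "prediction": "파리"},
    # A model that answers with the English-locale answer to a Korean-locale question.
    {"language": "ko", "locale_sensitive": True,
     "gold_answers": ["서울"], "prediction": "Washington, D.C."},
]

scores = accuracy_by_group(records)
for (language, sensitive), acc in sorted(scores.items()):
    label = "locale-sensitive" if sensitive else "locale-invariant"
    print(f"{language} {label}: {acc:.2f}")
```

Reporting the two annotation classes separately for each language is what lets a locale-grounding failure show up: a model can score well on locale-invariant questions in every language while still returning the English-locale answer to a locale-sensitive question asked in another language.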
