XOR QA: Cross-lingual Open-Retrieval Question Answering
Akari Asai, Jungo Kasai, Jonathan H. Clark, Kenton Lee, Eunsol Choi, and Hannaneh Hajishirzi

TL;DR
This paper introduces XOR QA, a cross-lingual open-retrieval question answering task and dataset in which a question asked in one language can be answered using content written in another, addressing disparities in language resources.
Contribution
It presents the first large-scale dataset and three new tasks for cross-lingual document retrieval in question answering, along with baseline evaluations.
Findings
XOR QA is a challenging task for current models.
Baselines are built from state-of-the-art translation and multilingual models.
The dataset enables research on cross-lingual information retrieval and QA.
Abstract
Multilingual question answering tasks typically assume answers exist in the same language as the question. Yet in practice, many languages face both information scarcity -- where languages have few reference articles -- and information asymmetry -- where questions reference concepts from other cultures. This work extends open-retrieval question answering to a cross-lingual setting enabling questions from one language to be answered via answer content from another language. We construct a large-scale dataset built on questions from TyDi QA lacking same-language answers. Our task formulation, called Cross-lingual Open Retrieval Question Answering (XOR QA), includes 40k information-seeking questions from across 7 diverse non-English languages. Based on this dataset, we introduce three new tasks that involve cross-lingual document retrieval using multi-lingual and English resources. We…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
