MLQA: Evaluating Cross-lingual Extractive Question Answering
Patrick Lewis, Barlas Oğuz, Ruty Rinott, Sebastian Riedel, Holger Schwenk

TL;DR
MLQA is a multilingual benchmark dataset designed to evaluate cross-lingual extractive question answering systems across seven languages, highlighting current limitations in transfer learning performance.
Contribution
The paper introduces MLQA, a high-quality, multilingual evaluation benchmark with aligned QA instances across seven languages, facilitating research in cross-lingual QA.
Findings
Cross-lingual transfer results lag behind training-language performance.
MLQA enables evaluation of cross-lingual QA models.
MLQA provides a new resource for multilingual QA research.
Abstract
Question answering (QA) models have shown rapid progress enabled by the availability of large, high-quality benchmark datasets. Such annotated datasets are difficult and costly to collect, and rarely exist in languages other than English, making training QA systems in other languages challenging. An alternative to building large monolingual training datasets is to develop cross-lingual systems which can transfer to a target language without requiring training data in that language. In order to develop such systems, it is crucial to invest in high quality multilingual evaluation benchmarks to measure progress. We present MLQA, a multi-way aligned extractive QA evaluation benchmark intended to spur research in this area. MLQA contains QA instances in 7 languages, namely English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese. It consists of over 12K QA instances in…
