Cross-Lingual Open-Domain Question Answering with Answer Sentence Generation
Benjamin Muller, Luca Soldaini, Rik Koncel-Kedziorski, Eric Lind,, Alessandro Moschitti

TL;DR
This paper extends open-domain generative question answering to multilingual and cross-lingual contexts, introducing a new dataset and a model that produces full-sentence answers across multiple languages, outperforming baselines.
Contribution
It introduces GenTyDiQA, a multilingual dataset, and develops a cross-lingual generative model that improves answer quality in multilingual QA tasks.
Findings
Outperforms answer sentence selection baselines in all five languages
Outperforms monolingual generative models in three languages
Demonstrates effectiveness of cross-lingual answer generation
Abstract
Open-Domain Generative Question Answering has achieved impressive performance in English by combining document-level retrieval with answer generation. These approaches, which we refer to as GenQA, can generate complete sentences, effectively answering both factoid and non-factoid questions. In this paper, we extend GenQA to the multilingual and cross-lingual settings. For this purpose, we first introduce GenTyDiQA, an extension of the TyDiQA dataset with well-formed and complete answers for Arabic, Bengali, English, Japanese, and Russian. Based on GenTyDiQA, we design a cross-lingual generative model that produces full-sentence answers by exploiting passages written in multiple languages, including languages different from the question. Our cross-lingual generative system outperforms answer sentence selection baselines for all 5 languages and monolingual generative pipelines for three…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
