Towards Zero-Shot Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension

Siamak Shakeri; Noah Constant; Mihir Sanjay Kale; Linting Xue

arXiv:2010.12008 · cs.CL · June 1, 2021

TL;DR

This paper introduces a multilingual question and answer generation method using a single generative model trained on English data, significantly improving zero-shot cross-lingual QA performance without requiring labeled data in target languages.

Contribution

It presents a multi-task training approach for a generative model that creates synthetic multilingual QA pairs from English data, enabling broader language coverage.

Findings

01 Achieves large gains on the XQuAD dataset

02 Reduces the gap between zero-shot and supervised performance of smaller QA models

03 Human evaluation finds that most synthetic samples are grammatically correct and sensible

Abstract

We propose a simple method to generate multilingual question and answer pairs on a large scale through the use of a single generative model. These synthetic samples can be used to improve the zero-shot performance of multilingual QA models on target languages. Our proposed multi-task training of the generative model only requires the labeled training samples in English, thus removing the need for such samples in the target languages, making it applicable to far more languages than those with labeled data. Human evaluations indicate the majority of such samples are grammatically correct and sensible. Experimental results show our proposed approach can achieve large gains on the XQuAD dataset, reducing the gap between zero-shot and supervised performance of smaller QA models on various languages.
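As a rough illustration of how such synthetic samples might be folded into QA training data, the sketch below runs a generator over unlabeled target-language passages, keeps only the pairs whose answer is a verbatim span of the passage, and merges them with the English labeled set. This is a minimal sketch under stated assumptions, not the paper's implementation: `QASample`, `build_synthetic_samples`, `augment`, and `toy_generator` are hypothetical names, the span filter is an assumed sanity check, and the toy generator merely stands in for the trained generative model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class QASample:
    language: str
    context: str
    question: str
    answer: str
    answer_start: int  # character offset of the answer inside the context


def build_synthetic_samples(
    passages: Iterable[Tuple[str, str]],
    generate_qa: Callable[[str], Tuple[str, str]],
) -> List[QASample]:
    """Run a (hypothetical) generator over (language, passage) pairs."""
    samples: List[QASample] = []
    for language, context in passages:
        question, answer = generate_qa(context)
        # Assumed filter: skip empty outputs before looking for the span.
        if not question.strip() or not answer.strip():
            continue
        start = context.find(answer)
        # Assumed filter: the answer must occur verbatim in the passage.
        if start < 0:
            continue
        samples.append(QASample(language, context, question, answer, start))
    return samples


def augment(english: List[QASample], synthetic: List[QASample]) -> List[QASample]:
    """Combine English labeled samples with synthetic target-language ones."""
    return list(english) + list(synthetic)


def toy_generator(context: str) -> Tuple[str, str]:
    """Stand-in for the trained model: uses the first word as the answer."""
    words = context.split()
    if not words:
        return "", ""
    return "What is the first word?", words[0]


if __name__ == "__main__":
    passages = [("es", "Madrid es la capital de España."), ("de", "")]
    english = [QASample("en", "Paris is in France.", "Where is Paris?", "France", 12)]
    for sample in augment(english, build_synthetic_samples(passages, toy_generator)):
        print(sample.language, sample.question, sample.answer, sample.answer_start)
```

In practice `generate_qa` would wrap the generative model, and the combined list would be used to fine-tune the downstream multilingual QA model; how the paper filters or weights the synthetic samples is not stated in this abstract.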

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Expert finding and Q&A systems