# Cross-Lingual Training for Automatic Question Generation

**Authors:** Vishwajeet Kumar, Nitish Joshi, Arijit Mukherjee, Ganesh Ramakrishnan, Preethi Jyothi

arXiv: 1906.02525 · 2019-06-07

## TL;DR

This paper introduces a cross-lingual question generation (QG) model that reuses a large QG dataset in a secondary language (e.g. English) to improve QG in a primary language with only a small QG dataset, demonstrated with Hindi and Chinese as primary languages.

## Contribution

The paper proposes a cross-lingual training regime for QG in low-resource languages that combines (i) unsupervised pretraining of language models in both the primary and secondary languages with (ii) joint supervised QG training in both languages.
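The two-part regime can be pictured as a simple training schedule. The sketch below is only an illustration, not the authors' implementation: the function name `build_schedule`, the strict one-to-one alternation between languages, and the step counts are all assumptions, and no model is trained. It shows one plausible way to order the data so that the small primary-language QG set is cycled alongside the larger secondary-language set.

```python
from itertools import cycle


def build_schedule(mono, qg, pretrain_steps, joint_steps):
    """Return a list of (phase, language, example) training steps.

    mono: dict mapping language -> non-empty list of monolingual sentences
    qg:   dict mapping language -> non-empty list of (sentence, question) pairs
    All names and the alternation policy are hypothetical.
    """
    langs = sorted(mono)  # fixed order keeps the schedule deterministic
    mono_streams = {lang: cycle(mono[lang]) for lang in langs}
    qg_streams = {lang: cycle(qg[lang]) for lang in langs}
    schedule = []

    # Phase (i): unsupervised pretraining on monolingual text in both languages.
    for step in range(pretrain_steps):
        lang = langs[step % len(langs)]
        schedule.append(("pretrain", lang, next(mono_streams[lang])))

    # Phase (ii): joint supervised QG training in both languages.
    # cycle() repeats the small primary-language set as often as needed.
    for step in range(joint_steps):
        lang = langs[step % len(langs)]
        schedule.append(("joint_qg", lang, next(qg_streams[lang])))

    return schedule


# Toy data: "en" plays the secondary language, "hi" the primary language.
mono = {"en": ["e1", "e2"], "hi": ["h1"]}
qg = {"en": [("s1", "q1"), ("s2", "q2"), ("s3", "q3")], "hi": [("hs", "hq")]}
schedule = build_schedule(mono, qg, pretrain_steps=4, joint_steps=4)
```

In a real system each scheduled step would feed a model update; how parameters are shared across the two languages is not specified in this summary and is left out of the sketch.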

## Key findings

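- QG capability transfers from a data-rich secondary language (e.g. English) to a primary language that has only a small QG dataset
- The approach improves question generation for two primary languages, Hindi and Chinese
- The authors create and release a new Hindi question answering dataset of 6555 sentences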

## Abstract

Automatic question generation (QG) is a challenging problem in natural language understanding. QG systems are typically built assuming access to a large number of training instances where each instance is a question and its corresponding answer. For a new language, such training instances are hard to obtain making the QG problem even more challenging. Using this as our motivation, we study the reuse of an available large QG dataset in a secondary language (e.g. English) to learn a QG model for a primary language (e.g. Hindi) of interest. For the primary language, we assume access to a large amount of monolingual text but only a small QG dataset. We propose a cross-lingual QG model which uses the following training regime: (i) Unsupervised pretraining of language models in both primary and secondary languages and (ii) joint supervised training for QG in both languages. We demonstrate the efficacy of our proposed approach using two different primary languages, Hindi and Chinese. We also create and release a new question answering dataset for Hindi consisting of 6555 sentences.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.02525/full.md

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/1906.02525/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1906.02525/full.md

---
Source: https://tomesphere.com/paper/1906.02525