Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds
Tassilo Klein, Moin Nabi

TL;DR
This paper introduces a novel neural question generation model combining GPT-2 and BERT architectures, improving question quality and diversity, and enhancing question answering performance, especially in semi-supervised and small-data scenarios.
Contribution
It proposes an end-to-end Transformer-based model that jointly generates questions and answers, leveraging the strengths of GPT-2 and BERT for improved semantic correctness and diversity.
Findings
Produces semantically correct and diverse questions on SQuAD 1.1
Improves downstream question answering performance
Effective in semi-supervised and small-data settings
Abstract
Automatic question generation aims at the generation of questions from a context, with the corresponding answers being sub-spans of the given passage. Whereas most methods rely on heuristic rules to generate questions, neural network approaches have also been proposed more recently. In this work, we propose a variant of the self-attention Transformer network architecture to generate meaningful and diverse questions. To this end, we propose an easy-to-use model that combines the Transformer decoder GPT-2 with the Transformer encoder BERT for the downstream task of question answering. The model is trained in an end-to-end fashion, where the language model is trained to produce a question-answer-aware input representation that facilitates the generation of answer-focused questions. Our result of neural question generation from text on the SQuAD 1.1…
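The data flow described in the abstract (a generator produces a question for a marked answer span, and a reader then tries to recover that span from the passage) can be sketched roughly as follows. This is only an illustration of the inference-time round trip, not the authors' end-to-end training procedure. The marker tokens `<ans>` and `</ans>`, the function names, and the toy stand-ins for GPT-2 and BERT are all assumptions made for this example; the paper's actual input encoding is not given in this summary.

```python
from typing import Callable, Tuple

# Hypothetical marker tokens; the paper's actual input encoding is not stated here.
ANS_START = "<ans>"
ANS_END = "</ans>"


def mark_answer(context: str, start: int, end: int) -> str:
    """Wrap the answer span context[start:end] in marker tokens."""
    if not (0 <= start < end <= len(context)):
        raise ValueError("answer span must lie inside the context")
    return context[:start] + ANS_START + context[start:end] + ANS_END + context[end:]


def round_trip(
    context: str,
    start: int,
    end: int,
    generate_question: Callable[[str], str],
    answer_question: Callable[[str, str], Tuple[int, int]],
) -> Tuple[str, bool]:
    """Generate a question for the marked span, then check whether the
    reader recovers the same span from the unmarked context."""
    question = generate_question(mark_answer(context, start, end))
    predicted = answer_question(question, context)
    return question, predicted == (start, end)


# Toy stand-ins for the two models so the sketch runs without any weights.
# In the setting the abstract describes, the generator role would be played
# by GPT-2 and the reader role by BERT.
def toy_generator(marked: str) -> str:
    return "Which span is marked?"


def toy_reader(question: str, context: str) -> Tuple[int, int]:
    start = context.index("Paris")
    return start, start + len("Paris")


context = "Paris is the capital of France."
question, consistent = round_trip(context, 0, 5, toy_generator, toy_reader)
```

Passing the models in as plain callables keeps the sketch independent of any particular library; real generator and reader models could be wrapped to match the same two signatures.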
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Byte Pair Encoding
