Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds

Tassilo Klein; Moin Nabi

arXiv:1911.02365 · cs.CL · November 7, 2019 · 52 citations

TL;DR

This paper introduces a novel neural question generation model combining GPT-2 and BERT architectures, improving question quality and diversity, and enhancing question answering performance, especially in semi-supervised and small-data scenarios.

Contribution

It proposes an end-to-end Transformer-based model that jointly generates questions and answers, leveraging the strengths of GPT-2 and BERT for improved semantic correctness and diversity.

Findings

01 Produces semantically correct and diverse questions on SQuAD 1.1
02 Improves downstream question answering performance
03 Effective in semi-supervised and small-data settings

Abstract

Automatic question generation aims to generate questions from a context, with the corresponding answers being sub-spans of the given passage. Whereas most methods rely on heuristic rules to generate questions, neural network approaches have also been proposed more recently. In this work, we propose a variant of the self-attention Transformer network architecture to generate meaningful and diverse questions. To this end, we propose an easy-to-use model that combines the Transformer decoder GPT-2 with the Transformer encoder BERT for the downstream task of question answering. The model is trained in an end-to-end fashion, where the language model is trained to produce a question-answer-aware input representation that facilitates the generation of an answer-focused question. Our result of neural question generation from text on the SQuAD 1.1…
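
The data flow described in the abstract can be illustrated with a small sketch: an answer span is marked inside the passage, a generator proposes a question from that answer-aware input, and an answering model is asked to recover the span. This is not the authors' implementation. The marker tokens `<a>` and `</a>`, the function names, and the two toy stand-ins (which replace GPT-2 and BERT so the example runs without any model weights) are all assumptions made for illustration, and the end-to-end training itself is not shown.

```python
from typing import Callable, Tuple

# Hypothetical marker tokens; the abstract does not say how the answer is encoded.
ANSWER_OPEN = "<a>"
ANSWER_CLOSE = "</a>"


def mark_answer(context: str, start: int, end: int) -> str:
    """Return the context with the span context[start:end] wrapped in markers."""
    if not 0 <= start < end <= len(context):
        raise ValueError("answer span must lie inside the context")
    return (context[:start] + ANSWER_OPEN + context[start:end]
            + ANSWER_CLOSE + context[end:])


def round_trip(
    context: str,
    start: int,
    end: int,
    generate_question: Callable[[str], str],
    answer_question: Callable[[str, str], str],
) -> Tuple[str, str, bool]:
    """Generate a question for the span, answer it, and compare with the span."""
    marked = mark_answer(context, start, end)
    question = generate_question(marked)             # decoder role (e.g. GPT-2)
    predicted = answer_question(question, context)   # encoder role (e.g. BERT)
    return question, predicted, predicted == context[start:end]


def toy_generate(marked_context: str) -> str:
    """Toy stand-in for the generator: quotes the marked span in a template."""
    answer = marked_context.split(ANSWER_OPEN, 1)[1].split(ANSWER_CLOSE, 1)[0]
    return f"What does the passage say about '{answer}'?"


def toy_answer(question: str, context: str) -> str:
    """Toy stand-in for the answering model: returns the quoted phrase if present."""
    parts = question.split("'")
    candidate = parts[1] if len(parts) >= 3 else ""
    return candidate if candidate and candidate in context else ""


if __name__ == "__main__":
    passage = "BERT is a Transformer encoder."
    span = "Transformer encoder"
    begin = passage.index(span)
    print(round_trip(passage, begin, begin + len(span), toy_generate, toy_answer))
```

In a real setup the two callables would wrap trained models, and the agreement flag returned by `round_trip` is only a rough proxy for whether a generated question is answer-focused.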

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Byte Pair Encoding