Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space
Dayiheng Liu, Yeyun Gong, Jie Fu, Yu Yan, Jiusheng Chen, Jiancheng Lv, Nan Duan, Ming Zhou

TL;DR
This paper introduces CRQDA, a question data augmentation technique that rewrites questions in a continuous embedding space using a Transformer autoencoder and gradient-based optimization, improving performance on MRC, question generation, and question-answering natural language inference tasks.
Contribution
The paper presents a new question data augmentation method that generates diverse, high-quality questions through controllable continuous space rewriting using a Transformer autoencoder.
Findings
CRQDA improves question quality and diversity in datasets.
CRQDA enhances model performance on SQuAD and QNLI tasks.
Effective question augmentation with controllable rewriting.
Abstract
In this paper, we propose a novel data augmentation method, referred to as Controllable Rewriting based Question Data Augmentation (CRQDA), for machine reading comprehension (MRC), question generation, and question-answering natural language inference tasks. We treat the question data augmentation task as a constrained question rewriting problem to generate context-relevant, high-quality, and diverse question data samples. CRQDA utilizes a Transformer autoencoder to map the original discrete question into a continuous embedding space. It then uses a pre-trained MRC model to revise the question representation iteratively with gradient-based optimization. Finally, the revised question representations are mapped back into the discrete space, which serve as additional question data. Comprehensive experiments on SQuAD 2.0, SQuAD 1.1 question generation, and QNLI tasks demonstrate the…
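The three steps described in the abstract (encode into a continuous space, revise with gradients from an MRC model, decode back to discrete tokens) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the four-word vocabulary and its 2-d embeddings, the logistic "answerability" scorer standing in for a pre-trained MRC model, the cross-entropy objective and stopping rule, and the nearest-neighbour lookup standing in for the Transformer autoencoder's decoder.

```python
import math

# Hypothetical toy vocabulary with hand-picked 2-d embeddings.
VOCAB = {
    "when": [1.0, 0.2],
    "where": [0.8, -0.6],
    "born": [0.3, 0.9],
    "buried": [-0.4, -0.9],
}

# Hypothetical stand-in for a pre-trained MRC model: a fixed logistic
# scorer over the mean token embedding that outputs P(answerable).
W = [0.5, 2.0]
B = 0.0


def answerable_prob(embs):
    """Score a question (list of token embeddings) with the toy MRC stand-in."""
    mean = [sum(e[d] for e in embs) / len(embs) for d in range(len(W))]
    logit = sum(w * m for w, m in zip(W, mean)) + B
    return 1.0 / (1.0 + math.exp(-logit))


def revise(embs, target, step=0.5, max_iters=50, threshold=0.5):
    """Iteratively move the embeddings until the scorer predicts `target`.

    The scorer's weights stay frozen; only the question embeddings change.
    """
    embs = [list(e) for e in embs]  # copy so the caller's data is untouched
    for _ in range(max_iters):
        p = answerable_prob(embs)
        if (p >= threshold) == (target == 1):
            break  # prediction already matches the target label
        # Gradient of binary cross-entropy w.r.t. each token embedding
        # for a logistic scorer with mean pooling: (p - target) * W / n.
        scale = (p - target) / len(embs)
        for e in embs:
            for d in range(len(W)):
                e[d] -= step * scale * W[d]
    return embs


def decode(embs):
    """Map each revised embedding to its nearest vocabulary token."""
    def nearest(e):
        return min(
            VOCAB,
            key=lambda t: sum((a - b) ** 2 for a, b in zip(e, VOCAB[t])),
        )
    return [nearest(e) for e in embs]


question = ["when", "born"]
original = [VOCAB[t] for t in question]
revised = revise(original, target=0)  # push toward the opposite label
rewritten = decode(revised)
print(question, "->", rewritten)
```

In this sketch the scorer is never updated; the gradient only flows into the question representation, which is what makes the rewriting controllable by the choice of target label.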
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Layer Normalization · Byte Pair Encoding · Multi-Head Attention · Dropout · Label Smoothing · Attention Is All You Need
