How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions

Zewei Chu; Mingda Chen; Jing Chen; Miaosen Wang; Kevin Gimpel; Manaal Faruqui; Xiance Si

arXiv:1911.09247 · cs.CL · November 22, 2019 · 1 citation

TL;DR

This paper introduces MQR, a large-scale, multi-domain dataset for rewriting ill-formed questions into well-formed ones, built from Stack Exchange question edit histories. Sequence-to-sequence models trained on it outperform baselines built from other data resources on BLEU-4.

Contribution

The paper constructs a large-scale, multi-domain dataset for question rewriting, provides human annotations for a subset as a quality estimate, and trains sequence-to-sequence baseline models on it.

Findings

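01. Question quality improves by an average of 45 points across three aspects when moving from ill-formed to well-formed questions.

02. Sequence-to-sequence models trained on the dataset improve BLEU-4 by 13.2% over baseline methods built from other data resources.

03. The dataset contains 427,719 question pairs from 303 domains.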

Abstract

We present a large-scale dataset for the task of rewriting an ill-formed natural language question to a well-formed one. Our Multi-Domain Question Rewriting (MQR) dataset is constructed from human-contributed Stack Exchange question edit histories. The dataset contains 427,719 question pairs drawn from 303 domains. We provide human annotations for a subset of the dataset as a quality estimate. When moving from ill-formed to well-formed questions, the question quality improves by an average of 45 points across three aspects. We train sequence-to-sequence neural models on the constructed dataset and obtain an improvement of 13.2% in BLEU-4 over baseline methods built from other data resources. We release the MQR dataset to encourage research on the problem of question rewriting.
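For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of sentence-level BLEU-4 for comparing a model's rewrite against a human-edited reference. It is an illustration only: the abstract does not say which BLEU implementation, tokenization, or smoothing the authors used, so the whitespace tokenization, lowercasing, single reference, and lack of smoothing here are all assumptions, and the function names `ngrams` and `bleu4` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Sentence-level BLEU-4 against a single reference, without smoothing.

    Assumption: whitespace tokenization and lowercasing; the paper's
    actual evaluation setup may differ.
    """
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # unsmoothed BLEU is zero if any n-gram precision is zero
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the four modified n-gram precisions.
    return bp * math.exp(sum(log_precisions) / 4)


# Hypothetical question pair, not taken from the MQR dataset.
reference = "how do i sort a list in python"
rewrite = "how do i sort list in python"
print(bleu4(rewrite, reference))
```

Corpus-level BLEU, which is what papers usually report, pools n-gram counts over all sentence pairs before taking the geometric mean, so averaging this sentence-level score over a test set would not reproduce a reported figure.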

Code & Models

Repositories

ZeweiChu/MQR (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications