Identifying Well-formed Natural Language Questions

Manaal Faruqui; Dipanjan Das

arXiv:1808.09419 · cs.CL · August 29, 2018 · 5 cites

TL;DR

This paper introduces the task of identifying well-formed natural language questions, releases a dataset for it, and shows that a well-formedness classifier can improve question generation models.

Contribution

The paper presents a new dataset of questions and a classifier for identifying well-formed questions, and shows the classifier's utility in enhancing question generation models.

Findings

01

The classifier reaches 70.7% accuracy on the test set when labeling questions as well-formed or not.

02

The classifier improves neural sequence-to-sequence question generation for reading comprehension.

03

A dataset of 25,100 publicly available questions, each labeled as well-formed or not, is released for research.

Abstract

Understanding search queries is a hard problem as it involves dealing with "word salad" text ubiquitously issued by users. However, if a query resembles a well-formed question, a natural language processing pipeline is able to perform more accurate interpretation, thus reducing downstream compounding errors. Hence, identifying whether or not a query is well formed can enhance query understanding. Here, we introduce a new task of identifying a well-formed natural language question. We construct and release a dataset of 25,100 publicly available questions classified into well-formed and non-wellformed categories and report an accuracy of 70.7% on the test set. We also show that our classifier can be used to improve the performance of neural sequence-to-sequence models for generating questions for reading comprehension.
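To make the evaluation setup in the abstract concrete, the following is a minimal sketch, not the authors' code. It assumes the released data is a tab-separated file with one query and one averaged annotator rating in [0, 1] per line, and that a query counts as well-formed when its rating is at least 0.8; both are assumptions to check against the dataset README. The toy queries and the `ends_with_question_mark` baseline are invented for illustration and are not taken from the dataset or the paper.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed cutoff: a query is treated as well-formed when its averaged
# annotator rating is at least 0.8. Verify against the dataset README.
WELLFORMED_THRESHOLD = 0.8


def parse_rows(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Parse 'query<TAB>rating' lines (assumed file layout) into pairs."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab so a tab inside the query would not break parsing.
        query, rating = line.rsplit("\t", 1)
        rows.append((query, float(rating)))
    return rows


def binarize(rows: List[Tuple[str, float]],
             threshold: float = WELLFORMED_THRESHOLD) -> List[Tuple[str, bool]]:
    """Turn averaged ratings into binary well-formed / not well-formed labels."""
    return [(query, rating >= threshold) for query, rating in rows]


def accuracy(predict: Callable[[str], bool],
             labeled: List[Tuple[str, bool]]) -> float:
    """Fraction of queries whose predicted label matches the gold label."""
    if not labeled:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(predict(query) == label for query, label in labeled)
    return correct / len(labeled)


def ends_with_question_mark(query: str) -> bool:
    """Hypothetical trivial baseline, not the paper's classifier."""
    return query.strip().endswith("?")


# Invented toy rows in the assumed format (not real dataset entries).
TOY_LINES = [
    "what is the capital of france ?\t1.0",
    "capital france\t0.0",
    "how do birds fly ?\t0.8",
    "birds fly how ?\t0.2",
]

toy_labeled = binarize(parse_rows(TOY_LINES))
toy_accuracy = accuracy(ends_with_question_mark, toy_labeled)
print(f"toy baseline accuracy: {toy_accuracy:.2f}")
```

Any classifier that maps a query string to a boolean can be passed to `accuracy` in place of the baseline, which keeps the labeling assumption and the metric separate from the model being evaluated.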

Code & Models

Repositories

google-research-datasets/query-wellformedness

Datasets

google-research-datasets/google_wellformed_query
dataset · 339 downloads

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications