Identifying Well-formed Natural Language Questions
Manaal Faruqui, Dipanjan Das

TL;DR
This paper introduces the task of identifying well-formed natural language questions, releases a dataset for it, and shows that a well-formedness classifier can improve question generation models.
Contribution
The paper presents a new dataset of questions and a classifier for identifying well-formed ones, and shows the classifier's utility in enhancing question generation models.
Findings
The classifier reaches 70.7% accuracy on the test set when labeling questions as well-formed or not.
The classifier improves neural sequence-to-sequence question generation for reading comprehension.
Dataset of 25,100 questions released for research.
Abstract
Understanding search queries is a hard problem as it involves dealing with "word salad" text ubiquitously issued by users. However, if a query resembles a well-formed question, a natural language processing pipeline is able to perform more accurate interpretation, thus reducing downstream compounding errors. Hence, identifying whether or not a query is well formed can enhance query understanding. Here, we introduce a new task of identifying a well-formed natural language question. We construct and release a dataset of 25,100 publicly available questions classified into well-formed and non-wellformed categories and report an accuracy of 70.7% on the test set. We also show that our classifier can be used to improve the performance of neural sequence-to-sequence models for generating questions for reading comprehension.
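The abstract's last point, using a well-formedness classifier to improve generated questions, can be pictured as a reranking or filtering step over candidate questions. The following is only a minimal sketch under stated assumptions: `toy_wellformed_score` is a hypothetical heuristic stand-in, not the paper's classifier, and the function names, the starter-word list, and the 0.8 threshold are all illustrative choices that do not come from the paper.

```python
from typing import Callable, List, Sequence

# Hypothetical list of words that often open a well-formed English question.
QUESTION_STARTERS = {
    "what", "who", "when", "where", "why", "how", "which",
    "is", "are", "do", "does", "did", "can",
}


def toy_wellformed_score(text: str) -> float:
    """Toy stand-in scorer in [0, 1]; NOT the classifier from the paper."""
    stripped = text.strip()
    tokens = stripped.lower().split()
    if not tokens:
        return 0.0
    score = 0.0
    if tokens[0] in QUESTION_STARTERS:
        score += 0.5  # opens like a question
    if stripped.endswith("?"):
        score += 0.5  # closes like a question
    return score


def rerank(
    candidates: Sequence[str],
    scorer: Callable[[str], float] = toy_wellformed_score,
) -> List[str]:
    """Order candidate questions from most to least well-formed."""
    return sorted(candidates, key=scorer, reverse=True)


def filter_wellformed(
    candidates: Sequence[str],
    scorer: Callable[[str], float] = toy_wellformed_score,
    threshold: float = 0.8,  # assumed cutoff, chosen only for illustration
) -> List[str]:
    """Keep only candidates whose score meets the threshold."""
    return [c for c in candidates if scorer(c) >= threshold]


candidates = [
    "population of owls just in north america",
    "What is the population of owls in North America?",
]
best = rerank(candidates)[0]
kept = filter_wellformed(candidates)
```

In a real pipeline the `scorer` argument would be a trained model's probability that a question is well formed; the heuristic here only keeps the example self-contained.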
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
