ErAConD: Error Annotated Conversational Dialog Dataset for Grammatical Error Correction
Xun Yuan, Derek Pham, Sam Davidson, Zhou Yu

TL;DR
This paper introduces ErAConD, a novel grammatical error correction (GEC) dataset drawn from open-domain chatbot conversations. It shows that the dataset improves GEC model precision on informal dialogue.
Contribution
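The paper presents what the authors believe is the first GEC dataset targeted to a conversational setting. It also introduces an annotation scheme that ranks errors by their impact on comprehensibility, and shows that the dataset improves GEC model performance.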
Findings
16-point increase in precision after fine-tuning a state-of-the-art GEC model on the dataset
Error annotations ranked by perceived impact on comprehensibility
Improved GEC performance on informal, conversational text
Abstract
Currently available grammatical error correction (GEC) datasets are compiled from well-formed written text, which limits their applicability to other domains such as informal writing and dialog. In this paper, we present a novel parallel GEC dataset drawn from open-domain chatbot conversations. To our knowledge, this is the first GEC dataset targeted to a conversational setting. To demonstrate the utility of the dataset, we use our annotated data to fine-tune a state-of-the-art GEC model, resulting in a 16-point increase in model precision. This matters because precision is considered more important than recall in GEC tasks: false positives can seriously confuse language learners. We also present a detailed annotation scheme which ranks errors by perceived impact on comprehensibility, making our dataset both…
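The abstract's point that precision outweighs recall is usually built into GEC evaluation itself. Shared tasks such as CoNLL-2014 and BEA-2019 score systems with F0.5, which weights precision more heavily than recall. The abstract does not say which combined metric the authors report, so the following is only a minimal sketch of edit-level precision, recall, and F_beta. The function name `prf` and the edit counts for the two systems "A" and "B" are hypothetical and chosen for illustration. The data is not from the paper.

```python
from typing import Tuple


def prf(tp: int, fp: int, fn: int, beta: float = 0.5) -> Tuple[float, float, float]:
    """Edit-level precision, recall and F_beta (hypothetical helper).

    tp: proposed edits that match a gold edit
    fp: proposed edits with no matching gold edit (over-corrections)
    fn: gold edits the system missed
    """
    # Common convention: an empty denominator counts as a perfect score.
    p = tp / (tp + fp) if (tp + fp) else 1.0
    r = tp / (tp + fn) if (tp + fn) else 1.0
    b2 = beta * beta
    # beta < 1 weights precision more heavily than recall.
    f = (1 + b2) * p * r / (b2 * p + r) if (p + r) else 0.0
    return p, r, f


# Two hypothetical systems evaluated against the same 60 gold edits.
# A is cautious (few false positives); B corrects aggressively.
systems = {"A": (30, 10, 30), "B": (50, 50, 10)}

for name, (tp, fp, fn) in systems.items():
    p, r, f05 = prf(tp, fp, fn)
    f1 = prf(tp, fp, fn, beta=1.0)[2]
    print(f"{name}: P={p:.3f} R={r:.3f} F0.5={f05:.3f} F1={f1:.3f}")
# A: P=0.750 R=0.500 F0.5=0.682 F1=0.600
# B: P=0.500 R=0.833 F0.5=0.543 F1=0.625
```

F1 prefers the aggressive system B. F0.5 prefers the cautious system A, whose fewer false corrections are less likely to mislead a learner. This is why a gain in precision, such as the one reported here, is the headline number for GEC.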
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
