ErAConD: Error Annotated Conversational Dialog Dataset for Grammatical Error Correction

Xun Yuan; Derek Pham; Sam Davidson; Zhou Yu

arXiv:2112.08466 · cs.CL · August 27, 2025



TL;DR

This paper introduces ErAConD, a grammatical error correction (GEC) dataset drawn from open-domain chatbot conversations, and shows that fine-tuning a GEC model on it improves precision in informal dialog settings.

Contribution

The paper presents what the authors describe as, to their knowledge, the first GEC dataset targeted to a conversational setting, together with a detailed annotation scheme, and shows that the data improves GEC model performance.

Findings

01

Fine-tuning a state-of-the-art GEC model on the dataset raises model precision by 16 points.

02

The annotation scheme ranks errors by their perceived impact on comprehensibility.

03

Annotated conversational data improves GEC model performance in dialog settings.

Abstract

Currently available grammatical error correction (GEC) datasets are compiled using well-formed written text, limiting the applicability of these datasets to other domains such as informal writing and dialog. In this paper, we present a novel parallel GEC dataset drawn from open-domain chatbot conversations; this dataset is, to our knowledge, the first GEC dataset targeted to a conversational setting. To demonstrate the utility of the dataset, we use our annotated data to fine-tune a state-of-the-art GEC model, resulting in a 16 point increase in model precision. This is of particular importance in a GEC model, as model precision is considered more important than recall in GEC tasks since false positives could lead to serious confusion in language learners. We also present a detailed annotation scheme which ranks errors by perceived impact on comprehensibility, making our dataset both…
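The abstract's point that precision matters more than recall in GEC can be made concrete with a small calculation. The sketch below computes precision, recall, and F-beta from edit counts; F0.5, which weights precision twice as heavily as recall, is widely used in GEC evaluation, though this page does not say which metric the paper reports. The function name and the two sets of counts are hypothetical and chosen only for illustration, and returning 0.0 for an empty denominator is one convention among several.

```python
def precision_recall_fbeta(tp, fp, fn, beta=0.5):
    """Return (precision, recall, F-beta) from edit-level counts.

    tp: proposed edits that match a reference edit
    fp: proposed edits with no matching reference edit (false corrections)
    fn: reference edits the system failed to propose
    """
    # Assumption: empty denominators yield 0.0 (other scorers may differ).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta


# Hypothetical counts for two systems scored on the same 100 reference edits.
cautious = dict(tp=30, fp=10, fn=70)    # few edits, mostly correct
aggressive = dict(tp=60, fp=60, fn=40)  # many edits, half of them wrong

for name, counts in [("cautious", cautious), ("aggressive", aggressive)]:
    p, r, f05 = precision_recall_fbeta(**counts, beta=0.5)
    _, _, f1 = precision_recall_fbeta(**counts, beta=1.0)
    print(f"{name}: P={p:.3f} R={r:.3f} F0.5={f05:.3f} F1={f1:.3f}")
```

With these made-up counts, F1 ranks the aggressive system higher while F0.5 ranks the cautious one higher, which reflects the preference for avoiding false positives that could confuse language learners.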

Code & Models

Repositories

yuanxun-yx/errant
none · Official

Datasets

xunyuan/eracond
dataset · 84 downloads

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling