FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction

Lvxiaowei Xu; Jianwang Wu; Jiawei Peng; Jiayu Fu; Ming Cai

arXiv:2210.12364 · cs.CL · August 8, 2023

TL;DR

This paper introduces FCGEC, a fine-grained, human-annotated corpus for Chinese grammatical error correction, and proposes the Switch-Tagger-Generator (STG) baseline model, which outperforms other GEC benchmark models on FCGEC but still lags well behind human performance.

Contribution

A fine-grained, human-annotated Chinese GEC corpus of 41,340 sentences with multiple references, and the STG baseline model for correcting grammatical errors in low-resource settings.

Findings

01. STG outperforms other benchmark models on FCGEC

02. A significant gap remains between models and human performance

03. FCGEC provides a valuable resource for Chinese GEC research

Abstract

Grammatical Error Correction (GEC) has recently been widely applied in automatic correction and proofreading systems. However, Chinese GEC remains immature because high-quality data from native speakers is limited in both category coverage and scale. In this paper, we present FCGEC, a fine-grained corpus for detecting, identifying, and correcting grammatical errors. FCGEC is a human-annotated corpus with multiple references, consisting of 41,340 sentences collected mainly from multiple-choice questions in public school Chinese examinations. Furthermore, we propose a Switch-Tagger-Generator (STG) baseline model to correct grammatical errors in low-resource settings. Experimental results show that STG outperforms other GEC benchmark models on FCGEC. However, a significant gap remains between benchmark models and humans, leaving room for future models to close it.

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling