Character Transformations for Non-Autoregressive GEC Tagging
Milan Straka, Jakub Náplava, Jana Straková

TL;DR
This paper introduces a character-based non-autoregressive grammatical error correction (GEC) method that uses automatically generated character transformations. It achieves a large speedup over autoregressive systems and handles spelling, diacritization, and morphologically rich languages better than word replacement edits.
Contribution
The paper presents a novel method for generating character transformations from GEC data, which improves both efficiency and the handling of morphologically rich languages.
Findings
Solid results on Czech, German, and Russian
Significant speedup over autoregressive systems
Addresses limitations of word replacement edits
Abstract
We propose a character-based non-autoregressive GEC approach with automatically generated character transformations. Recently, per-word classification of correction edits has proven to be an efficient, parallelizable alternative to current encoder-decoder GEC systems. We show that word replacement edits may be suboptimal and lead to an explosion of rules for spelling, diacritization, and errors in morphologically rich languages, and we propose a method for generating character transformations from a GEC corpus. Finally, we train character transformation models for Czech, German, and Russian, reaching solid results and a dramatic speedup compared to autoregressive systems. The source code is released at https://github.com/ufal/wnut2021_character_transformations_gec.
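To illustrate why word replacement edits can explode while character transformations stay compact, here is a minimal sketch under simplifying assumptions. The `CharTransform` encoding (replace `old` by `new` while keeping a fixed number of final characters) and the `derive` helper are hypothetical and are not necessarily the encoding used in the paper or its released code. Each correction is described as a single edit between the longest common prefix and suffix of the source and target word.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CharTransform:
    """Hypothetical rule: replace `old` by `new`, keeping the last `keep_suffix` characters."""
    keep_suffix: int
    old: str
    new: str

    def apply(self, word: str) -> Optional[str]:
        end = len(word) - self.keep_suffix
        start = end - len(self.old)
        if start < 0 or word[start:end] != self.old:
            return None  # the rule does not fit this word
        return word[:start] + self.new + word[end:]


def derive(source: str, target: str) -> CharTransform:
    """Describe source -> target as one edit between the longest common prefix and suffix."""
    shortest = min(len(source), len(target))
    p = 0  # length of the longest common prefix
    while p < shortest and source[p] == target[p]:
        p += 1
    s = 0  # length of the longest common suffix that does not overlap the prefix
    while s < shortest - p and source[-1 - s] == target[-1 - s]:
        s += 1
    return CharTransform(s, source[p:len(source) - s], target[p:len(target) - s])


# Toy (source, target) word pairs, as might be extracted from an aligned GEC corpus.
pairs = [("Strasse", "Straße"), ("Grosse", "Große"), ("heisse", "heiße")]

word_rules = {(src, tgt) for src, tgt in pairs}       # one rule per word pair
char_rules = {derive(src, tgt) for src, tgt in pairs}  # shared character rule

print(len(word_rules), len(char_rules))  # prints: 3 1
rule = next(iter(char_rules))
print(rule.apply("schliesse"))  # prints: schließe (a word never seen in `pairs`)
print(rule.apply("Haus"))       # prints: None (the rule does not fit)
```

Three word replacement rules collapse into a single character rule that also transfers to unseen words. The rule alone does not decide *whether* to apply it (for example, "Klasse" must stay unchanged); in a tagging setup, that decision is left to the per-token classifier, which predicts one label per word so that all words can be corrected in parallel.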
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
