Character Transformations for Non-Autoregressive GEC Tagging

Milan Straka; Jakub Náplava; Jana Straková

arXiv:2111.09280·cs.CL·November 18, 2021

TL;DR

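This paper introduces a character-based non-autoregressive grammatical error correction (GEC) method that tags each word with an automatically generated character transformation. It is much faster than autoregressive systems and avoids the explosion of rules that word replacement edits cause for spelling, diacritization, and morphologically rich languages.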

Contribution

It presents a method for automatically generating character transformations from a GEC corpus, which replaces word replacement edits in per-word tagging and is better suited to morphologically rich languages.

Findings

01 Solid results on Czech, German, and Russian

02 Dramatic speedup over autoregressive systems

03 Addresses limitations of word replacement edits

Abstract

We propose a character-based non-autoregressive GEC approach with automatically generated character transformations. Recently, per-word classification of correction edits has proven an efficient, parallelizable alternative to current encoder-decoder GEC systems. We show that word replacement edits may be suboptimal and lead to an explosion of rules for spelling, diacritization, and errors in morphologically rich languages, and we propose a method for generating character transformations from a GEC corpus. Finally, we train character transformation models for Czech, German, and Russian, reaching solid results and a dramatic speedup compared to autoregressive systems. The source code is released at https://github.com/ufal/wnut2021_character_transformations_gec.
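To illustrate why character transformations can need fewer rules than word replacement edits, here is a minimal sketch. It is not the authors' generation method: the function names (char_transformation, apply_transformation), the choice of Python's difflib for alignment, and the end-of-word offset encoding are all assumptions made for illustration. It derives a character-level edit from a (source, correction) pair and shows that several Czech diacritization fixes collapse into one reusable transformation.

```python
from difflib import SequenceMatcher


def char_transformation(source: str, target: str) -> tuple:
    """Derive character edits turning `source` into `target`.

    Hypothetical encoding: each edit is (start, end, replacement), where
    start and end are offsets counted from the END of the source word,
    so one transformation can be reused for words of different lengths.
    """
    ops = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        ops.append((len(source) - i1, len(source) - i2, target[j1:j2]))
    return tuple(ops)


def apply_transformation(word: str, ops: tuple) -> str:
    """Apply edits produced by char_transformation to another word."""
    n = len(word)
    # Convert end-relative offsets to absolute spans in the original word.
    spans = [(n - start, n - end, repl) for start, end, repl in ops]
    if any(start < 0 for start, _, _ in spans):
        raise ValueError("word is too short for this transformation")
    result = word
    # Apply the rightmost edit first so spans further left stay valid.
    for start, end, repl in sorted(spans, reverse=True):
        result = result[:start] + repl + result[end:]
    return result


# Three missing-diacritic errors and their corrections.
pairs = [("dobry", "dobrý"), ("mlady", "mladý"), ("novy", "nový")]

# Word replacement edits need one rule per distinct word pair.
word_rules = set(pairs)

# Character transformations collapse all three into a single rule.
char_rules = {char_transformation(src, tgt) for src, tgt in pairs}

print(len(word_rules), len(char_rules))
```

In this toy setting, the three word pairs give three word replacement rules but a single character transformation (replace the last character with "ý"), which can also be applied to a word not seen when the rule was derived.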

Code & Models

Repositories

ufal/wnut2021_character_transformations_gec
pytorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification