A Language Model for Grammatical Error Correction in L2 Russian

Nikita Remnev; Sergei Obiedkov; Ekaterina Rakhilina; Ivan Smirnov; Anastasia Vyrenkova

arXiv:2307.01609·cs.CL·July 6, 2023·1 cites

Open Access

TL;DR

This paper presents a language model designed to improve grammatical error correction in non-native Russian writing, trained on untagged texts and validated against a specialized error correction corpus.

Contribution

It introduces a novel language model pipeline specifically tailored for correcting L2 Russian errors, leveraging untagged corpus data.

Findings

01 Model achieves improved correction accuracy on L2 Russian texts

02 Validation shows effectiveness against the RULEC-GEC corpus

03 Demonstrates potential for enhancing NLP tools for non-native speakers

Abstract

Grammatical error correction is one of the fundamental tasks in Natural Language Processing. For the Russian language, most of the spellcheckers available correct typos and other simple errors with high accuracy, but often fail when faced with non-native (L2) writing, since the latter contains errors that are not typical for native speakers. In this paper, we propose a pipeline involving a language model intended for correcting errors in L2 Russian writing. The language model proposed is trained on untagged texts of the Newspaper subcorpus of the Russian National Corpus, and the quality of the model is validated against the RULEC-GEC corpus.
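The abstract does not say what kind of language model the pipeline uses or how candidate corrections are generated. As a rough illustration only of how a language model trained on untagged text can rank competing corrections, the sketch below assumes a bigram count model with add-one smoothing and a three-sentence toy corpus standing in for the Newspaper subcorpus. The names `train_bigram_lm`, `log_prob`, and `best_candidate` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        # Counter returns 0 for unseen keys, so unseen words are handled by smoothing.
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def best_candidate(candidates, unigrams, bigrams):
    """Return the candidate sentence the model scores highest."""
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))


# Toy untagged corpus (assumption: a stand-in for real newspaper text).
corpus = [
    ["я", "читаю", "книгу"],
    ["она", "читает", "книгу"],
    ["я", "вижу", "книгу"],
]
unigrams, bigrams = train_bigram_lm(corpus)

# A learner-style case error (nominative "книга") versus the accusative form.
candidates = [
    ["я", "читаю", "книга"],
    ["я", "читаю", "книгу"],
]
best = best_candidate(candidates, unigrams, bigrams)
print(" ".join(best))
```

Both candidates have the same length, so their raw log-probabilities are directly comparable. A realistic system would need length normalization, a far larger corpus, and a separate step that proposes candidate corrections.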

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification