A Language Model for Grammatical Error Correction in L2 Russian
Nikita Remnev, Sergei Obiedkov, Ekaterina Rakhilina, Ivan Smirnov, Anastasia Vyrenkova

TL;DR
This paper presents a pipeline built around a language model for correcting grammatical errors in non-native (L2) Russian writing. The model is trained on untagged texts from the Newspaper subcorpus of the Russian National Corpus and validated against the RULEC-GEC error correction corpus.
Contribution
The paper introduces a language model pipeline tailored to correcting errors in L2 Russian writing, trained on untagged corpus data.
Findings
The model achieves improved correction accuracy on L2 Russian texts
Validation on the RULEC-GEC corpus indicates the approach is effective
The approach shows potential for enhancing NLP tools for non-native speakers
Abstract
Grammatical error correction is one of the fundamental tasks in Natural Language Processing. For the Russian language, most of the spellcheckers available correct typos and other simple errors with high accuracy, but often fail when faced with non-native (L2) writing, since the latter contains errors that are not typical for native speakers. In this paper, we propose a pipeline involving a language model intended for correcting errors in L2 Russian writing. The language model proposed is trained on untagged texts of the Newspaper subcorpus of the Russian National Corpus, and the quality of the model is validated against the RULEC-GEC corpus.
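The abstract does not specify the model architecture or how candidate corrections are generated, so the following is only a minimal sketch of the general idea of ranking candidate corrections with a language model trained on untagged text. It assumes a bigram model with add-one smoothing as a stand-in, a tiny invented toy corpus in place of the Newspaper subcorpus, and a candidate list supplied by hand. The names `train_bigram_lm`, `log_prob`, and `best_candidate` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigrams over whitespace-tokenized, untagged sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        words = sentence.split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])             # how often each token is a left context
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams, len(vocab)


def log_prob(sentence, model):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total


def best_candidate(candidates, model):
    """Return the candidate sentence the language model scores highest."""
    return max(candidates, key=lambda c: log_prob(c, model))


# Invented toy data (assumption): stands in for a large untagged corpus.
toy_corpus = [
    "я читаю книгу",
    "она читает книгу",
    "это интересная книга",
]
model = train_bigram_lm(toy_corpus)

# A case error of the kind an L2 writer might make, plus one hand-written correction.
candidates = ["я читаю книга", "я читаю книгу"]
choice = best_candidate(candidates, model)
```

In this sketch the model prefers the accusative form because the bigram "читаю книгу" was seen in the toy corpus while "читаю книга" was not. A realistic system would need a much larger corpus, a stronger model, and an automatic way to propose candidates.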
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
