Neural Grammatical Error Correction for Romanian

Teodor-Mihai Cotet; Stefan Ruseti; Mihai Dascalu

arXiv:2604.23627 · cs.CL · April 28, 2026

TL;DR

This paper introduces the first Romanian GEC corpus, adapts an evaluation toolkit to Romanian, and shows that pretraining neural models on artificially generated data raises F0.5 from 44.38 (baseline) to 53.76.

Contribution

It provides a new Romanian GEC dataset of 10k sentence pairs, adapts the ERRANT evaluation toolkit to Romanian, and explores neural pretraining methods for grammatical error correction in a low-resource language.

Findings

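01. A small baseline Transformer trained only on the GEC corpus achieved an F0.5 of 44.38.

02. Pretraining a larger Transformer on artificially generated data, then finetuning it on the corpus, raised F0.5 to 53.76.

03. The proposed data generation method is easily extensible to other languages.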
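This summary does not describe how the artificial pretraining data is generated. As a rough illustration only, a common way to build synthetic GEC pairs is to inject random token-level noise into clean sentences and train the model to map the noisy version back to the clean one. The sketch assumes four hypothetical operations (deletion, transposition, confusion-set replacement, duplication) and a hypothetical confusion set of Romanian diacritic omissions; neither is taken from the paper, and the names `corrupt` and `make_pairs` are illustrative.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def corrupt(
    tokens: Sequence[str],
    confusions: Dict[str, List[str]],
    p: float = 0.15,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a noisy copy of `tokens` using hypothetical token-level edits."""
    if rng is None:
        rng = random.Random()
    noisy: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if rng.random() >= p:
            noisy.append(tok)  # keep the token unchanged
            i += 1
            continue
        op = rng.choice(["delete", "swap", "replace", "duplicate"])
        if op == "delete":
            pass  # drop the token entirely
        elif op == "swap" and i + 1 < len(tokens):
            noisy.extend([tokens[i + 1], tok])  # transpose with the next token
            i += 1  # the next token has already been emitted
        elif op == "replace" and confusions.get(tok):
            noisy.append(rng.choice(confusions[tok]))  # plausible wrong variant
        elif op == "duplicate":
            noisy.extend([tok, tok])
        else:
            noisy.append(tok)  # chosen op not applicable here; keep the token
        i += 1
    return noisy


def make_pairs(
    sentences: Sequence[str],
    confusions: Dict[str, List[str]],
    p: float = 0.15,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Build (noisy source, clean target) pairs for pretraining."""
    rng = random.Random(seed)
    pairs = []
    for sent in sentences:
        clean = sent.split()  # assumption: whitespace tokenization
        noisy = corrupt(clean, confusions, p, rng)
        pairs.append((" ".join(noisy), " ".join(clean)))
    return pairs


# Hypothetical confusion set: Romanian words written without diacritics.
ROMANIAN_CONFUSIONS = {"să": ["sa"], "și": ["si"], "mă": ["ma"], "vadă": ["vada"]}
```

For example, `make_pairs(["El a venit să mă vadă și a plecat"], ROMANIAN_CONFUSIONS, p=0.3)` yields one (noisy, clean) pair. Because the noise model only needs clean text and a list of plausible confusions, a generator of this shape can be pointed at another language by swapping the confusion set, which is in the spirit of the extensibility claim, though the paper's own method may differ.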

Abstract

Resources for Grammatical Error Correction (GEC) in non-English languages are scarce, while available spellcheckers in these languages are mostly limited to simple corrections and rules. In this paper we introduce the first GEC corpus for Romanian, consisting of 10k pairs of sentences. In addition, the German version of the ERRANT (ERRor ANnotation Toolkit) scorer was adapted for Romanian to analyze this corpus and extract the edits needed for evaluation. We experimented with multiple neural models, together with pretraining strategies, which proved effective for GEC in low-resource settings. Our baseline consists of a small Transformer model trained only on the GEC dataset (F0.5 of 44.38), whereas the best-performing model is produced by pretraining a larger Transformer model on artificially generated data, followed by finetuning on the actual corpus (F0.5 of 53.76). The proposed method for…
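Evaluation of this kind compares the edits a system proposes against the gold edits extracted from each reference correction, then reports precision, recall and F0.5. A minimal sketch of that scoring, assuming whitespace tokenization and using Python's `difflib.SequenceMatcher` as a simplified stand-in for ERRANT's linguistically informed alignment and edit merging (so edit boundaries will often differ from ERRANT's). The function names `extract_edits`, `f_beta` and `score` are hypothetical, and treating precision or recall as 1.0 when there are no proposed or no gold edits is a convention chosen for this sketch.

```python
from difflib import SequenceMatcher
from typing import Iterable, Set, Tuple

# An edit: (source start, source end, replacement tokens).
Edit = Tuple[int, int, Tuple[str, ...]]


def extract_edits(source: str, target: str) -> Set[Edit]:
    """Token-level edits that turn `source` into `target` (difflib stand-in for ERRANT)."""
    src, tgt = source.split(), target.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    return {
        (i1, i2, tuple(tgt[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replace / delete / insert spans
    }


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> Tuple[float, float, float]:
    """Precision, recall and F-beta from edit counts."""
    p = tp / (tp + fp) if tp + fp else 1.0  # no proposed edits: nothing wrong
    r = tp / (tp + fn) if tp + fn else 1.0  # no gold edits: nothing missed
    b2 = beta ** 2
    f = (1 + b2) * p * r / (b2 * p + r) if p + r else 0.0
    return p, r, f


def score(triples: Iterable[Tuple[str, str, str]], beta: float = 0.5) -> Tuple[float, float, float]:
    """Corpus-level scoring over (source, hypothesis, reference) triples."""
    tp = fp = fn = 0
    for source, hyp, ref in triples:
        hyp_edits = extract_edits(source, hyp)
        gold_edits = extract_edits(source, ref)
        tp += len(hyp_edits & gold_edits)  # proposed and correct
        fp += len(hyp_edits - gold_edits)  # proposed but not in the reference
        fn += len(gold_edits - hyp_edits)  # needed but not proposed
    return f_beta(tp, fp, fn, beta)
```

Counts are summed over the whole corpus before computing the scores. Setting β = 0.5 weights precision twice as much as recall, reflecting the usual GEC view that introducing a wrong correction is worse than missing one.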