Romanian Diacritics Restoration Using Recurrent Neural Networks
Stefan Ruseti, Teodor-Mihai Cotet, Mihai Dascalu

TL;DR
This paper introduces a recurrent neural network (RNN) model designed specifically to restore diacritics in Romanian texts, addressing the language's specific challenges with a novel architecture.
Contribution
The paper presents a new neural architecture tailored to Romanian diacritics restoration and optimized for the characteristics of this language.
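To make the idea of a recurrent restoration model concrete, here is a minimal sketch (not the authors' architecture) of per-character diacritic tagging with a bidirectional recurrent layer. Each character is embedded, a forward and a backward recurrence read the text in both directions, and the concatenated states are mapped to probabilities over four hypothetical labels (no diacritic, breve, circumflex, comma below). Everything here is an assumption for illustration: the class names `ElmanRNN` and `BiRNNTagger`, the plain tanh recurrence used in place of gated cells, the layer sizes, and the random, untrained weights. It also covers only the character level, whereas the paper's model attends to information at several levels of abstraction.

```python
import math
import random


def rand_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z: list[float]) -> list[float]:
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


class ElmanRNN:
    """Plain tanh recurrence h_t = tanh(W x_t + U h_{t-1}) (bias omitted)."""

    def __init__(self, in_dim: int, hid_dim: int, rng: random.Random):
        self.W = rand_matrix(hid_dim, in_dim, rng)
        self.U = rand_matrix(hid_dim, hid_dim, rng)
        self.hid_dim = hid_dim

    def run(self, xs: list[list[float]]) -> list[list[float]]:
        h = [0.0] * self.hid_dim
        states = []
        for x in xs:
            h = [math.tanh(a + b) for a, b in zip(matvec(self.W, x), matvec(self.U, h))]
            states.append(h)
        return states


class BiRNNTagger:
    """Hypothetical, untrained per-character diacritic tagger over a bidirectional RNN."""

    def __init__(self, alphabet: str, emb_dim=6, hid_dim=8, n_labels=4, seed=0):
        rng = random.Random(seed)
        self.index = {ch: i for i, ch in enumerate(alphabet)}
        # One embedding row per known character plus a final row for unknowns.
        self.emb = rand_matrix(len(alphabet) + 1, emb_dim, rng)
        self.fwd = ElmanRNN(emb_dim, hid_dim, rng)
        self.bwd = ElmanRNN(emb_dim, hid_dim, rng)
        self.out = rand_matrix(n_labels, 2 * hid_dim, rng)

    def predict_proba(self, text: str) -> list[list[float]]:
        unk = len(self.index)
        xs = [self.emb[self.index.get(ch, unk)] for ch in text]
        left = self.fwd.run(xs)                 # state t has read text[: t + 1]
        right = self.bwd.run(xs[::-1])[::-1]    # state t has read text[t:]
        return [softmax(matvec(self.out, hf + hb)) for hf, hb in zip(left, right)]


tagger = BiRNNTagger("abcdefghijklmnopqrstuvwxyz ")
probs = tagger.predict_proba("fata mea")
print(len(probs), [round(p, 3) for p in probs[3]])  # 8 positions, 4 label probabilities each
```

Because the backward pass reads the text right to left, the distribution for each character already depends on the letters that follow it, which is the kind of context diacritics restoration needs. Training (for example, cross-entropy against labels derived from correctly written text) is omitted.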
Findings
Effective diacritics restoration demonstrated on Romanian texts
Outperforms previous non-neural methods
Highlights importance of language-specific neural models
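The findings above do not state which metrics were used for the comparison. As an assumption, the sketch below computes three measures commonly used for restoration systems: accuracy over all characters, accuracy over only the characters that could carry a Romanian diacritic (base letters a, i, s, t), and word-level accuracy. The function names `base_letter` and `evaluate` and the metric keys are hypothetical.

```python
import unicodedata


def base_letter(ch: str) -> str:
    """Lower-cased letter with any diacritic removed, e.g. 'Ș' -> 's'."""
    # NFD splits 'ș' into 's' + U+0326 COMBINING COMMA BELOW; keep the base.
    return unicodedata.normalize("NFD", ch)[0].lower()


def evaluate(gold: str, predicted: str) -> dict[str, float]:
    """Character, candidate-character and word accuracy of a restored text."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("expected non-empty texts of equal length")
    pairs = list(zip(gold, predicted))
    char_acc = sum(g == p for g, p in pairs) / len(pairs)
    # Only positions whose gold letter is a, i, s or t (with or without a
    # diacritic) are places where a Romanian restorer has a choice to make.
    cand = [(g, p) for g, p in pairs if base_letter(g) in "aist"]
    cand_acc = sum(g == p for g, p in cand) / len(cand) if cand else 1.0
    gold_words, pred_words = gold.split(), predicted.split()
    word_acc = (
        sum(g == p for g, p in zip(gold_words, pred_words)) / len(gold_words)
        if gold_words else 1.0
    )
    return {"char": char_acc, "candidate_char": cand_acc, "word": word_acc}


print(evaluate("fața mea este curată", "fata mea este curată"))
# char 0.95, candidate_char 8/9, word 0.75
```

Character accuracy over all positions is inflated by letters that can never take a diacritic, which is why the candidate-only figure is often the more informative one.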
Abstract
Diacritics restoration is a mandatory step for adequately processing Romanian texts, and not a trivial one, as context is generally needed to restore a character correctly. Most previous methods tested for restoring Romanian diacritics do not use neural networks. Among those that do, none is specifically optimized for this particular language (i.e., they were generally designed to work across many different languages). Therefore, we propose a novel neural architecture based on recurrent neural networks that can attend to information at different levels of abstraction in order to restore diacritics.
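The point that restoration needs context can be seen by building training pairs in a typical way: strip the diacritics from correctly written text and keep, for each character, a label saying which diacritic (if any) it had. The sketch below assumes a four-label scheme (none, breve, circumflex, comma below) and maps the older cedilla forms ş and ţ to the comma-below letters; the names `make_example`, `apply_labels`, `STRIP`, `RESTORE` and the label numbering are illustrative, not taken from the paper.

```python
# Romanian letters with diacritics mapped to their base letter. The comma-below
# letters (ș, ț) are the standard forms; the cedilla forms (U+015F ş, U+0163 ţ and
# their capitals) still occur in older texts, so they are stripped as well.
STRIP = {
    "ă": "a", "â": "a", "î": "i", "ș": "s", "ț": "t",
    "Ă": "A", "Â": "A", "Î": "I", "Ș": "S", "Ț": "T",
    "\u015f": "s", "\u0163": "t", "\u015e": "S", "\u0162": "T",
}

# Hypothetical label scheme: 0 none, 1 breve, 2 circumflex, 3 comma below.
# Restoration always emits the comma-below forms.
RESTORE = {("a", 1): "ă", ("a", 2): "â", ("i", 2): "î", ("s", 3): "ș", ("t", 3): "ț"}


def label_of(ch: str) -> int:
    """Diacritic label of one character (case-insensitive)."""
    c = ch.lower()
    if c == "ă":
        return 1
    if c in "âî":
        return 2
    if c in "șț\u015f\u0163":
        return 3
    return 0


def make_example(text: str) -> tuple[str, list[int]]:
    """Strip diacritics; return (model input, per-character target labels)."""
    stripped = "".join(STRIP.get(ch, ch) for ch in text)
    return stripped, [label_of(ch) for ch in text]


def apply_labels(stripped: str, labels: list[int]) -> str:
    """Inverse of make_example: put the (predicted) diacritics back."""
    out = []
    for ch, lab in zip(stripped, labels):
        new = RESTORE.get((ch.lower(), lab), ch)
        out.append(new.upper() if ch.isupper() else new)
    return "".join(out)


for word in ["fată", "fata", "fața"]:
    print(word, make_example(word))
```

The words fată (girl), fata (the girl) and fața (the face) all reduce to the same input fata with three different label sequences, so a model that sees only one character, or even the whole word, cannot decide; only the surrounding text can.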
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling
