Romanian Diacritics Restoration Using Recurrent Neural Networks

Stefan Ruseti; Teodor-Mihai Cotet; Mihai Dascalu

arXiv:2009.02743·cs.CL·September 8, 2020

TL;DR

This paper introduces a recurrent neural network architecture designed specifically to restore diacritics in Romanian text, rather than relying on a model built to work across many languages.

Contribution

The paper presents a new neural architecture tailored for Romanian diacritics restoration, optimizing performance for this language's unique characteristics.

Findings

01

Effective diacritics restoration demonstrated on Romanian texts

02

Outperforms previous non-neural methods

03

Highlights importance of language-specific neural models

Abstract

Diacritics restoration is a mandatory step for adequately processing Romanian texts, and not a trivial one, because context is generally needed to restore a character correctly. Most previous methods applied to Romanian diacritics restoration do not use neural networks. Among those that do, none is optimized specifically for this language; they were generally designed to work across many different languages. We therefore propose a novel neural architecture based on recurrent neural networks that can attend to information at different levels of abstraction in order to restore diacritics.
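
To make the context dependence concrete, the following is a minimal sketch of how diacritics can be stripped from Romanian text to build (input, target) pairs for a restoration model. It is not the paper's preprocessing code: the names `ROMANIAN_TO_BASE`, `strip_diacritics`, and `make_training_pair` are hypothetical, and framing the task as per-character labeling is an assumption made here for illustration.

```python
import unicodedata

# Illustrative mapping, not taken from the paper.
# Keys are written as escapes so the comma-below and cedilla forms stay distinct.
ROMANIAN_TO_BASE = {
    "\u0103": "a",  # a with breve
    "\u00e2": "a",  # a with circumflex
    "\u00ee": "i",  # i with circumflex
    "\u0219": "s",  # s with comma below
    "\u021b": "t",  # t with comma below
    "\u015f": "s",  # s with cedilla (legacy encoding)
    "\u0163": "t",  # t with cedilla (legacy encoding)
}
# Add the uppercase counterparts of every entry.
ROMANIAN_TO_BASE.update(
    {k.upper(): v.upper() for k, v in list(ROMANIAN_TO_BASE.items())}
)
STRIP_TABLE = str.maketrans(ROMANIAN_TO_BASE)


def strip_diacritics(text: str) -> str:
    """Replace Romanian diacritic letters with their base ASCII letters."""
    # NFC composes base letter + combining mark into a single code point,
    # so the one-to-one table above applies.
    return unicodedata.normalize("NFC", text).translate(STRIP_TABLE)


def make_training_pair(text: str) -> tuple[str, list[str]]:
    """Return (stripped input, per-character targets) of equal length.

    Assumption: restoration is framed as labeling each input character
    with the character that should appear in the restored text.
    """
    normalized = unicodedata.normalize("NFC", text)
    return normalized.translate(STRIP_TABLE), list(normalized)


# Three distinct Romanian words collapse to the same stripped form,
# which is why a model needs surrounding context to pick the right one.
words = ["fata", "fat\u0103", "fa\u021b\u0103"]
stripped = {strip_diacritics(w) for w in words}
```

Here `stripped` contains a single string, since all three words reduce to "fata". Character-level rules alone cannot undo that collapse, so a restoration model has to use the surrounding words.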

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling