Dilated Convolutional Neural Networks for Lightweight Diacritics Restoration
Bálint Csanády, András Lukács

TL;DR
This paper introduces a lightweight 1D dilated convolutional neural network for diacritics restoration. It is competitive with larger models, runs locally in a web browser, and is evaluated with an emphasis on Hungarian.
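Dilation is what lets a small convolutional model see a wide context. Each stride-1 layer with kernel size k and dilation d widens the receptive field by (k - 1)·d, while its weight count does not depend on d. The minimal sketch below computes this. The kernel size 3 and the doubling dilation schedule are illustrative assumptions, not the configuration reported in the paper.

```python
def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Number of input characters that can influence one output position
    in a stack of stride-1 1D convolutions with the given dilation rates."""
    rf = 1
    for d in dilations:
        # Each layer adds (k - 1) * d positions of context.
        rf += (kernel_size - 1) * d
    return rf


# Hypothetical 5-layer stacks with kernel size 3 (same weight count per layer).
undilated = receptive_field(3, [1, 1, 1, 1, 1])
dilated = receptive_field(3, [1, 2, 4, 8, 16])
print(undilated, dilated)
```

With the same number of layers and parameters, doubling the dilation grows the context from 11 to 63 characters. That is enough to span several words, which is why dilated stacks suit a small-footprint model.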
Contribution
The paper presents a novel small-footprint 1D dilated CNN approach for diacritics restoration that is efficient, browser-compatible, and competitive with larger models.
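To make the core operation concrete, here is a pure-Python sketch of a single-channel 1D dilated convolution with zero padding, stacked with ReLU activations. It is an illustration of the technique, not the authors' implementation. The centered (non-causal) alignment is an assumption, chosen because a restorer can use context on both sides of a character. The names `dilated_conv1d` and `dilated_stack` are hypothetical.

```python
def dilated_conv1d(x: list[float], w: list[float], dilation: int,
                   bias: float = 0.0) -> list[float]:
    """Single-channel 1D convolution with kernel w and the given dilation.
    Zero padding keeps the output the same length as x; for odd kernel
    sizes the kernel is centered on each output position."""
    k = len(w)
    pad = (k - 1) * dilation // 2
    out = []
    for i in range(len(x)):
        s = bias
        for j in range(k):
            idx = i - pad + j * dilation  # taps are `dilation` steps apart
            if 0 <= idx < len(x):         # out-of-range taps read zero padding
                s += w[j] * x[idx]
        out.append(s)
    return out


def dilated_stack(x: list[float], w: list[float],
                  dilations: list[int]) -> list[float]:
    """Apply the same kernel at each dilation rate, with ReLU in between."""
    for d in dilations:
        x = [max(0.0, v) for v in dilated_conv1d(x, w, d)]
    return x


impulse = [0.0] * 21
impulse[10] = 1.0
response = dilated_stack(impulse, [1.0, 1.0, 1.0], [1, 2, 4])
print(sum(1 for v in response if v != 0))  # width of the impulse response
```

Feeding a single impulse through three layers with dilations 1, 2, and 4 produces a response 15 positions wide, matching 1 + 2·(1 + 2 + 4). A real model would use many channels per layer, but the context arithmetic is the same.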
Findings
Outperforms similarly sized models in diacritics restoration
Runs efficiently in a web browser environment
Measures generalization across different corpora, with emphasis on Hungarian
Abstract
Diacritics restoration has become a ubiquitous task in the Latin-alphabet-based, English-dominated Internet language environment. In this paper, we describe a small-footprint approach based on 1D dilated convolutions that operates at the character level. We find that solutions based on 1D dilated convolutional neural networks are competitive alternatives to models based on recursive neural networks or linguistic modeling for the task of diacritics restoration. Our solution surpasses the performance of similarly sized models and is also competitive with larger models. A special feature of our solution is that it even runs locally in a web browser. We also provide a working example of this browser-based implementation. Our model is evaluated on different corpora, with emphasis on the Hungarian language. We performed comparative measurements on the generalization power of the model in…
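Character-level diacritics restoration needs pairs of undiacritized input and correctly accented target text. One common way to build them is to strip diacritics from correct text using Unicode canonical decomposition. The sketch below assumes NFC (precomposed) input and a per-character framing in which the model predicts one output character per input position. The paper's exact preprocessing is not given here, so both choices and the helper names are hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks after NFD decomposition, then recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


def make_pair(text: str) -> tuple[str, list[str]]:
    """Hypothetical training pair: stripped input, per-position target chars.
    For Hungarian precomposed letters (á, é, í, ó, ö, ő, ú, ü, ű) each
    character maps to exactly one base letter, so lengths stay aligned."""
    stripped = strip_diacritics(text)
    if len(stripped) != len(text):
        raise ValueError("input and target are not character-aligned")
    return stripped, list(text)


source, target = make_pair("árvíztűrő tükörfúrógép")
print(source)  # arvizturo tukorfurogep
```

Keeping input and target the same length turns restoration into per-character classification, which fits a length-preserving convolutional stack. The double acute accent on ő and ű is stripped like any other combining mark.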
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
