On the Importance of Word Boundaries in Character-level Neural Machine Translation
Duygu Ataman, Orhan Firat, Mattia A. Di Gangi, Marcello Federico, and Alexandra Birch

TL;DR
This paper introduces a hierarchical decoding approach for character-level neural machine translation that improves translation accuracy and efficiency by better capturing linguistic structures, outperforming traditional subword and character models.
Contribution
The paper proposes a hierarchical decoding architecture for character-level NMT that enhances translation quality and efficiency compared to existing models.
Findings
Hierarchical decoding achieves higher accuracy than subword models.
The model uses fewer parameters while maintaining performance.
It better captures long-distance dependencies in translation.
Abstract
Neural Machine Translation (NMT) models generally perform translation using a fixed-size lexical vocabulary, which is an important bottleneck on their generalization capability and overall translation quality. The standard approach to overcome this limitation is to segment words into subword units, typically using external tools with arbitrary heuristics, which results in vocabulary units that are not optimized for the translation task. Recent studies have shown that the same approach can be extended to perform NMT directly at the level of characters, which can deliver translation accuracy on par with subword-based models; however, this requires relatively deeper networks. In this paper, we propose a more computationally efficient solution for character-level NMT which implements a hierarchical decoding architecture where translations are subsequently generated at the level of words…
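The two-level decoding loop described in the abstract can be sketched roughly as follows. This is a minimal greedy sketch under stated assumptions, not the authors' implementation: `word_step` and `char_step` are hypothetical callables standing in for learned word-level and character-level decoders, `EOW` is an assumed end-of-word symbol, and the toy `TARGET` stubs simply spell a fixed sentence so that the control flow can be run.

```python
from typing import Callable, List, Optional, TypeVar

S = TypeVar("S")  # word-level decoder state (a hidden vector in a real model)

EOW = "<eow>"  # assumed end-of-word symbol emitted by the character decoder


def hierarchical_greedy_decode(
    init_state: S,
    word_step: Callable[[S, str], Optional[S]],  # hypothetical word-level decoder
    char_step: Callable[[S, str], str],          # hypothetical character-level decoder
    max_words: int = 100,
    max_chars: int = 50,
) -> List[str]:
    """Greedy two-level decoding: one word-level step per word, then
    character-level steps until the end-of-word symbol (or max_chars)."""
    words: List[str] = []
    state = init_state
    prev_word = "<s>"  # assumed start symbol fed to the first word step
    for _ in range(max_words):
        # Word level: advance the state using the previously generated word.
        # Returning None signals the end of the sentence.
        next_state = word_step(state, prev_word)
        if next_state is None:
            break
        state = next_state
        # Character level: spell the current word conditioned on `state`.
        chars: List[str] = []
        for _ in range(max_chars):
            c = char_step(state, "".join(chars))
            if c == EOW:  # explicit word boundary
                break
            chars.append(c)
        prev_word = "".join(chars)
        words.append(prev_word)
    return words


# Toy stubs (hypothetical): the "state" is just the index of the current word.
TARGET = ["das", "ist", "gut"]


def toy_word_step(i: int, prev_word: str) -> Optional[int]:
    nxt = i + 1
    return nxt if nxt < len(TARGET) else None


def toy_char_step(i: int, prefix: str) -> str:
    word = TARGET[i]
    return word[len(prefix)] if len(prefix) < len(word) else EOW


print(hierarchical_greedy_decode(-1, toy_word_step, toy_char_step))
# ['das', 'ist', 'gut']
```

The point the sketch illustrates is that the word-level loop runs once per word rather than once per character, and the explicit end-of-word symbol is what marks word boundaries in the generated character stream. In a trained model, the previous word passed back to `word_step` would typically be a learned vector (for example, one composed from character embeddings) rather than a raw string.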
