Leveraging Sentence-oriented Augmentation and Transformer-Based Architecture for Vietnamese-Bahnaric Translation
Tan Sang Nguyen, Quoc Nguyen Pham, Tho Quan

TL;DR
This paper introduces sentence-oriented augmentation and transformer-based models to improve Vietnamese-Bahnaric translation, addressing resource limitations and enhancing language preservation through accessible AI-driven translation methods.
Contribution
The paper proposes flexible augmentation strategies that work with a range of NMT models and require neither complex preprocessing nor extra data, advancing translation for low-resource languages.
Findings
Improved translation accuracy with augmentation techniques
Methods are adaptable to different neural translation models
No additional data or complex preprocessing required
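To make the idea of sentence-level augmentation without extra data concrete, here is a minimal sketch. It is an illustration only, not the authors' method: the summary does not specify which augmentation strategies the paper uses, so this assumes one simple, generic option, which is concatenating pairs of existing parallel sentences to create new training examples. The function name `concat_augment` and the placeholder sentences are hypothetical.

```python
import random
from typing import List, Tuple

# A parallel example: (Vietnamese source sentence, Bahnaric target sentence).
Pair = Tuple[str, str]


def concat_augment(pairs: List[Pair], n_new: int, seed: int = 0) -> List[Pair]:
    """Return the original pairs plus n_new synthetic pairs.

    Each synthetic pair joins two distinct original pairs, keeping source
    and target in the same order so the result stays aligned.
    Assumption: this is a generic sentence-level augmentation, not
    necessarily the strategy used in the paper.
    """
    if len(pairs) < 2:
        # Nothing to combine; return a copy of the input unchanged.
        return list(pairs)

    rng = random.Random(seed)  # local RNG so results are reproducible
    augmented = list(pairs)
    for _ in range(n_new):
        (src_a, tgt_a), (src_b, tgt_b) = rng.sample(pairs, 2)
        augmented.append((f"{src_a} {src_b}", f"{tgt_a} {tgt_b}"))
    return augmented


if __name__ == "__main__":
    # Placeholder strings stand in for real Vietnamese/Bahnaric sentences.
    corpus = [("vi 1", "ba 1"), ("vi 2", "ba 2"), ("vi 3", "ba 3")]
    for src, tgt in concat_augment(corpus, n_new=2):
        print(src, "->", tgt)
```

Because the function only returns a larger list of sentence pairs, its output can be fed to any NMT training pipeline unchanged, which is the sense in which such a strategy is model-agnostic.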
Abstract
The Bahnar people, an ethnic minority in Vietnam with a rich ancestral heritage, possess a language of immense cultural and historical significance. The government places a strong emphasis on preserving and promoting the Bahnaric language by making it accessible online and encouraging communication across generations. Recent advancements in artificial intelligence, such as Neural Machine Translation (NMT), have brought about a transformation in translation by improving accuracy and fluency. This, in turn, contributes to the revival of the language through educational efforts, communication, and documentation. Specifically, NMT is pivotal in enhancing accessibility for Bahnaric speakers, making information and content more readily available. Nevertheless, the translation of Vietnamese into Bahnaric faces practical challenges due to resource constraints, especially given the limited…
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
