Leveraging Sentence-oriented Augmentation and Transformer-Based Architecture for Vietnamese-Bahnaric Translation

Tan Sang Nguyen; Quoc Nguyen Pham; Tho Quan

arXiv:2601.19124·cs.CL·January 28, 2026

Open Access

TL;DR

This paper introduces sentence-oriented augmentation and transformer-based models to improve Vietnamese-Bahnaric translation, addressing resource limitations and enhancing language preservation through accessible AI-driven translation methods.

Contribution

The paper proposes flexible augmentation strategies that are compatible with various NMT models and require neither complex preprocessing nor extra data, advancing low-resource language translation.

Findings

01. Improved translation accuracy with augmentation techniques

02. Methods are adaptable to different neural translation models

03. No additional data or complex preprocessing required

Abstract

The Bahnar people, an ethnic minority in Vietnam with a rich ancestral heritage, possess a language of immense cultural and historical significance. The government places a strong emphasis on preserving and promoting the Bahnaric language by making it accessible online and encouraging communication across generations. Recent advancements in artificial intelligence, such as Neural Machine Translation (NMT), have brought about a transformation in translation by improving accuracy and fluency. This, in turn, contributes to the revival of the language through educational efforts, communication, and documentation. Specifically, NMT is pivotal in enhancing accessibility for Bahnaric speakers, making information and content more readily available. Nevertheless, the translation of Vietnamese into Bahnaric faces practical challenges due to resource constraints, especially given the limited…

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling