Towards Cultural Bridge by Bahnaric-Vietnamese Translation Using Transfer Learning of Sequence-To-Sequence Pre-training Language Model
Phan Tran Minh Dat, Vo Hoang Nhat Khang, Quan Thanh Tho

TL;DR
This paper presents a transfer learning approach using a sequence-to-sequence pre-trained language model to improve Bahnaric-Vietnamese translation, addressing resource scarcity and enhancing cultural understanding.
Contribution
It introduces a novel transfer learning method with data augmentation for low-resource Bahnaric-Vietnamese translation using a sequence-to-sequence model.
Findings
Effective translation performance demonstrated
Enhanced dataset improves translation accuracy
Facilitates cultural preservation and mutual understanding
Abstract
This work explores Bahnaric-Vietnamese translation as a means of culturally bridging the two ethnic groups in Vietnam. Translating from Bahnaric to Vietnamese poses several difficulties. The most prominent is the scarcity of original resources for the Bahnaric source language, including vocabulary, grammar, dialogue patterns, and bilingual corpora, which hinders data collection for training. To address this, we adopt a transfer learning approach using a sequence-to-sequence pre-trained language model. First, we leverage a pre-trained Vietnamese language model to capture the characteristics of this language. In particular, because the goal is machine translation, we use a sequence-to-sequence model rather than an encoder-only model like BERT or a decoder-only model like GPT. Taking advantage of significant similarity between…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Softmax · Cosine Annealing · Attention Dropout · WordPiece · Residual Connection · Linear Layer · Byte Pair Encoding
