Transformer-Based Low-Resource Language Translation: A Study on Standard Bengali to Sylheti
Mangsura Kabir Oni, Tabia Tanzin Prama

TL;DR
This paper studies Bengali-to-Sylheti translation with Transformer models. Fine-tuned multilingual models outperform zero-shot LLMs, which underscores the need for task-specific adaptation in low-resource languages.
Contribution
The paper shows that fine-tuned multilingual Transformer models outperform zero-shot LLMs on low-resource language translation, specifically Bengali to Sylheti.
Findings
Fine-tuned models outperform zero-shot LLMs in translation quality.
mBART-50 achieves the highest translation adequacy.
MarianMT shows the best character-level fidelity.
Abstract
Machine Translation (MT) has advanced from rule-based and statistical methods to neural approaches based on the Transformer architecture. While these methods have achieved impressive results for high-resource languages, low-resource varieties such as Sylheti remain underexplored. In this work, we investigate Bengali-to-Sylheti translation by fine-tuning multilingual Transformer models and comparing them with zero-shot large language models (LLMs). Experimental results demonstrate that fine-tuned models significantly outperform LLMs, with mBART-50 achieving the highest translation adequacy and MarianMT showing the strongest character-level fidelity. These findings highlight the importance of task-specific adaptation for underrepresented languages and contribute to ongoing efforts toward inclusive language technologies.
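The abstract reports "character-level fidelity" without naming the metric in this summary. As a rough illustration only, the sketch below assumes a chrF-style score: character n-gram precision and recall, averaged over n-gram orders and combined with an F-beta weighting. The function names `char_ngrams` and `char_fscore`, the defaults `max_n=3` and `beta=2.0`, and the placeholder strings are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of length n, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def char_fscore(hypothesis, reference, max_n=3, beta=2.0):
    """Simplified chrF-style score in [0, 1] (an assumed metric, not the paper's).

    Precision and recall are computed per n-gram order, averaged over the
    orders for which both strings have at least one n-gram, then combined
    with an F-beta score (beta > 1 weights recall more heavily).
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one string is too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    # Placeholder strings; real use would pass model output and a reference translation.
    print(char_fscore("abd", "abc"))
```

Because Python strings are sequences of Unicode code points, the same function accepts Bengali-script text, although combining marks are counted as separate characters. For reported results, an established implementation such as the chrF metric in sacreBLEU would be a safer choice than this sketch.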
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
