Transformer-Based Low-Resource Language Translation: A Study on Standard Bengali to Sylheti

Mangsura Kabir Oni; Tabia Tanzin Prama

arXiv:2510.18898 · cs.CL · October 23, 2025

Open Access

TL;DR

This paper studies Bengali-to-Sylheti translation with Transformer models and finds that fine-tuned multilingual models outperform zero-shot LLMs, underscoring the need for task-specific adaptation for low-resource languages.

Contribution

The paper demonstrates that fine-tuning multilingual Transformer models is more effective than prompting zero-shot LLMs for low-resource language translation, specifically from Bengali to Sylheti.

Findings

01. Fine-tuned models outperform zero-shot LLMs in translation quality.

02. mBART-50 achieves the highest translation adequacy.

03. MarianMT shows the best character-level fidelity.

Abstract

Machine Translation (MT) has advanced from rule-based and statistical methods to neural approaches based on the Transformer architecture. While these methods have achieved impressive results for high-resource languages, low-resource varieties such as Sylheti remain underexplored. In this work, we investigate Bengali-to-Sylheti translation by fine-tuning multilingual Transformer models and comparing them with zero-shot large language models (LLMs). Experimental results demonstrate that fine-tuned models significantly outperform LLMs, with mBART-50 achieving the highest translation adequacy and MarianMT showing the strongest character-level fidelity. These findings highlight the importance of task-specific adaptation for underrepresented languages and contribute to ongoing efforts toward inclusive language technologies.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification