LLMs for Low-Resource Dialect Translation Using Context-Aware Prompting: A Case Study on Sylheti

Tabia Tanzin Prama; Christopher M. Danforth; Peter Sheridan Dodds

arXiv:2511.21761·cs.CL·December 1, 2025

TL;DR

This paper evaluates large language models on translation between Bangla and the low-resource Sylheti dialect, and introduces a context-aware prompting framework, Sylheti-CAP, that improves translation quality and reduces hallucinations and ambiguities.

Contribution

It introduces Sylheti-CAP, a novel three-step prompt framework embedding linguistic rules and vocabulary to enhance LLM-based dialect translation.

Findings

01. Sylheti-CAP improves translation accuracy across models

02. Reduces hallucinations and ambiguities in dialect translation

03. Effective for low-resource and dialectal language translation

Abstract

Large Language Models (LLMs) have demonstrated strong translation abilities through prompting, even without task-specific training. However, their effectiveness in dialectal and low-resource contexts remains underexplored. This study presents the first systematic investigation of LLM-based machine translation (MT) for Sylheti, a dialect of Bangla that is itself low-resource. We evaluate five advanced LLMs (GPT-4.1, GPT-4.1, LLaMA 4, Grok 3, and DeepSeek V3.2) across both translation directions (Bangla ↔ Sylheti), and find that these models struggle with dialect-specific vocabulary. To address this, we introduce Sylheti-CAP (Context-Aware Prompting), a three-step framework that embeds a linguistic rulebook, a dictionary (2,260 core vocabulary items and idioms), and an authenticity check directly into prompts. Extensive experiments show that Sylheti-CAP consistently…
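To make the three-step idea in the abstract concrete, here is a minimal sketch of how a prompt might combine a rulebook, a dictionary, and an authenticity check. It is not the authors' implementation: the names `CapResources` and `build_cap_prompt`, the prompt wording, and the choice to include only dictionary entries that occur in the input are all assumptions made for illustration, and the rules and vocabulary are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CapResources:
    """Hypothetical container for the material embedded in the prompt."""
    rules: list[str] = field(default_factory=list)          # linguistic rulebook entries
    glossary: dict[str, str] = field(default_factory=dict)  # source term -> target term


def build_cap_prompt(text: str, source: str, target: str, res: CapResources) -> str:
    """Assemble a three-step prompt: rulebook, dictionary, authenticity check."""
    rule_lines = "\n".join(f"- {rule}" for rule in res.rules)
    # Assumption: only entries whose source term occurs in the input are included,
    # to keep the prompt short. The paper may embed the dictionary differently.
    relevant = {s: t for s, t in res.glossary.items() if s in text}
    gloss_lines = "\n".join(f"- {s} -> {t}" for s, t in relevant.items())
    return (
        f"Translate the following {source} text into {target}.\n\n"
        f"Step 1. Apply these linguistic rules:\n{rule_lines}\n\n"
        f"Step 2. Prefer these dictionary entries:\n{gloss_lines}\n\n"
        f"Step 3. Before answering, check that the output reads as authentic "
        f"{target} and revise it if it does not.\n\n"
        f"Text: {text}"
    )


# Placeholder resources; these are not real rules or vocabulary.
resources = CapResources(
    rules=["<rule 1>", "<rule 2>"],
    glossary={"<term A>": "<term A'>", "<term B>": "<term B'>"},
)
prompt = build_cap_prompt("... <term A> ...", "Bangla", "Sylheti", resources)
print(prompt)
```

The resulting string would be sent as the user message to whichever model is being evaluated; swapping `source` and `target` (with a reversed glossary) covers the opposite translation direction.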

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution