LLMs for Low-Resource Dialect Translation Using Context-Aware Prompting: A Case Study on Sylheti
Tabia Tanzin Prama, Christopher M. Danforth, Peter Sheridan Dodds

TL;DR
This paper explores the use of large language models for translating the low-resource Sylheti dialect, introducing a context-aware prompting framework that significantly improves translation quality and reduces errors.
Contribution
It introduces Sylheti-CAP, a novel three-step prompt framework embedding linguistic rules and vocabulary to enhance LLM-based dialect translation.
Findings
Sylheti-CAP improves translation accuracy across models
Reduces hallucinations and ambiguities in dialect translation
Effective for low-resource and dialectal language translation
Abstract
Large Language Models (LLMs) have demonstrated strong translation abilities through prompting, even without task-specific training. However, their effectiveness in dialectal and low-resource contexts remains underexplored. This study presents the first systematic investigation of LLM-based machine translation (MT) for Sylheti, a dialect of Bangla that is itself low-resource. We evaluate five advanced LLMs (GPT-4.1, GPT-4.1, LLaMA 4, Grok 3, and DeepSeek V3.2) across both translation directions (Bangla ↔ Sylheti), and find that these models struggle with dialect-specific vocabulary. To address this, we introduce Sylheti-CAP (Context-Aware Prompting), a three-step framework that embeds a linguistic rulebook, a dictionary (2,260 core vocabulary items and idioms), and an authenticity check directly into prompts. Extensive experiments show that Sylheti-CAP consistently…
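The three components named in the abstract (rulebook, dictionary, authenticity check) could be assembled into a single prompt roughly as sketched below. This is only an illustration under stated assumptions, not the authors' implementation: the function name `build_cap_prompt`, the prompt wording, the `BN_WORD_*`/`SY_WORD_*` placeholder entries, and the choice to include only dictionary entries whose source term appears in the input are all hypothetical.

```python
from typing import Dict, List


def build_cap_prompt(
    source_text: str,
    source_lang: str,
    target_lang: str,
    rules: List[str],
    dictionary: Dict[str, str],
) -> str:
    """Assemble a three-part context-aware prompt (hypothetical layout)."""
    # Step 1: render the linguistic rulebook as a bullet list.
    rule_block = "\n".join(f"- {rule}" for rule in rules)

    # Step 2: keep only dictionary entries whose source term occurs in the
    # input. This filtering is an assumption made to keep the prompt short;
    # the paper may embed its dictionary differently.
    hits = {src: tgt for src, tgt in dictionary.items() if src in source_text}
    dict_block = (
        "\n".join(f"- {src} -> {tgt}" for src, tgt in hits.items())
        or "- (no matching entries)"
    )

    # Step 3: ask the model to self-check authenticity before answering.
    return (
        f"Translate the following text from {source_lang} to {target_lang}.\n\n"
        f"Step 1. Follow these linguistic rules:\n{rule_block}\n\n"
        f"Step 2. Prefer these dictionary mappings:\n{dict_block}\n\n"
        f"Step 3. Before answering, check that the output reads as authentic "
        f"{target_lang} and revise it if not.\n\n"
        f"Text: {source_text}"
    )


if __name__ == "__main__":
    # Placeholder entries only; real rules and vocabulary would come from
    # the rulebook and dictionary described in the abstract.
    demo = build_cap_prompt(
        source_text="xx BN_WORD_1 yy",
        source_lang="Bangla",
        target_lang="Sylheti",
        rules=["placeholder rule"],
        dictionary={"BN_WORD_1": "SY_WORD_1", "BN_WORD_2": "SY_WORD_2"},
    )
    print(demo)
```

The resulting string would be sent as the user or system message to whichever model is being evaluated; swapping `source_lang` and `target_lang` (with a dictionary keyed on the other language) covers the reverse direction.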
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution
