Making Large Language Models Speak Tulu: Structured Prompting for an Extremely Low-Resource Language

Prathamesh Devadiga; Paras Chopra

arXiv:2602.15378·cs.CL·February 18, 2026

TL;DR

This study examines whether large language models can converse in Tulu, an extremely low-resource language, using structured prompts alone, without fine-tuning. The approach cuts vocabulary contamination from 80% to 5% and reaches 85% grammatical accuracy.

Contribution

The paper demonstrates that structured prompting alone, with no additional training, can elicit basic conversational ability from LLMs in an extremely low-resource language like Tulu.

Findings

01. Vocabulary contamination reduced from 80% to 5%.

02. Achieved 85% grammatical accuracy in Tulu.

03. Negative constraints improve performance across models.
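
This summary does not say how vocabulary contamination is measured. As a rough illustration only, the sketch below assumes a token-level definition: the share of words in a model response that appear in a blocklist of words from related languages. The function name `contamination_rate`, the tokenization rule, and the placeholder words are all hypothetical and are not taken from the paper.

```python
import re


def contamination_rate(response: str, blocklist: set[str]) -> float:
    """Return the fraction of word tokens in `response` found in `blocklist`.

    Assumed definition for illustration; the paper may measure
    contamination differently (for example, per response).
    """
    # Lowercase and keep runs of letters only (Unicode-aware).
    tokens = re.findall(r"[^\W\d_]+", response.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in blocklist)
    return hits / len(tokens)


# Placeholder words, not real Tulu or Kannada vocabulary.
related_language_words = {"foo", "bar"}
rate = contamination_rate("foo baz qux bar quux", related_language_words)
print(f"contamination: {rate:.0%}")
```

Under this assumed definition, a drop from 80% to 5% would mean that roughly one word in twenty still comes from the blocklist, down from four in five.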

Abstract

Can large language models converse in languages virtually absent from their training data? We investigate this question through a case study on Tulu, a Dravidian language with over 2 million speakers but minimal digital presence. Rather than fine-tuning an LLM, we examine whether structured prompts alone can elicit basic conversational ability under controlled prompting. We systematically tackle the challenges posed by the absence of training data for Tulu by combining explicit grammar documentation, negative constraints to suppress high-probability tokens from related languages, romanization standardization, and quality-controlled synthetic data generation via self-play. Evaluated on a manually curated held-out set across three LLMs (Gemini 2.0 Flash, GPT-4o, Llama 3.1 70B) and validated by native speakers, our approach reduces vocabulary contamination from 80% to 5% while achieving…
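
The abstract names the ingredients of the structured prompt but not its layout. The following is a minimal sketch of how such a prompt might be assembled, assuming a simple sectioned text format. The `PromptSpec` class, the `build_prompt` function, the section headings, and every placeholder string are hypothetical; the authors' actual prompt may be organized quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical container for the pieces of a structured prompt."""
    grammar_notes: list[str]        # explicit grammar documentation
    banned_words: list[str]         # negative constraints
    romanization_rules: list[str]   # spelling conventions to standardize output
    examples: list[tuple[str, str]] = field(default_factory=list)  # (user, reply) pairs


def build_prompt(spec: PromptSpec, user_message: str) -> str:
    """Join the sections into one prompt string, ending with the open user turn."""
    sections = [
        "Reply only in romanized Tulu.",
        "GRAMMAR NOTES:\n" + "\n".join(f"- {g}" for g in spec.grammar_notes),
        "DO NOT USE these words from related languages:\n"
        + ", ".join(spec.banned_words),
        "ROMANIZATION RULES:\n" + "\n".join(f"- {r}" for r in spec.romanization_rules),
    ]
    if spec.examples:
        shots = "\n".join(f"User: {u}\nAssistant: {a}" for u, a in spec.examples)
        sections.append("EXAMPLES:\n" + shots)
    sections.append(f"User: {user_message}\nAssistant:")
    return "\n\n".join(sections)


# Placeholder content only; real grammar notes and word lists would go here.
spec = PromptSpec(
    grammar_notes=["<grammar rule A>"],
    banned_words=["foo", "bar"],
    romanization_rules=["<romanization rule R>"],
    examples=[("<example user turn>", "<example reply>")],
)
print(build_prompt(spec, "hello"))
```

Keeping each ingredient in its own field makes it straightforward to ablate one at a time, for instance by emptying `banned_words` to test the effect of negative constraints.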

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Language and cultural evolution