AdiBhashaa: A Community-Curated Benchmark for Machine Translation into Indian Tribal Languages
Pooja Singh, Sandeep Kumar

TL;DR
AdiBhashaa introduces a community-driven approach to create the first open parallel corpora and baseline machine translation systems for four Indian tribal languages, aiming to improve digital inclusion and equitable AI development.
Contribution
The paper presents a participatory data collection method and baseline MT models for four Indian tribal languages, emphasizing community involvement and capacity building.
Findings
Open parallel corpora were developed for four tribal languages: Bhili, Mundari, Gondi, and Santali.
Baseline MT systems demonstrate initial translation capabilities for these languages.
The work highlights the importance of community engagement in language technology.
Abstract
Large language models and multilingual machine translation (MT) systems increasingly drive access to information, yet many languages of the tribal communities remain effectively invisible in these technologies. This invisibility exacerbates existing structural inequities in education, governance, and digital participation. We present AdiBhashaa, a community-driven initiative that constructs the first open parallel corpora and baseline MT systems for four major Indian tribal languages: Bhili, Mundari, Gondi, and Santali. This work combines participatory data creation with native speakers, human-in-the-loop validation, and systematic evaluation of both encoder-decoder MT models and large language models. In addition to reporting technical findings, we articulate how AdiBhashaa illustrates a possible model for more equitable AI research: it centers local expertise, builds capacity among…
Taxonomy
Topics: ICT in Developing Communities · Natural Language Processing Techniques · Language and cultural evolution
