Chitrarth: Bridging Vision and Language for a Billion People

Shaharukh Khan; Ayush Tarun; Abhinav Ravi; Ali Faraz; Akshat Patidar; Praveen Kumar Pokala; Anagha Bhangare; Raja Kolla; Chandra Khatri; Shubham Agarwal

arXiv:2502.15392·cs.AI·February 24, 2025

TL;DR

Chitrarth is a multilingual vision-language model covering 10 Indian languages. It achieves state-of-the-art results on benchmarks for low-resource languages and aims to make AI more inclusive of diverse linguistic communities.

Contribution

The paper introduces Chitrarth, a novel multilingual vision-language model tailored for Indian languages, and BharatBench, a new evaluation framework for such models.

Findings

01 Achieves SOTA results on low-resource Indian languages

02 Maintains efficiency in English language tasks

03 Provides a comprehensive benchmark for Indian language VLMs

Abstract

Recent multimodal foundation models are primarily trained on English or high-resource European language data, which hinders their applicability to other medium- and low-resource languages. To address this limitation, we introduce Chitrarth (Chitra: Image; Artha: Meaning), an inclusive Vision-Language Model (VLM), specifically targeting the rich linguistic diversity and visual reasoning across 10 prominent Indian languages. Our model effectively integrates a state-of-the-art (SOTA) multilingual Large Language Model (LLM) with a vision module, primarily trained on multilingual image-text data. We also introduce BharatBench, a comprehensive framework for evaluating VLMs across various Indian languages, ultimately contributing to more diverse and effective AI systems. Our model achieves SOTA results for benchmarks across low-resource languages while retaining its efficiency in…
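The abstract says the model integrates a multilingual LLM with a vision module but does not describe how. One common pattern in vision-language models is to project image features into the LLM's embedding space and feed them alongside the text tokens. The sketch below illustrates that general pattern only; the linear projector, the function names (`linear_project`, `build_llm_input`), and all dimensions are assumptions for illustration and are not taken from the Chitrarth paper.

```python
# Illustrative sketch only: the linear projector, names, and sizes below are
# assumptions, not details of the Chitrarth architecture.
import random


def linear_project(patch_features, weight, bias):
    """Map each vision feature vector (length d_vision) to length d_llm.

    `weight` holds d_llm rows, each of length d_vision.
    """
    return [
        [sum(x * w for x, w in zip(patch, row)) + b for row, b in zip(weight, bias)]
        for patch in patch_features
    ]


def build_llm_input(image_embeddings, text_embeddings):
    """Prepend the projected image tokens to the text token embeddings."""
    return image_embeddings + text_embeddings


random.seed(0)
D_VISION, D_LLM = 4, 6            # toy sizes; real models use far larger ones
NUM_PATCHES, NUM_TEXT_TOKENS = 3, 5

# Stand-ins for vision-encoder outputs and LLM text-token embeddings.
patch_features = [[random.random() for _ in range(D_VISION)] for _ in range(NUM_PATCHES)]
text_embeddings = [[random.random() for _ in range(D_LLM)] for _ in range(NUM_TEXT_TOKENS)]

# Projector parameters (random here; learned during training in practice).
weight = [[random.uniform(-1, 1) for _ in range(D_VISION)] for _ in range(D_LLM)]
bias = [0.0] * D_LLM

image_embeddings = linear_project(patch_features, weight, bias)
llm_input = build_llm_input(image_embeddings, text_embeddings)

print(len(llm_input), len(llm_input[0]))  # prints "8 6"
```

The resulting sequence has one row per image patch plus one per text token, all in the LLM's embedding width, which is what lets a text-only language model attend over visual content.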

Code & Models

Models

krutrim-ai-labs/Chitrarth
model · 140 downloads · 18 likes

Taxonomy

Methods: Sparse Evolutionary Training