Chitranuvad: Adapting Multi-Lingual LLMs for Multimodal Translation

Shaharukh Khan; Ayush Tarun; Ali Faraz; Palash Kamble; Vivek Dahiya; Praveen Pokala; Ashish Kulkarni; Chandra Khatri; Abhinav Ravi; Shubham Agarwal

arXiv:2502.20420 · cs.CL · March 3, 2025

TL;DR

Chitranuvad is a multimodal translation system that combines a multilingual large language model with a vision module to translate English into Indic languages (Hindi, Bengali, and Malayalam), achieving state-of-the-art results for Hindi on the WAT2024 Challenge set.

Contribution

The paper introduces a multimodal translation model that connects a ViT image encoder to a multilingual LLM through an adapter layer, which projects visual token embeddings into the LLM space.

Findings

01 Achieved SOTA results for Hindi on the Challenge set in all three tracks.

02 Remained competitive for Bengali and Malayalam in the shared task.

03 Integrated visual representations from a ViT image encoder with a multilingual LLM for translation.

Abstract

In this work, we provide the system description of our submission to the English-to-Lowres Multimodal Translation Task at the Workshop on Asian Translation (WAT2024). We introduce Chitranuvad, a multimodal model that integrates a multilingual LLM and a vision module for multimodal translation. Our method uses a ViT image encoder to extract visual representations as visual token embeddings, which an adapter layer projects into the LLM space; the LLM then generates the translation autoregressively. We participated in all three tracks (Image Captioning, Text-only Translation, and Multimodal Translation) for Indic languages (i.e., English to Hindi, Bengali, and Malayalam) and achieved SOTA results for Hindi in all of them on the Challenge set, while remaining competitive for the other languages in the shared task.
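The data flow described in the abstract (visual token embeddings, projection by an adapter, autoregressive generation) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The single linear layer used as the adapter, the placement of visual tokens before text tokens, the toy dimensions, and all function names (`make_adapter`, `project`, `build_llm_input`, `greedy_decode`) are assumptions made for illustration; the ViT encoder and the LLM are replaced by random vectors and a caller-supplied stub.

```python
import random

D_VISION, D_LLM = 4, 6  # toy widths (assumed); real encoders and LLMs are far wider


def make_adapter(d_in, d_out, seed=0):
    """Create a hypothetical linear adapter: d_out weight rows of length d_in, plus a bias."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    bias = [0.0] * d_out
    return weight, bias


def project(visual_tokens, weight, bias):
    """Map each visual token from the vision width to the LLM embedding width."""
    return [
        [sum(x * w for x, w in zip(token, row)) + b for row, b in zip(weight, bias)]
        for token in visual_tokens
    ]


def build_llm_input(visual_tokens, text_embeddings, adapter):
    """Concatenate projected visual tokens with text embeddings (ordering is an assumption)."""
    weight, bias = adapter
    return project(visual_tokens, weight, bias) + text_embeddings


def greedy_decode(next_token_fn, prefix, eos_id, max_new_tokens=8):
    """Autoregressive loop: each step conditions on the prefix and the tokens generated so far."""
    generated = []
    for _ in range(max_new_tokens):
        token_id = next_token_fn(prefix, generated)
        if token_id == eos_id:
            break
        generated.append(token_id)
    return generated


# Stand-ins for ViT patch embeddings and embedded English source tokens.
rng = random.Random(1)
visual_tokens = [[rng.random() for _ in range(D_VISION)] for _ in range(3)]
text_embeddings = [[rng.random() for _ in range(D_LLM)] for _ in range(5)]

adapter = make_adapter(D_VISION, D_LLM)
llm_input = build_llm_input(visual_tokens, text_embeddings, adapter)
```

In this sketch, `llm_input` holds one `D_LLM`-wide vector per visual or text token, which is the form a decoder-only LLM would consume as its prefix. `greedy_decode` takes any callable that returns the next token id, so a real model's forward pass could be substituted for the stub.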

Code & Models

Models

krutrim-ai-labs/Chitranuvad (Hugging Face model)
