Towards Visually-Guided Movie Subtitle Translation for Indic Languages

Tarun Chintada; Kshetrimayum Boynao Singh; Asif Ekbal

arXiv:2605.11993 · cs.CL · May 13, 2026

TL;DR

This paper studies multimodal movie subtitle translation from English into five Indic languages. It shows that selective visual grounding, applied only to the weakest text-only segments, improves translation quality, whereas indiscriminate grounding is often undermined by subtitle-frame misalignment in long videos.

Contribution

It compares two lightweight visual grounding strategies, structured attribute summaries and free-text summaries of inter-subtitle visual gaps, and shows that selective grounding improves translation while requiring far less visual processing.

Findings

01. Oracle selective grounding consistently improves COMET over the text-only baseline.

02. Attribute-based summaries effectively capture scene context.

03. Temporal misalignment between subtitles and frames is a major obstacle in long videos.

Abstract

Movie subtitle translation is inherently multimodal, yet text-only systems often miss visual cues needed to convey emotion, action, and social nuance, especially for low-resource Indic languages (English to Hindi, Bengali, Telugu, Tamil and Kannada). We present a case study on five full-length films and compare two lightweight visual grounding strategies: structured attribute summaries from a 5-minute sliding window and free-text summaries of inter-subtitle visual gaps. Our analysis shows that temporal misalignment between subtitles and frames is a major obstacle in long-form video, often rendering indiscriminate visual grounding ineffective. However, oracle selective grounding, which replaces only the lowest-quality 20-30% of baseline segments with visual-enhanced outputs, consistently improves COMET over the text-only baseline while requiring far less visual processing. Among the two…
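
The selection step in oracle selective grounding can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a per-segment quality score for each baseline translation is already available (for example a segment-level COMET score, which is an assumption here), and that a text-only and a visually enhanced translation exist for every segment. The function names `select_lowest_quality` and `selective_grounding` and the default fraction of 0.25 are hypothetical choices for illustration.

```python
from typing import List, Sequence


def select_lowest_quality(scores: Sequence[float], fraction: float) -> List[int]:
    """Return the indices of the lowest-scoring `fraction` of segments.

    `scores` are assumed to be per-segment quality scores of the text-only
    baseline (higher is better). How they are obtained is outside this sketch.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    k = int(len(scores) * fraction)  # number of segments to replace (rounded down)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:k])


def selective_grounding(
    baseline: Sequence[str],
    visual: Sequence[str],
    baseline_scores: Sequence[float],
    fraction: float = 0.25,  # hypothetical default inside the 20-30% range
) -> List[str]:
    """Replace only the weakest baseline segments with visually enhanced ones."""
    if not (len(baseline) == len(visual) == len(baseline_scores)):
        raise ValueError("all inputs must have one entry per subtitle segment")
    to_replace = set(select_lowest_quality(baseline_scores, fraction))
    return [
        visual[i] if i in to_replace else baseline[i]
        for i in range(len(baseline))
    ]
```

With four segments and `fraction=0.25`, only the single lowest-scoring baseline segment is swapped for its visually enhanced counterpart; the remaining segments keep the text-only output, so visual processing is needed for only that fraction of the film.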
