Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation
Shreyas Vaidya, Arvind Kumar Sharma, Prajwal Gatti, Anand Mishra

TL;DR
This paper introduces the first study on visually translating scene text between languages, combining recognition, translation, and image synthesis to preserve visual features, and provides baseline models and evaluation metrics for this novel task.
Contribution
The paper is the first to study visual scene-text translation as a standalone problem. It proposes a cascaded framework, task-specific enhancements, and evaluation metrics for the task.
Findings
Effective translation over large scene text datasets
Baseline models partially address visual translation challenges
New evaluation metrics for visual translation quality
Abstract
In this work, we study the task of "visually" translating scene text from a source language (e.g., Hindi) to a target language (e.g., English). Visual translation involves not just the recognition and translation of scene text but also the generation of the translated image that preserves visual features of the source scene text, such as font, size, and background. There are several challenges associated with this task, such as translation with limited context, deciding between translation and transliteration, accommodating varying text lengths within fixed spatial boundaries, and preserving the font and background styles of the source scene text in the target language. To address this problem, we make the following contributions: (i) We study visual translation as a standalone problem for the first time in the literature. (ii) We present a cascaded framework for visual translation…
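The cascade described in the abstract (recognize, translate, then synthesize a styled image) can be sketched roughly as below. This is only an illustration of the control flow, not the authors' implementation: the class `VisualTranslator`, its three stage callables, and the helper `fit_scale` are hypothetical names introduced here, and the length-based scaling rule is an assumed simplification of the "varying text lengths within fixed spatial boundaries" challenge.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class VisualTranslator:
    """Hypothetical cascade of three pluggable stages (all placeholders)."""
    recognize: Callable[[Any], str]        # word image -> source-language text
    translate: Callable[[str], str]        # source text -> target text (or transliteration)
    synthesize: Callable[[Any, str], Any]  # (word image, target text) -> styled word image

    def __call__(self, word_image: Any) -> Any:
        # Run the stages in order; errors from an early stage propagate downstream,
        # which is the usual weakness of a cascaded design.
        source_text = self.recognize(word_image)
        target_text = self.translate(source_text)
        return self.synthesize(word_image, target_text)


def fit_scale(source_text: str, target_text: str) -> float:
    """Assumed heuristic: horizontal shrink factor so the target fits the source box.

    Treats every character as equally wide, which is a deliberate simplification.
    """
    if not target_text:
        return 1.0
    return min(1.0, len(source_text) / len(target_text))
```

In practice each callable would wrap a trained model (a scene-text recognizer, a machine translation system, and a style-preserving text image generator). Keeping the stages as plain callables makes it easy to swap any one of them when comparing baselines.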
Taxonomy
Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications
