Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation

Shreyas Vaidya; Arvind Kumar Sharma; Prajwal Gatti; Anand Mishra

arXiv:2308.03024 · cs.CV · September 4, 2024

TL;DR

This paper introduces the first study on visually translating scene text between languages, combining recognition, translation, and image synthesis to preserve visual features, and provides baseline models and evaluation metrics for this novel task.

Contribution

The paper pioneers the study of visual scene-text translation, proposing a cascaded framework, task-specific enhancements, and evaluation metrics for the first time.

Findings

01. Effective translation over large scene-text datasets
02. Baseline models partially address visual translation challenges
03. New evaluation metrics for visual translation quality

Abstract

In this work, we study the task of "visually" translating scene text from a source language (e.g., Hindi) to a target language (e.g., English). Visual translation involves not just the recognition and translation of scene text but also the generation of the translated image that preserves visual features of the source scene text, such as font, size, and background. There are several challenges associated with this task, such as translation with limited context, deciding between translation and transliteration, accommodating varying text lengths within fixed spatial boundaries, and preserving the font and background styles of the source scene text in the target language. To address this problem, we make the following contributions: (i) We study visual translation as a standalone problem for the first time in the literature. (ii) We present a cascaded framework for visual translation…
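The abstract describes a cascade: recognize the scene text, translate it, then generate an image that keeps the source styling, while fitting text of a different length into the same box. A minimal sketch of that control flow follows, assuming each stage is a plain callable. Every name here (`TextRegion`, `fit_font_size`, `visual_translate`, and the toy stubs) is hypothetical and not taken from the Bhashini-IITJ/visualTranslation repository. The font-fitting rule is a crude heuristic that assumes each glyph is about 0.6 × font size wide; a real renderer would measure actual font metrics.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical axis-aligned box: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class TextRegion:
    box: Box
    source_text: str
    target_text: str = ""
    font_size: float = 0.0


def fit_font_size(text: str, box: Box, char_aspect: float = 0.6) -> float:
    """Largest font size at which `text` fits in `box` on a single line.

    Assumption: each glyph is roughly char_aspect * font_size pixels wide,
    a stand-in for real font metrics. The box height is an upper bound.
    """
    _, _, width, height = box
    if not text:
        return float(height)
    width_limited = width / (char_aspect * len(text))
    return min(float(height), width_limited)


def visual_translate(
    image,
    detect_and_recognize: Callable[[object], List[TextRegion]],
    translate: Callable[[str], str],
    render: Callable[[object, List[TextRegion]], object],
):
    """Cascade: each stage consumes the previous stage's output."""
    regions = detect_and_recognize(image)           # stage 1: scene-text recognition
    for region in regions:
        region.target_text = translate(region.source_text)  # stage 2: translation
        # Target text may be longer or shorter; shrink the font to fit the box.
        region.font_size = fit_font_size(region.target_text, region.box)
    return render(image, regions)                   # stage 3: image synthesis


# Toy stubs standing in for real OCR, MT, and rendering models (hypothetical).
def toy_ocr(image):
    return [TextRegion(box=(0, 0, 120, 40), source_text="रेलवे स्टेशन")]


toy_dictionary = {"रेलवे स्टेशन": "Railway Station"}


def toy_translate(text):
    return toy_dictionary.get(text, text)  # fall back to the source text


def toy_render(image, regions):
    # Instead of drawing pixels, report what would be drawn and at what size.
    return [(r.target_text, round(r.font_size, 1)) for r in regions]


result = visual_translate(None, toy_ocr, toy_translate, toy_render)
print(result)
```

Here the 15-character English output is wider than the box allows at full height, so the heuristic shrinks the font from 40 px to about 13.3 px. Keeping the stages as independent callables mirrors the cascaded design: any one stage can be swapped out without touching the others.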


Code & Models

Repositories

Bhashini-IITJ/visualTranslation
pytorch · Official


Taxonomy

Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications