Translatotron-V(ision): An End-to-End Model for In-Image Machine Translation
Zhibin Lan, Liqiang Niu, Fandong Meng, Jie Zhou, Min Zhang, Jinsong Su

TL;DR
Translatotron-V(ision) is an end-to-end in-image machine translation (IIMT) model that converts an image containing source-language text directly into an image containing its translation. It uses fewer parameters than cascaded systems and outperforms previous pixel-level end-to-end models.
Contribution
The paper introduces Translatotron-V(ision), a novel end-to-end IIMT model with a new architecture and training framework that improves translation quality and efficiency.
Findings
Achieves performance competitive with cascaded models while using only 70.9% of their parameters.
Outperforms existing pixel-level end-to-end IIMT models.
Introduces Structure-BLEU, a new evaluation metric for image translation quality.
Abstract
In-image machine translation (IIMT) aims to translate an image containing texts in source language into an image containing translations in target language. In this regard, conventional cascaded methods suffer from issues such as error propagation, massive parameters, and difficulties in deployment and retaining visual characteristics of the input image. Thus, constructing end-to-end models has become an option, which, however, faces two main challenges: 1) the huge modeling burden, as it is required to simultaneously learn alignment across languages and preserve the visual characteristics of the input image; 2) the difficulties of directly predicting excessively lengthy pixel sequences. In this paper, we propose Translatotron-V(ision), an end-to-end IIMT model consisting of four modules. In addition to an image encoder and an image decoder, our model contains a target text…
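To make the second challenge concrete, the following minimal sketch compares the output length a decoder would face when predicting raw pixel values against a coarser patch grid. The image size (32 x 512), the 16-pixel patch size, and both helper functions are hypothetical and chosen only for illustration; a patch grid is one common way to shorten image sequences and is not presented here as the authors' design.

```python
def pixel_sequence_length(height: int, width: int, channels: int = 3) -> int:
    """Number of values to predict if every pixel channel is one output step."""
    return height * width * channels


def patch_sequence_length(height: int, width: int, patch_size: int) -> int:
    """Number of output steps if the image is split into square patches."""
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    return (height // patch_size) * (width // patch_size)


# Hypothetical text-line image: 32 pixels high, 512 pixels wide, RGB.
height, width = 32, 512

n_pixels = pixel_sequence_length(height, width)
n_patches = patch_sequence_length(height, width, patch_size=16)

print(n_pixels)               # 49152
print(n_patches)              # 64
print(n_pixels // n_patches)  # 768
```

Under these assumed sizes, a per-pixel output sequence is several hundred times longer than a patch-level one, which illustrates why predicting pixels directly is a heavy burden for an end-to-end model.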
Taxonomy
Topics: Natural Language Processing Techniques
