Text Image Generation for Low-Resource Languages with Dual Translation Learning

Chihiro Noguchi; Shun Fukuda; Shoichiro Mihara; Masao Yamanaka

arXiv:2409.17747 · cs.CV · September 27, 2024

Open Access

TL;DR

This paper introduces a diffusion-based dual translation model that generates realistic text images for low-resource languages by emulating the style of real text images from high-resource languages. The generated images enlarge training datasets and improve scene text recognition.

Contribution

The study presents a novel diffusion model with dual translation tasks and guidance techniques to generate diverse, realistic text images for low-resource languages, boosting recognition performance.

Findings

1. Generated images significantly improve recognition accuracy.

2. Dual translation effectively differentiates synthetic and real images.

3. Guidance techniques enhance image quality and diversity.

Abstract

Scene text recognition in low-resource languages frequently faces challenges due to the limited availability of training datasets derived from real-world scenes. This study proposes a novel approach that generates text images in low-resource languages by emulating the style of real text images from high-resource languages. Our approach utilizes a diffusion model that is conditioned on binary states: "synthetic" and "real." The training of this model involves dual translation tasks, where it transforms plain text images into either synthetic or real text images, based on the binary states. This approach not only effectively differentiates between the two domains but also facilitates the model's explicit recognition of characters in the target language. Furthermore, to enhance the accuracy and variety of generated text images, we introduce two guidance techniques: Fidelity-Diversity…
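The dual translation setup described in the abstract can be illustrated with a minimal Python sketch of how training samples might be organized. This is not the authors' implementation: the names `TranslationSample`, `build_training_samples`, `inference_request`, and the dictionary keys are all hypothetical, and the sketch assumes that synthetic targets can be rendered for any language while real targets exist only for high-resource languages. It covers only the data bookkeeping, not the diffusion model or the guidance techniques.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Binary condition states from the abstract; the integer encoding is an assumption.
SYNTHETIC, REAL = 0, 1


@dataclass(frozen=True)
class TranslationSample:
    """One plain-to-target translation task (hypothetical structure)."""
    text: str        # string rendered in the plain source image
    language: str    # language of the text
    state: int       # SYNTHETIC or REAL: domain of the target image
    target_id: str   # hypothetical identifier of the target image


def build_training_samples(
    corpus: List[Dict[str, Optional[str]]],
) -> List[TranslationSample]:
    """Create one sample per available target domain for each corpus item.

    Each item is assumed to carry 'text', 'language', 'synthetic_id', and an
    optional 'real_id' (None when no real-scene image exists for that text).
    """
    samples = []
    for item in corpus:
        # Plain -> synthetic: assumed available for every language.
        samples.append(
            TranslationSample(item["text"], item["language"], SYNTHETIC, item["synthetic_id"])
        )
        # Plain -> real: only when a real-scene image is available.
        if item.get("real_id") is not None:
            samples.append(
                TranslationSample(item["text"], item["language"], REAL, item["real_id"])
            )
    return samples


def inference_request(text: str, language: str) -> Dict[str, object]:
    """At generation time, ask for the 'real' domain for low-resource text."""
    return {"text": text, "language": language, "state": REAL}
```

Under these assumptions, a low-resource language contributes only plain-to-synthetic samples during training, which still exposes the model to its characters, and the "real" state is requested only at generation time.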

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Diffusion