Ensuring Consistency for In-Image Translation

Chengpeng Fu; Xiaocheng Feng; Yichong Huang; Wenshuai Huo; Baohang Li; Zhirui Zhang; Yunfei Lu; Dandan Tu; Duyu Tang; Hui Wang; Bing Qin; Ting Liu

arXiv:2412.18139·cs.CL·December 25, 2024

TL;DR

This paper introduces HCIIT, a two-stage framework that enhances in-image translation by ensuring translation and style consistency through multimodal models and diffusion techniques, resulting in more coherent and high-quality translated images.

Contribution

The paper presents a novel two-stage approach combining multimodal language models and diffusion models to maintain consistency in in-image translation tasks, addressing a key gap in existing methods.

Findings

01. Effective in maintaining translation and style consistency.
02. Produces high-quality, style-coherent translated images.
03. Validated on both curated and real-world datasets.

Abstract

The in-image machine translation task involves translating text embedded within images, with the translated results presented in image format. While this task has numerous applications in various scenarios such as film poster translation and everyday scene image translation, existing methods frequently neglect the aspect of consistency throughout this process. We propose the need to uphold two types of consistency in this task: translation consistency and image generation consistency. The former entails incorporating image information during translation, while the latter involves maintaining consistency between the style of the text-image and the original image, ensuring background integrity. To address these consistency requirements, we introduce a novel two-stage framework named HCIIT (High-Consistency In-Image Translation) which involves text-image translation using a multimodal…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Diffusion