Towards Visual Text Design Transfer Across Languages
Yejin Choi, Jiwan Chung, Sumin Shim, Giyeong Oh, Youngjae Yu

TL;DR
This paper introduces MuST-Bench, a benchmark for evaluating visual text translation across languages, and proposes SIGIL, a new framework that improves multilingual visual text generation without relying on style descriptions.
Contribution
The paper presents MuST-Bench, a benchmark for assessing multimodal style translation, and introduces SIGIL, a multimodal style translation framework that improves multilingual visual text generation.
Findings
SIGIL outperforms baselines in style consistency and legibility
Existing models struggle with cross-language visual text translation
MuST-Bench provides a new standard for evaluation in this domain
Abstract
Visual text design plays a critical role in conveying themes, emotions, and atmospheres in multimodal formats such as film posters and album covers. Translating these visual and textual elements across languages extends the concept of translation beyond mere text, requiring the adaptation of aesthetic and stylistic features. To address this, we introduce a novel task, Multimodal Style Translation, together with MuST-Bench, a benchmark designed to evaluate the ability of visual text generation models to translate text across different writing systems while preserving design intent. Our initial experiments on MuST-Bench reveal that existing visual text generation models struggle with the proposed task because textual descriptions are inadequate for conveying visual design. In response, we introduce SIGIL, a framework for multimodal style translation that eliminates the need for style…
Taxonomy
Topics: Digital Media and Visual Art
