Boosting Text-To-Image Generation via Multilingual Prompting in Large Multimodal Models

Yongyu Mu; Hengyu Li; Junxin Wang; Xiaoxuan Zhou; Chenglong Wang; Yingfeng Luo; Qiaozhi He; Tong Xiao; Guocheng Chen; Jingbo Zhu

arXiv:2501.07086·cs.CL·January 14, 2025

TL;DR

This paper introduces PMT2I, a multilingual prompting method that leverages the multilingual capabilities of large multimodal models to improve text-to-image generation, especially for complex and detailed descriptions.

Contribution

The paper proposes a novel multilingual prompting approach that enhances text comprehension in large multimodal models for improved image generation quality.

Findings

1. PMT2I outperforms baseline prompts in general, compositional, and fine-grained assessments.

2. The method achieves higher human preference alignment.

3. PMT2I generates more diverse images and improves reranking performance.

Abstract

Previous work on augmenting large multimodal models (LMMs) for text-to-image (T2I) generation has focused on enriching the input space of in-context learning (ICL). This includes providing a few demonstrations and optimizing image descriptions to be more detailed and logical. However, as demand for more complex and flexible image descriptions grows, enhancing comprehension of input text within the ICL paradigm remains a critical yet underexplored area. In this work, we extend this line of research by constructing parallel multilingual prompts aimed at harnessing the multilingual capabilities of LMMs. More specifically, we translate the input text into several languages and provide the models with both the original text and the translations. Experiments on two LMMs across 3 benchmarks show that our method, PMT2I, achieves superior performance in general, compositional, and fine-grained…
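
The prompt construction described in the abstract (translate the input text into several languages, then give the model the original together with the translations) can be illustrated with a minimal sketch. The function names, the prompt layout, the instruction wording, and the table-backed `toy_translate` stand-in are all assumptions made for illustration; they are not taken from the paper or its repository, and a real pipeline would presumably call a machine translation system instead.

```python
from typing import Callable, Sequence


def build_parallel_prompt(
    text: str,
    languages: Sequence[str],
    translate: Callable[[str, str], str],
) -> str:
    """Combine the original description with its translations in one prompt.

    The layout and the closing instruction are hypothetical, not the
    paper's actual template.
    """
    lines = [f"Original: {text}"]
    for lang in languages:
        # One labeled line per target language, in the order given.
        lines.append(f"{lang}: {translate(text, lang)}")
    lines.append("Generate one image that matches all of the descriptions above.")
    return "\n".join(lines)


# Stand-in translator backed by a fixed table (an assumption for this sketch);
# it only knows the single example sentence below.
_TOY_TABLE = {
    ("a red cube on a blue sphere", "German"): "ein roter Würfel auf einer blauen Kugel",
    ("a red cube on a blue sphere", "French"): "un cube rouge sur une sphère bleue",
}


def toy_translate(text: str, lang: str) -> str:
    return _TOY_TABLE[(text, lang)]


prompt = build_parallel_prompt(
    "a red cube on a blue sphere", ["German", "French"], toy_translate
)
print(prompt)
```

The resulting string would then be passed to the multimodal model in place of the single-language description. Keeping the translator as a parameter makes it possible to swap in a different translation backend or language set without touching the prompt layout.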

Code & Models

Repositories

takagi97/pmt2i (PyTorch, official)