Enhancing Image Generation Fidelity via Progressive Prompts

Zhen Xiong; Yuqi Li; Chuanguang Yang; Tiao Tan; Zhihong Zhu; Siyuan Li; Yue Ma

arXiv:2501.07070·cs.CV·January 14, 2025

TL;DR

This paper introduces a coarse-to-fine regional prompt control pipeline for DiT-based image generation, leveraging LLMs to improve controllability and image quality through layered cross-attention manipulation.

Contribution

The paper proposes a method for injecting regional prompts into DiT models. It uses LLMs to generate detailed content and style descriptions, which improves the controllability of image generation.

Findings

01 Improved image fidelity and diversity demonstrated.

02 Layer-specific prompt control enhances regional content accuracy.

03 Quantitative and qualitative results show performance gains.

Abstract

The diffusion transformer (DiT) architecture has attracted significant attention in image generation, achieving better fidelity, performance, and diversity. However, most existing DiT-based image generation methods focus on global-aware synthesis, and regional prompt control has been less explored. In this paper, we propose a coarse-to-fine generation pipeline for regional prompt-following generation. Specifically, we first utilize the powerful large language model (LLM) to generate both high-level descriptions of the image (such as content, topic, and objects) and low-level descriptions (such as details and style). Then, we explore the influence of cross-attention layers at different depths. We find that deeper layers are always responsible for high-level content control, while shallow layers handle low-level content control. Various prompts are injected into the…
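
The two ideas in the abstract, routing prompts by layer depth and restricting each image region to its own prompt, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the 50% depth split, the box format, and the token spans are all hypothetical, and the routing rule simply follows the abstract's statement that deeper layers handle high-level control while shallow layers handle low-level control.

```python
import math

def prompt_for_layer(layer_idx, num_layers, high_level, low_level, split=0.5):
    """Hypothetical routing rule: layers at or past the split depth receive the
    high-level prompt, shallower layers receive the low-level prompt."""
    return high_level if layer_idx >= int(num_layers * split) else low_level

def regional_mask(height, width, boxes, token_spans, num_text_tokens):
    """Build mask[pixel][token]: True where a pixel may attend to a text token.
    boxes[i] = (top, left, bottom, right), bottom/right exclusive (assumed format).
    token_spans[i] = (start, end) range of region i's prompt tokens."""
    mask = [[False] * num_text_tokens for _ in range(height * width)]
    for (top, left, bottom, right), (start, end) in zip(boxes, token_spans):
        for y in range(top, bottom):
            for x in range(left, right):
                row = mask[y * width + x]
                for t in range(start, end):
                    row[t] = True
    return mask

def masked_softmax(scores, allowed):
    """Softmax over the allowed entries only; disallowed entries get weight 0.
    Returns all zeros if no entry is allowed."""
    kept = [s for s, a in zip(scores, allowed) if a]
    if not kept:
        return [0.0] * len(scores)
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if a else 0.0 for s, a in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

# Toy setup: a 2x2 latent grid, left column is region 0, right column is region 1.
mask = regional_mask(
    height=2, width=2,
    boxes=[(0, 0, 2, 1), (0, 1, 2, 2)],
    token_spans=[(0, 2), (2, 4)],
    num_text_tokens=4,
)
# Attention scores of the top-left pixel against all four text tokens.
weights = masked_softmax([0.0, 0.0, 5.0, 5.0], mask[0])
shallow_prompt = prompt_for_layer(0, 4, "high-level text", "low-level text")
deep_prompt = prompt_for_layer(3, 4, "high-level text", "low-level text")
```

In this toy setup the top-left pixel puts all of its attention on region 0's tokens even though region 1's tokens score higher, which is the effect a regional mask is meant to have. A real DiT would apply the same mask to batched attention logits inside each cross-attention layer.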

Code & Models

Repositories

zhenxiong-dl/icassp2025-rcac (PyTorch, official)

Taxonomy

Methods: Softmax · Attention Is All You Need · Attentive Walk-Aggregating Graph Neural Network · Diffusion · Focus