ECNet: Effective Controllable Text-to-Image Diffusion Models

Sicheng Li; Keqiang Sun; Zhixin Lai; Xiaoshi Wu; Feng Qiu; Haoran Xie; Kazunori Miyata; Hongsheng Li

arXiv:2403.18417·cs.CV·March 28, 2024·2 cites

Open Access

TL;DR

ECNet improves the controllability and robustness of conditional text-to-image diffusion models with two additions: a Spatial Guidance Injector that supplies clearer, annotated condition input, and a Diffusion Consistency Loss that supervises the denoised latent code at any time step.

Contribution

The paper presents Spatial Guidance Injector and Diffusion Consistency Loss, novel methods that enhance control accuracy and supervision in diffusion-based text-to-image models.

Findings

01 Enhanced controllability over various conditions

02 Outperforms existing state-of-the-art models

03 Improved robustness and precision in image generation

Abstract

Conditional text-to-image diffusion models have garnered significant attention in recent years. However, their precision is often compromised for two main reasons: ambiguous condition input, and inadequate condition guidance from a single denoising loss. To address these challenges, we introduce two solutions. First, we propose a Spatial Guidance Injector (SGI), which enhances conditional detail by encoding text inputs with precise annotation information. This directly tackles the issue of ambiguous control inputs by providing clear, annotated guidance to the model. Second, to overcome the issue of limited conditional supervision, we introduce Diffusion Consistency Loss (DCL), which applies supervision on the denoised latent code at any given time step. This encourages consistency between the latent code at each time step and the input signal, thereby…
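The idea of supervising the denoised latent at an arbitrary time step can be illustrated with a minimal NumPy sketch. It is not the paper's implementation. The function `predict_x0` uses the standard DDPM identity for estimating the clean latent from a noisy one; everything else is an assumption made for illustration: the names `consistency_loss` and `extract`, the mean-squared-error form of the loss, and the toy channel-average used as a stand-in for whatever maps a latent back to the condition signal.

```python
import numpy as np

def predict_x0(x_t, eps_hat, alpha_bar_t):
    """Standard DDPM estimate of the clean latent from a noisy latent x_t
    and a noise prediction eps_hat at cumulative schedule value alpha_bar_t."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)

def consistency_loss(x0_hat, condition, extract_condition):
    """Hypothetical consistency term (assumed MSE form): compare the condition
    recovered from the denoised latent with the input condition."""
    return float(np.mean((extract_condition(x0_hat) - condition) ** 2))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))      # toy clean latent (channels, H, W)
eps = rng.standard_normal(x0.shape)      # noise added by the forward process
alpha_bar_t = 0.3                        # assumed schedule value at some step t
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

# Toy stand-in for a condition extractor (assumption): average over channels.
extract = lambda z: z.mean(axis=0)
condition = extract(x0)

# A perfect noise prediction recovers x0, so the consistency term vanishes.
x0_hat_good = predict_x0(x_t, eps, alpha_bar_t)
loss_good = consistency_loss(x0_hat_good, condition, extract)

# A poor noise prediction (all zeros) yields a latent inconsistent with the condition.
x0_hat_bad = predict_x0(x_t, np.zeros_like(eps), alpha_bar_t)
loss_bad = consistency_loss(x0_hat_bad, condition, extract)
```

Because the clean-latent estimate is available at every time step, a term of this kind can be evaluated at any `t` rather than only at the end of sampling, which is the property the abstract attributes to DCL.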

Taxonomy

Topics: Topic Modeling · Music and Audio Processing

Methods: Diffusion