VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control

Shaojin Wu; Fei Ding; Mengqi Huang; Wei Liu; Qian He

arXiv:2412.20800 · cs.CV · December 31, 2024

TL;DR

VMix introduces a plug-and-play aesthetic control adapter for diffusion models, enhancing image quality by disentangling content and aesthetic prompts and integrating aesthetic conditions through cross-attention value mixing, without retraining.

Contribution

The paper proposes VMix, a novel aesthetic adapter that improves image aesthetics in diffusion models via cross-attention value mixing, maintaining generality and compatibility with existing models.

Findings

01. VMix outperforms state-of-the-art methods in aesthetic image generation.

02. VMix is compatible with community modules such as LoRA, ControlNet, and IP-Adapter.

03. The method enhances aesthetic quality without retraining existing models.

Abstract

While diffusion models show extraordinary talents in text-to-image generation, they may still fail to generate highly aesthetic images. More specifically, there is still a gap between the generated images and the real-world aesthetic images in finer-grained dimensions including color, lighting, composition, etc. In this paper, we propose Cross-Attention Value Mixing Control (VMix) Adapter, a plug-and-play aesthetics adapter, to upgrade the quality of generated images while maintaining generality across visual concepts by (1) disentangling the input text prompt into the content description and aesthetic description by the initialization of aesthetic embedding, and (2) integrating aesthetic conditions into the denoising process through value-mixed cross-attention, with the network connected by zero-initialized linear layers. Our key insight is to enhance the aesthetic presentation of…
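
The value-mixed cross-attention described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation in the fenfenfenfan/VMix repository: the projection names (`W_q`, `W_k`, `W_v`, `W_va`, `W_zero`), the choice to add the aesthetic values to the content values before the attention-weighted sum, and the assumption that the content and aesthetic embeddings have the same token length are all hypothetical. What it shows is why zero-initialized linear layers make such an adapter plug-and-play: at initialization the aesthetic branch contributes exactly zero, so the layer reproduces the frozen base model's output.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ValueMixedCrossAttention:
    """Hypothetical single-head sketch of value-mixed cross-attention.

    Assumptions (not taken from the paper's code):
      * queries come from image latents, keys and values from content text tokens;
      * the aesthetic embedding is a token sequence of the same length as the
        content embedding, projected to values, passed through a
        zero-initialized linear layer, and added token-wise to the content values.
    """

    def __init__(self, d_img, d_txt, d_head, rng):
        # Frozen base projections (stand-ins for pretrained U-Net weights).
        self.W_q = rng.standard_normal((d_img, d_head)) / np.sqrt(d_img)
        self.W_k = rng.standard_normal((d_txt, d_head)) / np.sqrt(d_txt)
        self.W_v = rng.standard_normal((d_txt, d_head)) / np.sqrt(d_txt)
        # Trainable adapter: aesthetic value projection + zero-initialized linear layer.
        self.W_va = rng.standard_normal((d_txt, d_head)) / np.sqrt(d_txt)
        self.W_zero = np.zeros((d_head, d_head))
        self.b_zero = np.zeros(d_head)

    def __call__(self, x, content, aesthetic=None):
        q = x @ self.W_q                     # (N_img, d_head)
        k = content @ self.W_k               # (N_txt, d_head)
        v = content @ self.W_v               # (N_txt, d_head)
        if aesthetic is not None:
            v_a = aesthetic @ self.W_va      # (N_txt, d_head)
            # Mix aesthetic values into content values through the zero-init layer.
            v = v + (v_a @ self.W_zero + self.b_zero)
        # Attention map is computed from the content keys only.
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
        return attn @ v                      # (N_img, d_head)


rng = np.random.default_rng(0)
layer = ValueMixedCrossAttention(d_img=8, d_txt=6, d_head=4, rng=rng)
x = rng.standard_normal((5, 8))          # 5 image-latent tokens
content = rng.standard_normal((3, 6))    # 3 content-prompt tokens
aesthetic = rng.standard_normal((3, 6))  # 3 aesthetic-prompt tokens (same length, assumed)

base = layer(x, content)
mixed = layer(x, content, aesthetic)
print(np.allclose(base, mixed))  # True: zero init leaves the base output unchanged
```

In this sketch only the adapter weights (`W_va`, `W_zero`, `b_zero`) would be trained while the base projections stay frozen, which is one way to read the claim that the base model does not need retraining. Once training moves `W_zero` away from zero, the aesthetic values start to shift the output, weighted by the same attention map the content prompt produces.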

Code & Models

Repositories

fenfenfenfan/VMix (official)

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Diffusion · Adapter