VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control
Shaojin Wu, Fei Ding, Mengqi Huang, Wei Liu, Qian He

TL;DR
VMix introduces a plug-and-play aesthetic control adapter for diffusion models, enhancing image quality by disentangling content and aesthetic prompts and integrating aesthetic conditions through cross-attention value mixing, without retraining.
Contribution
The paper proposes VMix, a novel aesthetic adapter that improves image aesthetics in diffusion models via cross-attention value mixing, maintaining generality and compatibility with existing models.
Findings
VMix outperforms state-of-the-art methods in aesthetic image generation.
VMix is compatible with community modules like LoRA, ControlNet, and IPAdapter.
The method enhances aesthetic quality without retraining existing models.
Abstract
While diffusion models show extraordinary talents in text-to-image generation, they may still fail to generate highly aesthetic images. More specifically, there is still a gap between the generated images and the real-world aesthetic images in finer-grained dimensions including color, lighting, composition, etc. In this paper, we propose Cross-Attention Value Mixing Control (VMix) Adapter, a plug-and-play aesthetics adapter, to upgrade the quality of generated images while maintaining generality across visual concepts by (1) disentangling the input text prompt into the content description and aesthetic description by the initialization of aesthetic embedding, and (2) integrating aesthetic conditions into the denoising process through value-mixed cross-attention, with the network connected by zero-initialized linear layers. Our key insight is to enhance the aesthetic presentation of…
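The value-mixing idea in point (2) can be illustrated with a small NumPy sketch. It is not the authors' implementation. It is one plausible reading of the abstract, under several assumptions not stated there: the aesthetic embedding has the same number of tokens as the content embedding, it gets its own value projection, it reuses the attention map computed from the content prompt, and its contribution passes through a zero-initialized linear layer before being added to the ordinary cross-attention output. All names (`value_mixed_cross_attention`, `Wv_aes`, `W_zero`) are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def value_mixed_cross_attention(x, content, aesthetic,
                                Wq, Wk, Wv, Wv_aes, W_zero):
    """Illustrative sketch only; the layer layout is an assumption.

    x         : (n_pixels, d_model) image features (queries)
    content   : (n_tokens, d_text)  content-prompt embedding
    aesthetic : (n_tokens, d_text)  aesthetic embedding (assumed same length)
    """
    q = x @ Wq
    k = content @ Wk
    v = content @ Wv
    # Attention map comes from the content prompt only.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    base = attn @ v                      # ordinary cross-attention output
    # Aesthetic branch: separate value projection, same attention map,
    # then a linear layer that starts at zero.
    v_aes = aesthetic @ Wv_aes
    aes_out = (attn @ v_aes) @ W_zero
    return base, base + aes_out

rng = np.random.default_rng(0)
n_pixels, n_tokens, d_model, d_text = 6, 4, 8, 5
x = rng.normal(size=(n_pixels, d_model))
content = rng.normal(size=(n_tokens, d_text))
aesthetic = rng.normal(size=(n_tokens, d_text))
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_text, d_model))
Wv = rng.normal(size=(d_text, d_model))
Wv_aes = rng.normal(size=(d_text, d_model))

# Zero-initialized output layer: the adapter is a no-op before training.
W_zero = np.zeros((d_model, d_model))
base, mixed_at_init = value_mixed_cross_attention(
    x, content, aesthetic, Wq, Wk, Wv, Wv_aes, W_zero)

# Stand-in for trained weights: the aesthetic branch now shifts the output.
W_trained = 0.1 * rng.normal(size=(d_model, d_model))
_, mixed_after = value_mixed_cross_attention(
    x, content, aesthetic, Wq, Wk, Wv, Wv_aes, W_trained)
```

The zero initialization makes the adapter leave the base model's output unchanged at the start of training, which is a common way to attach a new branch to a pretrained network without disturbing it.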
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Diffusion · Adapter
