RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance
Jiaojiao Fan, Haotian Xue, Qinsheng Zhang, Yongxin Chen

TL;DR
RefDrop introduces a simple, controllable method for enhancing consistency in image and video generation by manipulating attention modules, eliminating the need for complex fine-tuning or additional encoders.
Contribution
The paper shows that reference feature guidance by attention concatenation amounts to a linear interpolation between image self-attention and cross-attention on reference features, and proposes RefDrop, an algorithm that dispenses with the rank-1 coefficient to give precise control in diffusion models.
Findings
RefDrop achieves high consistency in multi-subject image generation.
It enables diverse content suppression and personalized video generation.
It outperforms state-of-the-art methods in controllability and quality.
Abstract
There is a rapidly growing interest in controlling consistency across multiple generated images using diffusion models. Among various methods, recent works have found that simply manipulating attention modules by concatenating features from multiple reference images provides an efficient approach to enhancing consistency without fine-tuning. Despite its popularity and success, few studies have elucidated the underlying mechanisms that contribute to its effectiveness. In this work, we reveal that the popular approach is a linear interpolation of image self-attention and cross-attention between synthesized content and reference features, with a constant rank-1 coefficient. Motivated by this observation, we find that a rank-1 coefficient is not necessary and simplifies the controllable generation mechanism. The resulting algorithm, which we coin as RefDrop, allows users to control the…
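The interpolation claim in the abstract can be checked numerically for a single query. The sketch below is an illustration written for this summary, not the authors' code: it uses pure Python, random toy features, and hypothetical helper names (`attention`, `concat_attention`, `interpolated_attention`). It shows that attending over concatenated keys and values gives the same output as mixing self-attention and reference cross-attention with a weight set by the two softmax normalizers. The optional scalar `c` is an assumed stand-in for a user-set coefficient; the exact form used in the paper is not reproduced here.

```python
import math
import random


def attention(q, keys, values):
    """Single-query softmax attention. Returns (output, softmax normalizer)."""
    scale = 1.0 / math.sqrt(len(q))
    weights = [math.exp(scale * sum(a * b for a, b in zip(q, k))) for k in keys]
    z = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / z for d in range(dim)]
    return out, z


def concat_attention(q, keys, values, ref_keys, ref_values):
    """Attention over the image's own features concatenated with reference features."""
    out, _ = attention(q, keys + ref_keys, values + ref_values)
    return out


def interpolated_attention(q, keys, values, ref_keys, ref_values, c=None):
    """Mix self-attention and reference cross-attention with weight c.

    If c is None, use the data-dependent weight implied by concatenation.
    Passing a scalar c is an illustrative assumption, not the paper's exact rule.
    """
    self_out, z_self = attention(q, keys, values)
    ref_out, z_ref = attention(q, ref_keys, ref_values)
    if c is None:
        c = z_ref / (z_self + z_ref)
    return [(1.0 - c) * s + c * r for s, r in zip(self_out, ref_out)]


def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
q = [rng.gauss(0.0, 1.0) for _ in range(4)]
K, V = rand_matrix(rng, 3, 4), rand_matrix(rng, 3, 4)          # generated image
K_ref, V_ref = rand_matrix(rng, 5, 4), rand_matrix(rng, 5, 4)  # reference image

concat = concat_attention(q, K, V, K_ref, V_ref)
implied = interpolated_attention(q, K, V, K_ref, V_ref)
max_gap = max(abs(a - b) for a, b in zip(concat, implied))
print("max difference between the two forms:", max_gap)
```

In this sketch, `c=0.0` reduces to plain self-attention and `c=1.0` attends only to the reference features, so a single scalar sets how strongly the reference influences the output.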
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
Methods: Diffusion
