Training-free Color-Style Disentanglement for Constrained Text-to-Image Synthesis
Aishwarya Agarwal, Srikrishna Karanam, and Balaji Vasan Srinivasan

TL;DR
This paper introduces a training-free, test-time method for disentangling and controlling color and style attributes in text-to-image diffusion models using reference images, enabling flexible and independent manipulation of these attributes.
Contribution
It proposes the first training-free approach that manipulates color and style in diffusion models at inference time through feature transformations and LAB space disentanglement.
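The LAB-space disentanglement named in the contribution starts from a standard color-space split. CIELAB separates an image into a lightness channel L and two chromatic channels, a (green to red) and b (blue to yellow). The sketch below is a minimal illustration under stated assumptions, not the authors' code. It uses the standard sRGB (D65) to CIELAB conversion, and it assumes that L is the color-free part while a and b hold the color. The helper names `srgb_to_lab` and `split_lightness_and_color` are hypothetical. How the method feeds each part into the diffusion model's self-attention is cut off in the abstract, so the sketch stops at the split.

```python
import numpy as np

# sRGB (linear) -> CIE XYZ matrix and D65 reference white (Y normalized to 1).
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def srgb_to_lab(rgb):
    """Convert sRGB values in [0, 1], shape (..., 3), to CIELAB (L in [0, 100])."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Undo the sRGB transfer curve to get linear RGB.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalized by the reference white.
    xyz = linear @ _RGB_TO_XYZ.T / _WHITE_D65
    # CIELAB companding function with its linear segment near zero.
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


def split_lightness_and_color(rgb):
    """Hypothetical helper: return (L, ab) so lightness and chroma can be used separately."""
    lab = srgb_to_lab(rgb)
    return lab[..., 0], lab[..., 1:]
```

For an image of shape (H, W, 3), the helper returns a lightness map of shape (H, W) and a chroma map of shape (H, W, 2); neutral grays come out with a and b near zero, which is what makes L a color-free view of the image.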
Findings
Effective transfer of color attributes from reference images.
Independent control of style and color attributes.
Flexible fusion of multiple style and color sources.
Abstract
We consider the problem of independently controlling, in a disentangled fashion, the outputs of text-to-image diffusion models with the color and style attributes of a user-supplied reference image. We present the first training-free, test-time-only method to disentangle and condition text-to-image models on color and style attributes from a reference image. To realize this, we propose two key innovations. Our first contribution is to transform the latent codes at inference time using feature transformations that make the covariance matrix of the current generation follow that of the reference image, helping meaningfully transfer color. Next, we observe that there exists a natural disentanglement between color and style in the LAB image space, which we exploit to transform the self-attention feature maps of the image being generated with respect to those of the reference computed from its L…
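The first contribution, making the covariance of the current generation follow that of the reference, can be illustrated with the classic whitening-coloring transform (WCT). This is a minimal sketch, not the paper's implementation; the summary does not say the authors use exactly this transform. Assumptions: features are arranged as a C×N matrix (channels by spatial positions), for example a latent of shape (4, 64, 64) reshaped to (4, 4096); the function name `match_covariance` and the regularizer `eps` are hypothetical. The transform first whitens the centered generation features to identity covariance, then colors them with the square root of the reference covariance and adds the reference mean.

```python
import numpy as np


def match_covariance(content, style, eps=1e-5):
    """Whitening-coloring transform: map content features (C x N) so their
    mean and covariance match those of style features (C x M)."""
    mu_c = content.mean(axis=1, keepdims=True)
    mu_s = style.mean(axis=1, keepdims=True)
    fc = content - mu_c
    fs = style - mu_s
    eye = np.eye(content.shape[0])
    # Sample covariances (ddof = 1), regularized so eigenvalues stay positive.
    cov_c = fc @ fc.T / (fc.shape[1] - 1) + eps * eye
    cov_s = fs @ fs.T / (fs.shape[1] - 1) + eps * eye
    # Symmetric eigendecompositions give matrix inverse square root / square root.
    wc, vc = np.linalg.eigh(cov_c)
    ws, vs = np.linalg.eigh(cov_s)
    whiten = vc @ np.diag(wc ** -0.5) @ vc.T  # cov_c^{-1/2}
    color = vs @ np.diag(ws ** 0.5) @ vs.T    # cov_s^{1/2}
    # Whiten the centered content, color it, then shift to the style mean.
    return color @ whiten @ fc + mu_s
```

Usage on a latent would look like `match_covariance(z.reshape(4, -1), z_ref.reshape(4, -1)).reshape(z.shape)`, where `z` and `z_ref` are hypothetical generation and reference latents of the same channel count. The eigendecomposition route is chosen because covariance matrices are symmetric, so `np.linalg.eigh` is stable and cheap for small C.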
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Handwritten Text Recognition Techniques · Digital Media Forensic Detection
Methods: Diffusion
