An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis

Aishwarya Agarwal; Srikrishna Karanam; Tripti Shukla; Balaji Vasan Srinivasan

arXiv:2311.11919 · cs.CV · November 21, 2023 · 2 cites

TL;DR

This paper introduces MATTE, a multi-attribute inversion method for diffusion models that disentangles attributes like color, style, layout, and object from a single reference image, improving constrained text-to-image synthesis.

Contribution

The paper provides an extensive analysis of where attributes are captured in diffusion models and proposes MATTE, a novel inversion algorithm that disentangles multiple attributes across both the layer and the timestep dimensions of the denoising process.

Findings

01

MATTE effectively disentangles multiple attributes in diffusion models.

02

Some attributes overlap in where they are captured: color and style are captured in the same U-Net layers, while layout and color are captured in the same timestep stages of denoising.

03

The method improves constrained image synthesis by leveraging disentangled attribute representations.
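Finding 02 carries the main argument: if two attributes occupy the same layers, a per-layer inversion cannot give them separate tokens, and if two attributes occupy the same timestep stages, a per-timestep inversion cannot either. The toy check below is a minimal sketch of that reasoning, not MATTE itself. The (layer group, timestep stage) labels such as `"L2"` and `"T3"` are hypothetical placeholders chosen only to reproduce the two overlaps stated above, and the helper names `collisions` and `separable` are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical (layer group, timestep stage) cell for each attribute.
# These labels are placeholders picked only to mirror the two overlaps in
# Finding 02 (color/style share layers, layout/color share a stage); they are
# NOT the assignments measured in the paper.
OCCUPANCY = {
    "color": ("L2", "T2"),
    "style": ("L2", "T3"),   # same layer group as color
    "layout": ("L1", "T2"),  # same timestep stage as color
    "object": ("L3", "T1"),
}

AXIS = {"layer": 0, "timestep": 1}


def collisions(occupancy, dims):
    """List attribute pairs that become indistinguishable when only `dims` are used."""
    idx = [AXIS[d] for d in dims]

    def project(attr):
        # Keep only the coordinates along the chosen dimensions.
        return tuple(occupancy[attr][i] for i in idx)

    return [(a, b) for a, b in combinations(occupancy, 2) if project(a) == project(b)]


def separable(occupancy, dims):
    """True if every attribute gets its own slot along `dims`."""
    return not collisions(occupancy, dims)


if __name__ == "__main__":
    for dims in (["layer"], ["timestep"], ["layer", "timestep"]):
        print(dims, separable(OCCUPANCY, dims), collisions(OCCUPANCY, dims))
```

Under these assumed placements, the layer-only projection reports a color/style collision, the timestep-only projection reports a color/layout collision, and using both dimensions gives every attribute a distinct cell.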

Abstract

We consider the problem of constraining diffusion model outputs with a user-supplied reference image. Our key objective is to extract multiple attributes (e.g., color, object, layout, style) from this single reference image, and then generate new samples with them. One line of existing work proposes to invert the reference image into a single textual conditioning vector, enabling generation of new samples with this learned token. These methods, however, do not learn the multiple tokens that are necessary to condition model outputs on the multiple attributes noted above. Another line of techniques expands the inversion space to learn multiple embeddings, but only along the layer dimension (e.g., one per layer of the DDPM model) or the timestep dimension (one for a set of timesteps in the denoising process), leading to suboptimal attribute disentanglement. To address the…
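The abstract contrasts a single learned conditioning vector with inversion spaces that grow along the layer dimension or the timestep dimension. The following sketch shows what those options mean in terms of which learned vector is looked up at each (timestep, layer) pair during denoising. It is a hypothetical illustration under stated assumptions: the sizes, the equal-width split of timesteps into stages, and the names `SCHEMES`, `make_bank`, `lookup`, and `stage_of` are all invented for this example, and it is not MATTE's actual parameterization, which this page does not describe.

```python
import numpy as np

# All sizes below are hypothetical and kept small for illustration.
NUM_STEPS = 1000  # assumed number of denoising timesteps, t = 0..999
NUM_LAYERS = 16   # assumed number of cross-attention layers in the U-Net
NUM_STAGES = 4    # assumed number of equal-width timestep stages
EMB_DIM = 8       # toy embedding width (real text encoders are much wider)

# (stages, layers) grid of learnable vectors for each inversion space.
SCHEMES = {
    "single": (1, 1),                             # one token everywhere
    "per_layer": (1, NUM_LAYERS),                 # one token per layer
    "per_stage": (NUM_STAGES, 1),                 # one token per timestep stage
    "stage_and_layer": (NUM_STAGES, NUM_LAYERS),  # both dimensions at once
}


def stage_of(t):
    """Map timestep t to one of NUM_STAGES equal-width stages."""
    return min(t * NUM_STAGES // NUM_STEPS, NUM_STAGES - 1)


def make_bank(scheme, rng):
    """Randomly initialize the embeddings an inversion procedure would optimize."""
    rows, cols = SCHEMES[scheme]
    return rng.normal(size=(rows, cols, EMB_DIM))


def lookup(bank, t, layer):
    """Return the vector that conditions cross-attention at (t, layer)."""
    rows, cols, _ = bank.shape
    s = stage_of(t) if rows > 1 else 0  # collapse an axis that is not learned
    l = layer if cols > 1 else 0
    return bank[s, l]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for name in SCHEMES:
        bank = make_bank(name, rng)
        print(name, "learned vectors:", bank.shape[0] * bank.shape[1])
```

With these toy sizes the four schemes hold 1, 16, 4, and 64 learned vectors respectively. A per-layer bank returns the same vector for a given layer at every timestep, and a per-stage bank returns the same vector for every layer within a stage, which is exactly the limitation the abstract attributes to single-dimension inversion.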

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques · Computer Graphics and Visualization Techniques

Methods: Sparse Evolutionary Training · Convolution · Concatenated Skip Connection · Max Pooling · U-Net · Diffusion