Norm-guided latent space exploration for text-to-image generation
Dvir Samuel, Rami Ben-Ari, Nir Darshan, Haggai Maron, Gal Chechik

TL;DR
This paper introduces a norm-guided method for exploring the latent seed space in text-to-image diffusion models, improving the generation of rare concepts and enhancing performance in few-shot and long-tail tasks.
Contribution
It proposes a novel non-Euclidean metric for seed interpolation that accounts for seed norms, leading to better manipulation and understanding of the latent space.
Findings
Enhanced generation of rare concept images
State-of-the-art performance on few-shot benchmarks
Improved speed, quality, and semantic accuracy
Abstract
Text-to-image diffusion models show great potential in synthesizing a large variety of concepts in new compositions and scenarios. However, the latent space of initial seeds is still not well understood and its structure was shown to impact the generation of various concepts. Specifically, simple operations like interpolation and finding the centroid of a set of seeds perform poorly when using standard Euclidean or spherical metrics in the latent space. This paper makes the observation that, in current training procedures, diffusion models observed inputs with a narrow range of norm values. This has strong implications for methods that rely on seed manipulation for image generation, with applications to few-shot and long-tail learning tasks. To address this issue, we propose a novel method for interpolating between two seeds and demonstrate that it defines a new non-Euclidean metric…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
MethodsDiffusion
