Norm-guided latent space exploration for text-to-image generation

Dvir Samuel; Rami Ben-Ari; Nir Darshan; Haggai Maron; Gal Chechik

arXiv:2306.08687 · cs.CV · November 7, 2023 · 2 cites

TL;DR

This paper introduces a norm-guided method for exploring the latent seed space in text-to-image diffusion models, improving the generation of rare concepts and enhancing performance in few-shot and long-tail tasks.

Contribution

It proposes a novel non-Euclidean metric for seed interpolation that accounts for seed norms, leading to better manipulation and understanding of the latent space.
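
Why seed norms matter for interpolation: a seed drawn from a d-dimensional standard Gaussian has a norm that follows a chi distribution, which is tightly concentrated near sqrt(d). The midpoint of a plain linear interpolation between two independent seeds is itself Gaussian with variance 1/2 per coordinate, so its norm drops to about sqrt(d/2), well outside that narrow range. The sketch below, written with NumPy, shows this effect numerically. It is only an illustration of the problem, not the paper's interpolation method (the official code is in the dvirsamuel/SeedSelect repository). The latent size 4×64×64 (Stable Diffusion v1 at 512×512) and the helper names `lerp` and `midpoint_norms` are assumptions chosen for illustration.

```python
import numpy as np


def lerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Plain Euclidean (linear) interpolation between two seeds."""
    return (1.0 - t) * a + t * b


def midpoint_norms(d: int = 4 * 64 * 64, rng_seed: int = 0):
    """Return the norms of two random Gaussian seeds and of their linear midpoint.

    d defaults to 4*64*64, an assumed latent size (Stable Diffusion v1, 512x512).
    """
    rng = np.random.default_rng(rng_seed)
    a = rng.standard_normal(d)
    b = rng.standard_normal(d)
    mid = lerp(a, b, 0.5)
    return np.linalg.norm(a), np.linalg.norm(b), np.linalg.norm(mid)


na, nb, nmid = midpoint_norms()
print(f"sqrt(d) = {np.sqrt(4 * 64 * 64):.1f}")  # 128.0
print(f"|a| = {na:.1f}, |b| = {nb:.1f}")        # both near 128
print(f"|lerp(a, b, 0.5)| = {nmid:.1f}")        # near 128 / sqrt(2), about 90.5
```

The endpoint norms land within a few units of 128, while the midpoint norm falls by roughly a factor of sqrt(2). A norm-aware interpolation path is meant to avoid exactly this kind of drop.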

Findings

01

Enhanced generation of rare concept images

02

State-of-the-art performance on few-shot benchmarks

03

Improved speed, quality, and semantic accuracy

Abstract

Text-to-image diffusion models show great potential in synthesizing a large variety of concepts in new compositions and scenarios. However, the latent space of initial seeds is still not well understood and its structure was shown to impact the generation of various concepts. Specifically, simple operations like interpolation and finding the centroid of a set of seeds perform poorly when using standard Euclidean or spherical metrics in the latent space. This paper makes the observation that, in current training procedures, diffusion models observed inputs with a narrow range of norm values. This has strong implications for methods that rely on seed manipulation for image generation, with applications to few-shot and long-tail learning tasks. To address this issue, we propose a novel method for interpolating between two seeds and demonstrate that it defines a new non-Euclidean metric…
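
The abstract also notes that taking the centroid of a set of seeds performs poorly under a Euclidean metric. A short calculation shows one reason for this. The mean of k independent standard Gaussian seeds is Gaussian with variance 1/k per coordinate, so its norm shrinks to about sqrt(d/k), moving further from the typical norm sqrt(d) as k grows. The NumPy sketch below checks this empirically. It is illustrative only and does not implement the paper's centroid procedure. The latent size and the name `centroid_norms` are assumptions.

```python
import numpy as np


def centroid_norms(ks=(1, 2, 5, 10), d: int = 4 * 64 * 64, rng_seed: int = 0):
    """For each k, draw k Gaussian seeds, average them, and return
    (k, norm of the Euclidean centroid, predicted sqrt(d / k))."""
    rng = np.random.default_rng(rng_seed)
    results = []
    for k in ks:
        seeds = rng.standard_normal((k, d))
        c = seeds.mean(axis=0)  # Euclidean centroid of the k seeds
        results.append((k, float(np.linalg.norm(c)), float(np.sqrt(d / k))))
    return results


for k, observed, predicted in centroid_norms():
    print(f"k={k:2d}  |centroid|={observed:6.1f}  sqrt(d/k)={predicted:6.1f}")
```

With d = 16384, the centroid of ten seeds has a norm near 40, about a third of the roughly 128 that a single seed typically has.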

Code & Models

Repositories

dvirsamuel/SeedSelect
PyTorch · Official

Videos

Norm-guided latent space exploration for text-to-image generation · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Diffusion