Addressing degeneracies in latent interpolation for diffusion models
Erik Landolsi, Fredrik Kahl

TL;DR
This paper investigates degeneracies in latent-space interpolation for diffusion models: it analyzes the problem theoretically, identifies its causes, and proposes a normalization-based remedy that improves image quality.
Contribution
The paper introduces a simple normalization scheme that mitigates degeneracies in latent interpolation for diffusion models, improving image quality and robustness.
Findings
Normalization reduces degeneration effects in latent interpolation
Improved FID and CLIP scores with the proposed method
Baseline methods show quality drop before visible degeneration
Abstract
There is an increasing interest in using image-generating diffusion models for deep data augmentation and image morphing. In this context, it is useful to interpolate between latents produced by inverting a set of input images, in order to generate new images representing some mixture of the inputs. We observe that such interpolation can easily lead to degenerate results when the number of inputs is large. We analyze the cause of this effect theoretically and experimentally, and suggest a suitable remedy. The suggested approach is a relatively simple normalization scheme that is easy to use whenever interpolation between latents is needed. We measure image quality using FID and CLIP embedding distance and show experimentally that baseline interpolation methods lead to a drop in quality metrics long before the degeneration issue is clearly visible. In contrast, our method significantly…
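One way to see why interpolating many latents can degenerate is a minimal sketch under a simplifying assumption: that the inverted latents behave like independent standard normal vectors. A weighted sum of such vectors has per-entry variance equal to the sum of the squared weights, so averaging N latents shrinks the scale by a factor of sqrt(N). The rescaling shown here only illustrates that effect and is not necessarily the authors' normalization scheme. The helper names (`linear_mix`, `rescale_to_unit_variance`, `rms`) are hypothetical.

```python
import math
import random

def linear_mix(latents, weights):
    """Weighted sum of latent vectors (plain linear interpolation)."""
    dim = len(latents[0])
    return [sum(w * z[i] for w, z in zip(weights, latents)) for i in range(dim)]

def rescale_to_unit_variance(mixed, weights):
    """Divide by sqrt(sum w_i^2), the std of a weighted sum of i.i.d. N(0, 1) entries."""
    scale = math.sqrt(sum(w * w for w in weights))
    return [x / scale for x in mixed]

def rms(vec):
    """Root-mean-square of the entries; close to 1 for a standard normal latent."""
    return math.sqrt(sum(x * x for x in vec) / len(vec))

# Assumption: synthetic Gaussian vectors stand in for latents from image inversion.
rng = random.Random(0)
dim, num_inputs = 4096, 16
latents = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_inputs)]
weights = [1.0 / num_inputs] * num_inputs

mixed = linear_mix(latents, weights)
fixed = rescale_to_unit_variance(mixed, weights)
print(f"RMS of plain average: {rms(mixed):.3f}")
print(f"RMS after rescaling:  {rms(fixed):.3f}")
```

With 16 equally weighted inputs, the plain average has an RMS near 1/4, well below the scale a diffusion model expects at the start of sampling, while the rescaled mixture returns to roughly 1. Latents obtained by inverting real images are generally not exactly i.i.d. Gaussian, so a practical scheme may need more than this single rescaling.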
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Medical Image Segmentation Techniques
Methods: Sparse Evolutionary Training · Diffusion · Contrastive Language-Image Pre-training
