Conditional Diffusion on Web-Scale Image Pairs leads to Diverse Image Variations

Manoj Kumar; Neil Houlsby; Emiel Hoogeboom

arXiv:2405.14857 · cs.CV · October 3, 2024

TL;DR

This paper introduces Semantica, a diffusion model trained on web-scale image pairs for diverse and contextually consistent image variations, highlighting a new pretraining strategy and evaluation metrics.

Contribution

The paper presents a novel pretraining approach for diffusion models using web-scale image pairs to generate diverse image variations with semantic consistency.

Findings

1. Semantica can generate diverse variations of dataset images.
2. The model's performance depends on the choice of image encoder.
3. The proposed metrics evaluate image variations better than standard metrics do.

Abstract

Generating image variations, where a model produces variations of an input image while preserving the semantic context, has gained increasing attention. Current image variation techniques involve adapting a text-to-image model to reconstruct an input image conditioned on the same image. We first demonstrate that a diffusion model trained to reconstruct an input image from frozen embeddings can reconstruct the image with minor variations. Second, inspired by how text-to-image models learn from web-scale text-image pairs, we explore a new pretraining strategy to generate image variations using a large collection of image pairs. Our diffusion model Semantica receives a random (encoded) image from a webpage as conditional input and denoises another noisy random image from the same webpage. We carefully examine various design choices for the image encoder, given its crucial role in…
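The abstract contrasts two ways of forming training examples: conditioning on the same image that is reconstructed, versus conditioning on one random image from a webpage and denoising a different random image from that same page. The sketch below illustrates that contrast under stated assumptions. It is not the authors' implementation. The corpus layout (page URL mapped to image IDs), the cosine variance-preserving noise schedule, the clean-image prediction target with a mean-squared-error loss, and the names `sample_web_pair`, `training_loss`, `encoder`, and `denoiser` are all hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical corpus layout: webpage URL -> IDs of the images found on that page.
Corpus = Dict[str, List[str]]


def sample_reconstruction_pair(corpus: Corpus, rng: random.Random) -> Tuple[str, str]:
    """Reconstruction baseline: condition and target are the same image."""
    page = rng.choice(sorted(corpus))
    image = rng.choice(corpus[page])
    return image, image


def sample_web_pair(corpus: Corpus, rng: random.Random) -> Tuple[str, str]:
    """Web-pair setup: two distinct random images drawn from the same webpage."""
    eligible = sorted(page for page, images in corpus.items() if len(images) >= 2)
    if not eligible:
        raise ValueError("need at least one page with two or more images")
    page = rng.choice(eligible)
    condition, target = rng.sample(corpus[page], 2)
    return condition, target


def cosine_alpha_sigma(t: float) -> Tuple[float, float]:
    """Assumed variance-preserving cosine schedule, so alpha**2 + sigma**2 == 1."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)


def noisy_target(x: Sequence[float], t: float, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Forward diffusion z_t = alpha_t * x + sigma_t * eps on a flattened image."""
    alpha, sigma = cosine_alpha_sigma(t)
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    z_t = [alpha * xi + sigma * ei for xi, ei in zip(x, eps)]
    return z_t, eps


def training_loss(
    corpus: Corpus,
    pixels: Dict[str, List[float]],
    encoder: Callable[[List[float]], List[float]],
    denoiser: Callable[[List[float], float, List[float]], List[float]],
    rng: random.Random,
) -> float:
    """One hypothetical training example: encode one image, denoise another."""
    condition, target = sample_web_pair(corpus, rng)
    cond_emb = encoder(pixels[condition])  # stands in for a frozen image encoder
    t = rng.random()  # diffusion time in [0, 1)
    z_t, _ = noisy_target(pixels[target], t, rng)
    x_hat = denoiser(z_t, t, cond_emb)  # assumed to predict the clean target image
    x = pixels[target]
    return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)
```

For example, with `corpus = {"page_a": ["a1", "a2", "a3"], "page_b": ["b1"]}`, `sample_web_pair` can only draw from `page_a`, because a page with a single image cannot supply a distinct condition and target. `sample_reconstruction_pair` always returns two identical IDs. Under these assumptions, that is the only difference between the two setups.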

Taxonomy

Topics: Scientific Computing and Data Management

Methods: Diffusion