Training-Free Diffusion Priors for Text-to-Image Generation via Optimization-based Visual Inversion

Samuele Dell'Erba; Andrew D. Bagdanov

arXiv:2511.20821·cs.CV·December 30, 2025

Open Access

TL;DR

This paper introduces a training-free, optimization-based method called OVI to replace diffusion priors in text-to-image generation, improving visual quality without extensive training.

Contribution

Proposes a novel zero-shot visual inversion technique with regularization constraints as an alternative to trained diffusion priors.

Findings

01. OVI can replace traditional diffusion priors effectively.

02. Regularization constraints improve the realism of generated images.

03. Benchmark scores are comparable or superior to state-of-the-art methods.

Abstract

Diffusion models have established the state of the art in text-to-image generation, but their performance often relies on a diffusion prior network to translate text embeddings into the visual manifold for easier decoding. These priors are computationally expensive and require extensive training on massive datasets. In this work, we challenge the necessity of a trained prior by employing Optimization-based Visual Inversion (OVI), a training-free, zero-shot alternative. OVI initializes a latent visual representation from random pseudo-tokens and iteratively optimizes it to maximize the cosine similarity with the input textual prompt embedding. We further propose two novel constraints, a Mahalanobis-based loss and a Nearest-Neighbor loss, to regularize the OVI optimization process toward the distribution of realistic images. Our experiments, conducted…
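The optimization loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes the latent is a plain vector optimized directly (no CLIP encoder or pseudo-token pipeline), a diagonal covariance for the Mahalanobis term, a squared Euclidean distance to the closest entry of a small embedding bank for the Nearest-Neighbor term, and hand-picked weights. The names `ovi_sketch`, `lam_m`, `lam_nn`, and `bank` are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_grad(u, v):
    """Analytic gradient of cosine(u, v) with respect to u."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    c = cosine(u, v)
    return [b / (norm_u * norm_v) - c * a / (norm_u ** 2) for a, b in zip(u, v)]


def ovi_sketch(text_emb, mean, var, bank, init=None,
               steps=200, lr=0.5, lam_m=0.05, lam_nn=0.05, seed=0):
    """Toy gradient ascent on cosine similarity with two penalties.

    Assumptions (not from the paper): diagonal covariance `var`,
    squared-distance nearest-neighbor penalty, fixed step size.
    """
    rng = random.Random(seed)
    # Random initialization, unless a starting point is supplied.
    z = list(init) if init is not None else [rng.gauss(0.0, 1.0) for _ in text_emb]
    for _ in range(steps):
        # Ascent direction for the similarity objective.
        g_cos = cosine_grad(z, text_emb)
        # Gradient of the squared Mahalanobis distance to `mean`.
        g_m = [2.0 * (a - m) / s for a, m, s in zip(z, mean, var)]
        # Gradient of the squared distance to the nearest bank entry.
        nearest = min(bank, key=lambda e: sum((a - b) ** 2 for a, b in zip(z, e)))
        g_nn = [2.0 * (a - b) for a, b in zip(z, nearest)]
        z = [a + lr * (gc - lam_m * gm - lam_nn * gn)
             for a, gc, gm, gn in zip(z, g_cos, g_m, g_nn)]
    return z
```

Calling `ovi_sketch(text_emb, mean, var, bank)` returns the optimized vector. Setting `lam_m=0.0` and `lam_nn=0.0` reduces the loop to pure cosine-similarity ascent, which makes it easy to compare the regularized and unregularized results on the same inputs.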

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Image Enhancement Techniques