ImageRAGTurbo: Towards One-step Text-to-Image Generation with Retrieval-Augmented Diffusion Models

Peijie Qiu; Hariharan Ramshankar; Arnau Ramisa; René Vidal; Amit Kumar K C; Vamsi Salaka; Rahul Bhagat

arXiv:2602.12640·cs.CV·February 16, 2026

Open Access

TL;DR

ImageRAGTurbo introduces a retrieval-augmented diffusion approach that enables one-step text-to-image generation with high fidelity, reducing latency without sacrificing quality by leveraging relevant retrieved examples.

Contribution

The paper proposes a retrieval-augmented finetuning method for diffusion models, enhancing one-step generation quality without extensive retraining.

Findings

01. High-fidelity images generated in one step

02. Retrieval augmentation improves prompt alignment

03. Efficient blending of retrieved content enhances quality

Abstract

Diffusion models have emerged as the leading approach for text-to-image generation. However, their iterative sampling process, which gradually morphs random noise into coherent images, introduces significant latency that limits their applicability. While recent few-step diffusion models reduce sampling to as few as one to four steps, they often compromise image quality and prompt alignment, especially in one-step generation. Additionally, these models require computationally expensive training procedures. To address these limitations, we propose ImageRAGTurbo, a novel approach to efficiently finetune few-step diffusion models via retrieval augmentation. Given a text prompt, we retrieve relevant text-image pairs from a database and use them to condition the generation process. We argue that such retrieved examples provide rich contextual information to the UNet…
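
The retrieval step mentioned in the abstract might be sketched roughly as follows. This is an illustration only: the abstract does not say which embedding model, similarity measure, or database layout the authors use, so the cosine similarity, the toy three-dimensional embeddings, the `retrieve_top_k` helper, and the record fields (`caption`, `image_id`, `embedding`) are all assumptions. How the retrieved pairs then condition the UNet is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query_embedding, database, k=2):
    """Return the k records most similar to the query (hypothetical helper).

    `database` is a list of dicts holding a caption, an image identifier,
    and a precomputed embedding; this layout is assumed for illustration.
    """
    ranked = sorted(
        database,
        key=lambda record: cosine(query_embedding, record["embedding"]),
        reverse=True,
    )
    return ranked[:k]


# Toy text-image database; a real system would store embeddings from a
# text or image encoder, which the abstract does not name.
DATABASE = [
    {"caption": "a red bicycle", "image_id": "img_0", "embedding": [1.0, 0.0, 0.0]},
    {"caption": "a blue bicycle", "image_id": "img_1", "embedding": [0.9, 0.1, 0.0]},
    {"caption": "a bowl of soup", "image_id": "img_2", "embedding": [0.0, 0.0, 1.0]},
]

# Stand-in for the embedded text prompt.
query = [1.0, 0.05, 0.0]
neighbors = retrieve_top_k(query, DATABASE, k=2)
```

In this toy setup, `neighbors` holds the two bicycle records, which a generator could then take as additional conditioning alongside the prompt.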

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Humanities and Scholarship · Computer Graphics and Visualization Techniques