AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Jingyuan Qi, Zhiyang Xu, Qifan Wang, Lifu Huang

TL;DR
AR-RAG introduces a dynamic, context-aware retrieval augmentation method for image generation that iteratively incorporates relevant patches at each step, improving quality and diversity over static retrieval approaches.
Contribution
It presents a novel autoregressive retrieval augmentation paradigm with two frameworks, DAiD and FAiD, for improved image generation by integrating patch-level retrieval during each generation step.
Findings
Significant performance improvements on benchmarks like Midjourney-30K, GenEval, and DPG-Bench.
Effective avoidance of over-copying and stylistic bias in generated images.
Demonstrated versatility of AR-RAG across multiple image generation tasks.
Abstract
We introduce Autoregressive Retrieval Augmentation (AR-RAG), a novel paradigm that enhances image generation by autoregressively incorporating k-nearest neighbor retrievals at the patch level. Unlike prior methods that perform a single, static retrieval before generation and condition the entire generation on fixed reference images, AR-RAG performs context-aware retrievals at each generation step, using prior-generated patches as queries to retrieve and incorporate the most relevant patch-level visual references, enabling the model to respond to evolving generation needs while avoiding limitations (e.g., over-copying, stylistic bias, etc.) prevalent in existing methods. To realize AR-RAG, we propose two parallel frameworks: (1) Distribution-Augmentation in Decoding (DAiD), a training-free plug-and-use decoding strategy that directly merges the distribution of model-predicted patches with…
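The per-step retrieval idea described above can be illustrated with a small toy sketch. It is not the authors' implementation. The abstract is cut off before it says how DAiD forms and merges its distributions, so everything here is an assumption made for illustration: the patch database layout, the squared-L2 distance, the softmax weighting of neighbors, the linear interpolation weight `lam`, the greedy decoding, and all function names (`knn_patches`, `retrieval_distribution`, `merge_distributions`, `generate`, `toy_model_step`, `toy_embed`) are hypothetical.

```python
import math

def knn_patches(query, database, k):
    """Return the k (distance, token) pairs closest to `query`.

    `database` is a list of (context_vector, next_patch_token) pairs.
    Squared L2 distance is an assumption; the source does not name a metric.
    """
    scored = sorted(
        (sum((q - c) ** 2 for q, c in zip(query, ctx)), tok)
        for ctx, tok in database
    )
    return scored[:k]

def retrieval_distribution(neighbors, vocab_size, temperature=1.0):
    """Turn retrieved neighbors into a distribution over patch tokens
    by applying a softmax to the negative distances (assumed weighting)."""
    weights = [math.exp(-dist / temperature) for dist, _ in neighbors]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for weight, (_, tok) in zip(weights, neighbors):
        probs[tok] += weight / total
    return probs

def merge_distributions(model_probs, retrieved_probs, lam):
    """Linear interpolation of the two distributions (assumed merge rule)."""
    return [(1 - lam) * p + lam * r for p, r in zip(model_probs, retrieved_probs)]

def generate(num_patches, model_step, embed, database, vocab_size, k=2, lam=0.5):
    """Generate patch tokens one at a time, retrieving at every step."""
    tokens = []
    for _ in range(num_patches):
        probs = model_step(tokens)
        if tokens:
            # Previously generated patches form the query for this step.
            neighbors = knn_patches(embed(tokens), database, k)
            retrieved = retrieval_distribution(neighbors, vocab_size)
            probs = merge_distributions(probs, retrieved, lam)
        # Greedy choice keeps the toy example deterministic.
        tokens.append(max(range(vocab_size), key=probs.__getitem__))
    return tokens

# Toy stand-ins: a uniform "model" and a one-dimensional context embedding.
VOCAB_SIZE = 4
TOY_DATABASE = [([0.0], 1), ([1.0], 2), ([2.0], 3), ([3.0], 0)]

def toy_model_step(tokens):
    return [1.0 / VOCAB_SIZE] * VOCAB_SIZE

def toy_embed(tokens):
    return [float(tokens[-1])]

sequence = generate(5, toy_model_step, toy_embed, TOY_DATABASE, VOCAB_SIZE)
print(sequence)
```

Because the toy model is uniform, the retrieved neighbors alone decide each choice after the first patch, which makes the effect of per-step retrieval easy to see. A real system would query with learned patch features and a large patch database rather than the single-number embedding used here.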
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques
Methods: Convolution
