Fast Data Attribution for Text-to-Image Models

Sheng-Yu Wang; Aaron Hertzmann; Alexei A Efros; Richard Zhang; Jun-Yan Zhu

arXiv:2511.10721 · cs.CV · November 17, 2025

TL;DR

This paper introduces a scalable data attribution method for text-to-image models. It distills a slow, unlearning-based attribution method into a feature embedding space, so influential training images can be identified by fast retrieval rather than by running an expensive attribution algorithm for each query.

Contribution

We propose a distillation-based approach that enables fast, scalable data attribution for large text-to-image models by replacing per-query attribution computation with retrieval in a learned embedding space.

Findings

01 Achieves 2,500x to 400,000x faster attribution than prior methods.

02 Effective on both medium-scale models trained on MSCOCO and large-scale Stable Diffusion models trained on LAION.

03 Demonstrates practical applicability for real-world models like Stable Diffusion.

Abstract

Data attribution for text-to-image models aims to identify the training images that most significantly influenced a generated output. Existing attribution methods involve considerable computational resources for each query, making them impractical for real-world applications. We propose a novel approach for scalable and efficient data attribution. Our key idea is to distill a slow, unlearning-based attribution method to a feature embedding space for efficient retrieval of highly influential training images. During deployment, combined with efficient indexing and search methods, our method successfully finds highly influential images without running expensive attribution algorithms. We show extensive results on both medium-scale models trained on MSCOCO and large-scale Stable Diffusion models trained on LAION, demonstrating that our method can achieve better or competitive performance in…
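The deployment step described in the abstract (retrieving influential training images from an embedding space) can be illustrated with a minimal sketch. It assumes the training-image embeddings have already been computed by some distilled encoder; here they are random stand-ins, and the names `normalize` and `top_k_influential` are hypothetical, not taken from the paper. The sketch uses brute-force cosine similarity in NumPy; a large-scale deployment would presumably swap this for an approximate nearest-neighbor index.

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def top_k_influential(query_emb: np.ndarray, train_embs: np.ndarray, k: int = 5):
    """Return (indices, scores) of the k training embeddings most similar to the query.

    Assumption: similarity in the embedding space stands in for attribution
    score. This is an illustrative retrieval step, not the paper's exact scoring.
    """
    scores = normalize(train_embs) @ normalize(query_emb)
    k = min(k, scores.shape[0])
    # argpartition selects the k largest scores without a full sort;
    # only those k candidates are then sorted in descending order.
    candidates = np.argpartition(-scores, k - 1)[:k]
    order = candidates[np.argsort(-scores[candidates])]
    return order, scores[order]

# Stand-in data: 1000 random "training image" embeddings of dimension 64.
rng = np.random.default_rng(0)
train_embs = rng.normal(size=(1000, 64))
# A hypothetical query embedding that lies very close to training item 42.
query_emb = train_embs[42] + 0.01 * rng.normal(size=64)

idx, sc = top_k_influential(query_emb, train_embs, k=5)
print(idx, sc)
```

Because the training embeddings are computed once and reused for every query, the per-query cost is a single matrix-vector product plus a partial sort, which is the kind of saving that makes retrieval attractive compared with running an attribution algorithm per generated image.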

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis