DMin: Scalable Training Data Influence Estimation for Diffusion Models

Huawei Lin; Yingjie Lao; Weijie Zhao

arXiv:2412.08637·cs.CV·April 10, 2026

TL;DR

DMin is a scalable influence estimation framework for diffusion models that efficiently identifies influential training data samples for large-scale models, reducing storage and computation costs.

Contribution

The paper introduces DMin, which the authors describe as, to the best of their knowledge, the first influence estimation method that scales to diffusion models with billions of parameters. It relies on gradient compression for efficiency.

Findings

01. DMin accurately identifies influential training samples.

02. DMin retrieves top-k influential samples in under 1 second.

03. DMin significantly reduces storage from hundreds of TBs to MBs or KBs.

Abstract

Identifying the training data samples that most influence a generated image is a critical task in understanding diffusion models (DMs), yet existing influence estimation methods are constrained to small-scale or LoRA-tuned models due to computational limitations. To address this challenge, we propose DMin (Diffusion Model influence), a scalable framework for estimating the influence of each training data sample on a given generated image. To the best of our knowledge, it is the first method capable of influence estimation for DMs with billions of parameters. Leveraging efficient gradient compression, DMin reduces storage requirements from hundreds of TBs to mere MBs or even KBs, and retrieves the top-k most influential training samples in under 1 second, all while maintaining performance. Our empirical results demonstrate DMin is both effective in identifying influential training…
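
To make the compress-then-retrieve idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: this summary does not describe DMin's actual compression scheme or influence formula, so the sketch swaps in a count-sketch-style projection (random sign flips followed by bucketed sums) and scores influence as the inner product between compressed gradients. The names `compress`, `influence`, and `top_k` are hypothetical, and the gradients are toy vectors rather than real diffusion-model gradients.

```python
import random


def compress(grad, k, seed=0):
    """Compress a gradient to k numbers (assumed scheme, not DMin's own).

    Each coordinate gets a random sign and is added to one of k buckets.
    Using the same seed for every vector of the same length keeps the
    projection identical, so inner products stay comparable.
    """
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in grad]
    perm = list(range(len(grad)))
    rng.shuffle(perm)
    out = [0.0] * k
    for pos, idx in enumerate(perm):
        out[pos % k] += signs[idx] * grad[idx]
    return out


def influence(query_c, train_c):
    """Influence score, assumed here to be a plain inner product."""
    return sum(q * t for q, t in zip(query_c, train_c))


def top_k(query_c, train_cs, k):
    """Return indices of the k training samples with the highest score."""
    scored = [(influence(query_c, c), i) for i, c in enumerate(train_cs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


# Toy usage: three "training gradients" and one "query gradient".
train_grads = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]
store = [compress(g, 2) for g in train_grads]   # only 2 numbers kept per sample
query = compress([0.0, 1.0, 0.0, 0.0], 2)
ranked = top_k(query, store, 2)
```

Only the compressed vectors in `store` need to be kept, which is the source of the storage saving the abstract mentions. The linear scan in `top_k` is for illustration only; the sketch makes no claim about how DMin achieves sub-second retrieval.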

Code & Models

Repositories

huawei-lin/DMin (GitHub)

Models

huaweilin/DMin_sd3_medium_lora_r4 (Hugging Face model)
