Training Data Attribution for Diffusion Models
Zheng Dai, David K Gifford

TL;DR
This paper introduces an ensemble-based method for attributing diffusion model outputs to training data, making it possible to assess how individual training examples influence generated samples.
Contribution
The paper proposes an ensemble approach in which individual models are trained on engineered splits of the training data, enabling efficient ablation of training data influence on diffusion model outputs.
Findings
Ensembles can effectively identify influential training examples.
The approach allows for impact assessment of training data on generated samples.
Ensembles serve as valid generative models.
Abstract
Diffusion models have become increasingly popular for synthesizing high-quality samples based on training datasets. However, given the oftentimes enormous sizes of the training datasets, it is difficult to assess how training data impact the samples produced by a trained diffusion model. The difficulty of relating diffusion model inputs and outputs poses significant challenges to model explainability and training data attribution. Here we propose a novel solution that reveals how training data influence the output of diffusion models through the use of ensembles. In our approach, individual models in an encoded ensemble are trained on carefully engineered splits of the overall training data to permit the identification of influential training examples. The resulting model ensembles enable efficient ablation of training data influence, allowing us to assess the impact of training data on…
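The split-and-ablate idea in the abstract can be illustrated with a toy sketch. The abstract does not specify how the splits are engineered, so the scheme below is an assumption chosen for simplicity: each training example is assigned a "codeword", namely the subset of ensemble members that train on it, and ablating an example means averaging only the members that never saw it. All function names are hypothetical, and scalar values stand in for real model outputs.

```python
from itertools import combinations

def assign_codewords(n_examples, n_models, models_per_example):
    """Map each example index to the set of ensemble members trained on it.

    Assumed scheme (not taken from the paper): cycle through all subsets of
    size `models_per_example`, which must be smaller than `n_models` so that
    every example is unseen by at least one member.
    """
    codewords = list(combinations(range(n_models), models_per_example))
    return [frozenset(codewords[i % len(codewords)]) for i in range(n_examples)]

def training_splits(assignment, n_models):
    """Invert the assignment: list the example indices each member trains on."""
    return [[i for i, members in enumerate(assignment) if m in members]
            for m in range(n_models)]

def ablated_members(assignment, example, n_models):
    """Members that never saw `example`; using only these ablates it."""
    return [m for m in range(n_models) if m not in assignment[example]]

def ensemble_mean(member_outputs, members):
    """Average per-member scalar outputs over the chosen members."""
    return sum(member_outputs[m] for m in members) / len(members)

# Toy setup: 6 examples, 4 ensemble members, each example seen by 2 members.
assignment = assign_codewords(n_examples=6, n_models=4, models_per_example=2)
splits = training_splits(assignment, n_models=4)

# Placeholder scalars standing in for each member's output on one query.
member_outputs = [1.0, 3.0, 5.0, 7.0]
full = ensemble_mean(member_outputs, range(4))
without_0 = ensemble_mean(member_outputs, ablated_members(assignment, 0, 4))
influence_of_0 = full - without_0  # shift attributable to example 0
```

No retraining is needed to ablate an example in this sketch: the members that exclude it already exist, which is one way to read the abstract's "efficient ablation". In a real setting the scalars would be replaced by each member's prediction, and the comparison would be made on generated samples.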
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Gaussian Processes and Bayesian Inference
Methods: Diffusion
