Training data attribution in diffusion models via mirrored unlearning and noise-consistent skew
Joan Serrà, Dipam Goswami, Fabio Morreale, Wei-Hsiang Liao, Yuki Mitsufuji

TL;DR
This paper introduces MUCS, a training data attribution (TDA) method for diffusion models that combines mirrored unlearning with noise-consistent skew to make attribution more reliable and robust than existing approaches.
Contribution
The paper proposes MUCS, a conceptually simple and generic TDA technique for diffusion models: a second model is fine-tuned with bounded mirrored gradient ascent, and its normalized skew with respect to the original model is measured using consistent noise samples.
Findings
MUCS systematically outperforms existing TDA methods on three different datasets by a large margin.
Design choices significantly impact TDA performance.
Ensembling TDA approaches can improve attribution robustness.
Abstract
Training data attribution (TDA) should enable generative model interpretability and foster a variety of related downstream tasks. Nonetheless, current TDA approaches lack reliability and robustness, preventing their adoption in real-world setups. In this paper, we take a decisive step towards more reliable and robust TDA for diffusion models. We propose to perform TDA with mirrored unlearning and noise-consistent skew (MUCS). The idea is to fine-tune a second model with bounded mirrored gradient ascent, and to measure the normalized skew of this model with respect to the original one using consistent noise samples. We show that, while being conceptually simple and generic, MUCS systematically outperforms existing methods on three different datasets by a large margin. We additionally study the effect that core design choices have on final performance, and analyze novel aspects regarding…
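The "consistent noise samples" idea in the abstract can be illustrated with a toy sketch. It is not the paper's implementation: the mirrored unlearning step is not shown, and the names `point_model`, `make_draws`, and `noise_consistent_score`, as well as the particular normalization (relative loss change), are hypothetical choices made only for illustration. The one point it demonstrates is that both models are evaluated on exactly the same noise and noise-level draws, so the difference in their denoising losses reflects the models rather than sampling variance.

```python
import math
import random


def add_noise(x, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * xi + b * ei for xi, ei in zip(x, eps)]


def denoising_loss(model, x, eps, alpha_bar):
    """Mean squared error between the predicted noise and the true noise."""
    pred = model(add_noise(x, eps, alpha_bar), alpha_bar)
    return sum((p - e) ** 2 for p, e in zip(pred, eps)) / len(eps)


def make_draws(dim, n, seed=0):
    """Fixed list of (noise vector, noise level) pairs, shared by both models."""
    rng = random.Random(seed)
    return [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.05, 0.95))
            for _ in range(n)]


def noise_consistent_score(original, unlearned, x, draws, tiny=1e-12):
    """Average relative loss change of `unlearned` vs `original` on sample x.

    Each (eps, alpha_bar) draw is reused for both models. The normalization
    below is an assumption, not the paper's definition of normalized skew.
    """
    total = 0.0
    for eps, alpha_bar in draws:
        l0 = denoising_loss(original, x, eps, alpha_bar)
        l1 = denoising_loss(unlearned, x, eps, alpha_bar)
        total += (l1 - l0) / (l0 + l1 + tiny)
    return total / len(draws)


def point_model(mu):
    """Toy noise predictor (hypothetical) that believes the clean data equals mu."""
    def predict(x_t, alpha_bar):
        a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
        return [(xt - a * m) / b for xt, m in zip(x_t, mu)]
    return predict


x = [0.5, -1.0, 2.0]                           # a "training sample"
draws = make_draws(dim=len(x), n=8)
original = point_model([0.6, -1.0, 2.0])       # fits x closely
unlearned = point_model([1.5, -1.0, 2.0])      # fits x worse, as if x had been unlearned
score = noise_consistent_score(original, unlearned, x, draws)
```

In this toy setup a larger positive score means the second model denoises `x` worse than the original under identical noise, which is the kind of signal an unlearning-based attribution score relies on.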
