Large-Scale Training Data Attribution for Music Generative Models via Unlearning
Woosung Choi, Junghyun Koo, Kin Wai Cheuk, Joan Serrà, Marco A. Martínez-Ramírez, Yukara Ikemiya, Naoki Murata, Yuhta Takida, Wei-Hsiang Liao, Yuki Mitsufuji

TL;DR
This paper introduces unlearning-based training data attribution for music generative models, enabling identification of influential training data points to promote fairness and accountability in AI-generated music.
Contribution
It adapts unlearning methods for training data attribution to large-scale music generative models, addressing ethical and copyright concerns.
Findings
Unlearning-based attribution can feasibly be applied to a text-to-music diffusion model trained on a large-scale dataset.
The approach yields consistent attribution patterns.
It supports fairer acknowledgment of artistic contributions in AI-generated music.
Abstract
This paper explores the use of unlearning methods for training data attribution (TDA) in music generative models trained on large-scale datasets. TDA aims to identify which specific training data points contributed the most to the generation of a particular output from a specific model. This is crucial in the context of AI-generated music, where proper recognition and credit for original artists are generally overlooked. By enabling white-box attribution, our work supports a fairer system for acknowledging artistic contributions and addresses pressing concerns related to AI ethics and copyright. We apply unlearning-based attribution to a text-to-music diffusion model trained on a large-scale dataset and investigate its feasibility and behavior in this setting. To validate the method, we perform a grid search over different hyperparameter configurations and quantitatively evaluate the…
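To make the idea of unlearning-based attribution concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the abstract does not specify the exact procedure, so this assumes one common formulation in which the model "unlearns" a generated output via a gradient-ascent step on that output's loss, and each training point is then scored by how much its own loss increases. The linear model, the squared-error loss, the learning rate, and all function names are hypothetical stand-ins for a diffusion model and its training objective.

```python
# Toy sketch of unlearning-based attribution (hypothetical names and values).
# A linear model stands in for the generative model; squared error stands in
# for the training loss.

def predict(w, x):
    """Linear prediction w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def loss(w, x, y):
    """Squared-error loss for one example."""
    return 0.5 * (predict(w, x) - y) ** 2

def grad(w, x, y):
    """Gradient of the loss with respect to w."""
    residual = predict(w, x) - y
    return [residual * xi for xi in x]

def unlearn(w, x, y, lr):
    """One gradient-ASCENT step on the query's loss (assumed unlearning rule)."""
    g = grad(w, x, y)
    return [wi + lr * gi for wi, gi in zip(w, g)]

def attribution_scores(w, train, query, lr=1.0):
    """Score each training point by its loss increase after unlearning the query."""
    w_unlearned = unlearn(w, query[0], query[1], lr)
    return [loss(w_unlearned, x, y) - loss(w, x, y) for x, y in train]

# Hypothetical "trained" weights, training set, and generated output (query).
w = [1.0, 1.0]
train = [
    ([1.0, 0.0], 0.9),   # same direction as the query
    ([0.0, 1.0], 1.0),   # orthogonal to the query
    ([0.5, 0.5], 0.95),  # partially overlapping with the query
]
query = ([1.0, 0.0], 0.9)

scores = attribution_scores(w, train, query)
# Training indices ordered from most to least influential.
ranking = sorted(range(len(train)), key=lambda i: scores[i], reverse=True)
```

In this toy setting, the training point that shares the query's direction receives the highest score and the orthogonal one receives a score of zero. A real system would replace the single ascent step and the per-example loss with choices tuned to the model, which is presumably what the hyperparameter grid search mentioned in the abstract addresses.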
