Membership and Dataset Inference Attacks on Large Audio Generative Models
Jakub Proboszcz, Paweł Kochanski, Karol Korszun, Donato Crisostomi, Giorgio Strano, Emanuele Rodolà, Kamil Deja, Jan Dubinski

TL;DR
This paper explores whether one can verify that specific audio works were used to train large generative models. It finds dataset inference to be a promising tool for copyright protection, despite the limited success of per-sample membership inference.
Contribution
It demonstrates that dataset inference can effectively determine if an artist's collection was included in training data for large audio models, advancing copyright protection methods.
Findings
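Membership inference alone is of limited effectiveness at scale, because the per-sample membership signal is weak for models trained on large and diverse datasets.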
Dataset inference successfully aggregates evidence across multiple samples.
Dataset inference offers a practical approach for copyright accountability.
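The contrast between the first two findings can be illustrated with a minimal sketch. It is not the authors' method: the scoring function (a hypothetical per-sample score such as model loss, assumed lower on training members), the simulated Gaussian scores, and the one-sided z-test used for aggregation are all assumptions made for illustration. The point is only that a per-sample shift too small to classify individual tracks can still become statistically detectable once many samples from one collection are pooled.

```python
import math
import random
from statistics import NormalDist, mean, variance


def dataset_inference_pvalue(suspect_scores, reference_scores):
    """One-sided z-test (assumed aggregation rule, not from the paper).

    Tests whether the suspect collection has a lower mean score than a
    reference set of known non-members. A small p-value is evidence that
    the suspect collection was part of the training data.
    """
    n_s, n_r = len(suspect_scores), len(reference_scores)
    diff = mean(reference_scores) - mean(suspect_scores)
    # Standard error of the difference of means with unequal variances.
    se = math.sqrt(variance(suspect_scores) / n_s
                   + variance(reference_scores) / n_r)
    z = diff / se
    return 1.0 - NormalDist().cdf(z)


def simulate_scores(n, shift, seed=0):
    """Hypothetical scores: members are shifted down by `shift` (in std units)."""
    rng = random.Random(seed)
    members = [rng.gauss(1.0 - shift, 1.0) for _ in range(n)]
    non_members = [rng.gauss(1.0, 1.0) for _ in range(n)]
    return members, non_members


if __name__ == "__main__":
    # A weak per-sample signal (0.1 std) evaluated at growing collection sizes.
    for n in (10, 1000, 20000):
        members, non_members = simulate_scores(n, shift=0.1)
        p = dataset_inference_pvalue(members, non_members)
        print(f"n={n:>6}  p-value={p:.4g}")
```

In this toy setup the expected z-statistic grows with the square root of the collection size, which is why pooling evidence across an artist's whole catalogue can succeed where a per-track decision cannot.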
Abstract
Generative audio models, based on diffusion and autoregressive architectures, have advanced rapidly in both quality and expressiveness. This progress, however, raises pressing copyright concerns, as such models are often trained on vast corpora of artistic and commercial works. A central question is whether one can reliably verify if an artist's material was included in training, thereby providing a means for copyright holders to protect their content. In this work, we investigate the feasibility of such verification through membership inference attacks (MIA) on open-source generative audio models, which attempt to determine whether a specific audio sample was part of the training set. Our empirical results show that membership inference alone is of limited effectiveness at scale, as the per-sample membership signal is weak for models trained on large and diverse datasets. However,…
Taxonomy
Topics: Music and Audio Processing · Generative Adversarial Networks and Image Synthesis · Music Technology and Sound Studies
