ARIA: A Diagnostic Framework for Music Training Data Attribution
Changheon Han, Ashkan Panahi, Kıvanç Tatar

TL;DR
ARIA is a framework for music training data attribution that decomposes influence along musical aspects and diagnoses how reliable an attribution method's scores are, clarifying how generated music relates to its training data.
Contribution
ARIA introduces a decomposition of attribution along musical aspects and pairs it with diagnostics that evaluate attribution reliability, in support of music copyright analysis.
Findings
Reliability diagnostics rank four attribution methods accurately against ground truth.
ARIA reveals significant variation in attribution behaviors across methods.
It characterizes embedding-similarity retrieval baselines by musical aspect.
Abstract
Training data attribution (TDA) for music generation must answer two questions that copyright analysis requires, namely which training songs influence a generated output and along which musical aspects the influence operates. Existing methods reduce influence to a single scalar, without revealing which musical aspects are dominant in that influence. We propose ARIA, a framework that decomposes attribution along musical aspects (five for symbolic music, three for audio) and pairs the decomposition with reliability diagnostics computed from the segment-level score matrix. It measures within-group similarity among the top-K attributed tracks against random reference groups drawn from the training pool, and diagnoses the score matrix through its singular value decomposition and column statistics. On a symbolic-music model where attribution ground truth is available through counterfactual…
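The two diagnostics named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not give exact definitions, so the cosine-similarity measure, the mean aggregation of segment scores per track, the choice of summary statistics, and all names (`within_group_gap`, `score_matrix_diagnostics`, `aspect_features`) are hypothetical. The score matrix is assumed to have one row per generated segment and one column per training track.

```python
import numpy as np


def mean_pairwise_cosine(features):
    """Mean cosine similarity over all distinct pairs of rows."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    n = len(features)
    # Subtract the diagonal (self-similarity) before averaging.
    return float((sim.sum() - np.trace(sim)) / (n * (n - 1)))


def within_group_gap(scores, aspect_features, k=5, n_groups=200, seed=0):
    """Compare similarity among the top-k attributed tracks with random groups.

    scores: (n_segments, n_tracks) attribution scores (assumed layout).
    aspect_features: (n_tracks, d) features for one musical aspect (assumed).
    """
    rng = np.random.default_rng(seed)
    track_scores = scores.mean(axis=0)       # assumption: average over segments
    top_k = np.argsort(track_scores)[-k:]    # indices of the k highest-scoring tracks
    top_sim = mean_pairwise_cosine(aspect_features[top_k])

    n_tracks = len(aspect_features)
    ref = np.array([
        mean_pairwise_cosine(
            aspect_features[rng.choice(n_tracks, size=k, replace=False)]
        )
        for _ in range(n_groups)
    ])
    # Fraction of random reference groups that are less similar than the top-k group.
    return top_sim, float(ref.mean()), float((ref < top_sim).mean())


def score_matrix_diagnostics(scores):
    """Summaries of the score matrix from its singular values and columns."""
    s = np.linalg.svd(scores, compute_uv=False)
    energy = s**2 / np.sum(s**2)
    return {
        # Share of squared singular-value mass carried by the first component.
        "top_singular_energy": float(energy[0]),
        # Spread of per-track mean scores across columns.
        "column_mean_std": float(scores.mean(axis=0).std()),
    }
```

In this sketch, a top-K group that is much more self-similar than the random reference groups on a given aspect suggests the attribution is picking up that aspect, while a score matrix dominated by a single singular component would hint that the scores vary little across generated segments.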
