SSM-Net: feature learning for Music Structure Analysis using a Self-Similarity-Matrix based loss
Geoffroy Peeters, Florian Angulo

TL;DR
This paper introduces SSM-Net, a deep learning approach that learns audio features for Music Structure Analysis (MSA) by training an encoder so that the self-similarity matrix (SSM) computed from its features approximates a ground-truth SSM. The learned features are evaluated with the ROC AUC on the RWC-Pop dataset.
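A minimal sketch of the quantities involved follows. It builds an SSM from a matrix of frame features and a ground-truth SSM from frame-level segment labels, then compares the two. The function names `feature_ssm`, `ground_truth_ssm` and `ssm_loss` are hypothetical. Three choices are assumptions rather than details taken from the paper: cosine similarity rescaled to [0, 1] for the feature SSM, a binary ground-truth SSM equal to 1 where two frames share a segment label, and a mean squared error as the loss.

```python
import numpy as np

def feature_ssm(features):
    """SSM of an (n_frames, dim) feature matrix, with values in [0, 1].

    Assumption: cosine similarity between frames, rescaled from [-1, 1] to [0, 1].
    """
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)        # L2-normalize each frame
    return 0.5 * (1.0 + unit @ unit.T)

def ground_truth_ssm(labels):
    """Binary SSM: 1 where two frames carry the same segment label, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def ssm_loss(features, labels):
    """Assumed loss (illustration only): mean squared error between the two SSMs."""
    diff = feature_ssm(features) - ground_truth_ssm(labels)
    return float(np.mean(diff ** 2))

if __name__ == "__main__":
    labels = ["A", "A", "B", "B", "A"]                 # frame-level segment labels
    # Same-label frames share a direction; different labels point opposite ways.
    good = np.array([[1.0, 0.0] if lab == "A" else [-1.0, 0.0] for lab in labels])
    bad = np.tile([1.0, 0.0], (5, 1))                  # every frame looks identical
    print(ssm_loss(good, labels))  # 0.0
    print(ssm_loss(bad, labels))   # 0.48
```

Features that map same-label frames to one direction and different-label frames to the opposite direction reproduce the ground-truth SSM exactly. Constant features make every cell equal to 1, so they are penalized on each of the 12 different-label cells out of 25.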
Contribution
The paper presents a new training paradigm for audio feature learning. Because the SSM-based loss is differentiable with respect to the learned features, the encoder can be trained directly by gradient descent for music structure analysis.
Findings
Achieved high Area Under the ROC Curve (AUC) scores on the RWC-Pop dataset
Demonstrated the effectiveness of SSM-based loss in feature learning
Showed improved music structure analysis performance
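The abstract does not specify what the AUC is computed over. One plausible reading, assumed here, treats each cell of the estimated SSM as a score for "these two frames share a segment label" and compares it with the binary ground-truth SSM. Under that assumption, the ROC AUC can be computed with the Mann-Whitney statistic, as in this NumPy sketch. The function name `ssm_auc` is hypothetical.

```python
import numpy as np

def ssm_auc(ssm, gt_ssm):
    """ROC AUC of SSM values used as scores for the binary ground-truth SSM.

    Assumption: each cell above the diagonal is one binary decision
    ("same segment label" or not); the diagonal is trivially 1 and is skipped.
    """
    ssm = np.asarray(ssm, dtype=float)
    gt = np.asarray(gt_ssm).astype(bool)
    rows, cols = np.triu_indices(ssm.shape[0], k=1)   # strictly upper triangle
    scores, truth = ssm[rows, cols], gt[rows, cols]
    pos, neg = scores[truth], scores[~truth]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs both same-label and different-label pairs")
    # Mann-Whitney: P(score of a positive > score of a negative), ties count 1/2.
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

if __name__ == "__main__":
    labels = np.array([0, 0, 1, 1, 0])
    gt = (labels[:, None] == labels[None, :]).astype(float)
    print(ssm_auc(gt, gt))                      # 1.0 (perfect ranking)
    print(ssm_auc(1.0 - gt, gt))                # 0.0 (inverted ranking)
    print(ssm_auc(np.full_like(gt, 0.5), gt))   # 0.5 (uninformative scores)
```

Because the AUC depends only on the ranking of scores, it does not require choosing a threshold on the SSM.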
Abstract
In this paper, we propose a new paradigm to learn audio features for Music Structure Analysis (MSA). We train a deep encoder to learn features such that the Self-Similarity-Matrix (SSM) resulting from them approximates a ground-truth SSM. This is done by minimizing a loss between the two SSMs. Since this loss is differentiable w.r.t. its input features, we can train the encoder in a straightforward way. We successfully demonstrate the use of this training paradigm using the Area Under the ROC Curve (AUC) on the RWC-Pop dataset.
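Because the loss is differentiable with respect to the features, any parametric encoder can be trained by gradient descent. The sketch below illustrates this with a deliberately tiny, hypothetical linear encoder `W` (the paper uses a deep encoder) and synthetic frames. It assumes the same SSM (rescaled cosine similarity) and the same mean squared error loss as an illustration, not as the paper's exact choices. The gradient is derived by hand so that only NumPy is needed. `loss_and_grad` and `train` are illustrative names, not the authors' code.

```python
import numpy as np

def loss_and_grad(W, frames, target):
    """Assumed SSM loss for a hypothetical linear encoder X = frames @ W.

    Returns the mean squared error between the rescaled-cosine SSM of X and
    the target SSM, and its gradient with respect to W (derived by hand).
    """
    n = frames.shape[0]
    X = frames @ W                                    # (n, dim) embeddings
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Y = X / norms                                     # unit-norm embeddings
    S = 0.5 * (1.0 + Y @ Y.T)                         # SSM with values in [0, 1]
    loss = float(np.mean((S - target) ** 2))
    G = (S - target) / n**2                           # dL/d(Y @ Y.T), symmetric
    dY = 2.0 * G @ Y                                  # equals (G + G.T) @ Y
    # Backpropagate through y = x / ||x||: remove the radial component, divide by ||x||.
    dX = (dY - Y * np.sum(Y * dY, axis=1, keepdims=True)) / norms
    return loss, frames.T @ dX

def train(frames, labels, dim=2, steps=500, lr=0.2, seed=0):
    """Plain gradient descent on W; returns W and the loss history."""
    labels = np.asarray(labels)
    target = (labels[:, None] == labels[None, :]).astype(float)  # ground-truth SSM
    W = np.random.default_rng(seed).normal(size=(frames.shape[1], dim))
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(W, frames, target)
        history.append(loss)
        W = W - lr * grad
    return W, history

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    labels = np.array([0] * 5 + [1] * 5 + [0] * 5 + [1] * 5)  # A B A B sections
    centers = rng.normal(size=(2, 8))                 # one synthetic "timbre" per label
    frames = centers[labels] + 0.3 * rng.normal(size=(20, 8))
    W, history = train(frames, labels)
    print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

In practice an automatic-differentiation framework would compute this gradient. The hand-written version only makes explicit that every step from encoder output to loss, including the L2 normalization, is differentiable.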
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
