Exploring single-song autoencoding schemes for audio-based music structure analysis
Axel Marmoret, Jérémy E. Cohen, Frédéric Bimbot

TL;DR
This paper introduces a song-specific autoencoding approach for music structure analysis that is trained on each song individually, without annotations, and matches the performance of supervised state-of-the-art methods at a 3-second tolerance on the RWC-Pop dataset.
Contribution
It explores an unsupervised, piece-specific autoencoding scheme: a low-dimensional autoencoder is trained on a single song, and the resulting latent representation is used to infer that song's structure without any annotations.
Findings
Matches the performance of supervised state-of-the-art methods at a 3-second tolerance on the RWC-Pop dataset when using a Log Mel spectrogram representation.
Does not rely on supervision or annotations, reducing data collection effort.
Performs comparably to supervised methods in music structure analysis.
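To make the "3 seconds tolerance" concrete: in boundary evaluation, an estimated boundary is usually counted as correct when it falls within the tolerance of an annotated boundary. The sketch below is a minimal illustration of that idea, not the evaluation code used in the paper. The function name `boundary_f_measure`, the greedy one-to-one matching, and the example boundary times are all assumptions made for illustration.

```python
def boundary_f_measure(estimated, reference, tolerance=3.0):
    """Precision, recall and F-measure for boundary times given in seconds.

    Hypothetical helper: each estimated boundary can match at most one
    reference boundary, and a match requires |est - ref| <= tolerance.
    """
    est = sorted(estimated)
    ref = sorted(reference)
    i = j = hits = 0
    # Two-pointer sweep over both sorted lists (greedy one-to-one matching).
    while i < len(est) and j < len(ref):
        if abs(est[i] - ref[j]) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif est[i] < ref[j]:
            i += 1  # this estimate is too early to match anything left
        else:
            j += 1  # this reference boundary was missed
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_measure


# Invented example boundaries (seconds), for illustration only.
estimated = [0.0, 12.5, 31.0, 47.0]
reference = [0.0, 10.0, 30.0, 60.0]
loose = boundary_f_measure(estimated, reference, tolerance=3.0)
strict = boundary_f_measure(estimated, reference, tolerance=0.5)
```

With these invented values, the same estimates score higher under the 3-second tolerance than under a 0.5-second one, which is why the tolerance should always be stated alongside a reported score.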
Abstract
The ability of deep neural networks to learn complex data relations and representations is established nowadays, but it generally relies on large sets of training data. This work explores a "piece-specific" autoencoding scheme, in which a low-dimensional autoencoder is trained to learn a latent/compressed representation specific to a given song, which can then be used to infer the song structure. Such a model does not rely on supervision nor annotations, which are well-known to be tedious to collect and often ambiguous in Music Structure Analysis. We report that the proposed unsupervised auto-encoding scheme achieves the level of performance of supervised state-of-the-art methods with 3 seconds tolerance when using a Log Mel spectrogram representation on the RWC-Pop dataset.
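The idea of a piece-specific autoencoder can be sketched as follows. This is a minimal illustration under strong assumptions, not the authors' model: it uses a plain linear autoencoder written in NumPy, synthetic "bar" features with an A-B-A-B layout in place of a real Log Mel spectrogram, and a cosine self-similarity matrix as a stand-in for whatever structure-inference step the paper uses. The names `train_song_autoencoder` and `cosine_self_similarity`, the latent size, and all hyperparameters are hypothetical.

```python
import numpy as np


def train_song_autoencoder(X, latent_dim=4, epochs=500, lr=0.1, seed=0):
    """Fit a linear autoencoder to the frames of ONE song (rows of X).

    Returns the latent codes (one row per frame) and the loss history.
    Assumption: mean squared reconstruction error, plain gradient descent.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xc = X - X.mean(axis=0)  # centre the features of this song
    W_enc = rng.normal(scale=0.1, size=(d, latent_dim))
    W_dec = rng.normal(scale=0.1, size=(latent_dim, d))
    losses = []
    for _ in range(epochs):
        Z = Xc @ W_enc          # encode
        err = Z @ W_dec - Xc    # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        scale = 2.0 / (n * d)   # derivative of the mean squared error
        grad_dec = scale * (Z.T @ err)
        grad_enc = scale * (Xc.T @ (err @ W_dec.T))
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return Xc @ W_enc, losses


def cosine_self_similarity(Z):
    """Cosine similarity between every pair of latent vectors."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Zn = Z / np.maximum(norms, 1e-12)
    return Zn @ Zn.T


# Synthetic "song": 32 bars, 16 features per bar, sections A B A B.
rng = np.random.default_rng(1)
templates = rng.normal(size=(2, 16))
labels = np.array([0] * 8 + [1] * 8 + [0] * 8 + [1] * 8)
song = templates[labels] + 0.05 * rng.normal(size=(32, 16))

latents, losses = train_song_autoencoder(song)
ssm = cosine_self_similarity(latents)
```

In this toy setting, bars from the same section end up with similar latent vectors, so the self-similarity matrix shows a block pattern from which section boundaries could be read. No annotation is used at any point; the only training data is the song itself.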
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
