EDMFormer: Genre-Specific Self-Supervised Learning for Music Structure Segmentation
Sahal Sajeer, Krish Patel, Oscar Chung, Joel Song Bae

TL;DR
EDMFormer is a transformer model for music structure segmentation of Electronic Dance Music (EDM). It builds on self-supervised audio embeddings and uses genre-specific data and structural priors to improve boundary detection and section labeling.
Contribution
The paper introduces EDMFormer, a transformer model trained on EDM-specific data that improves music structure segmentation accuracy on EDM tracks, and releases EDM-98, a dataset of 98 professionally annotated EDM tracks.
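Structure segmentation models commonly predict one section label per audio frame and read boundaries off wherever that label changes. The sketch below illustrates only this generic post-processing step; it is not taken from the paper, and the function name `frames_to_segments`, the fixed hop size, and the input format (one label string per frame) are hypothetical assumptions.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def frames_to_segments(
    frame_labels: Sequence[str], hop_seconds: float
) -> List[Tuple[float, float, str]]:
    """Collapse per-frame section labels into (start, end, label) segments.

    A boundary is placed wherever the label differs from the previous frame.
    Times are in seconds, assuming a constant hop between frames.
    """
    segments = []
    start_frame = 0
    # groupby yields one (label, run) pair per run of identical consecutive labels.
    for label, run in groupby(frame_labels):
        n_frames = len(list(run))
        start = start_frame * hop_seconds
        end = (start_frame + n_frames) * hop_seconds
        segments.append((start, end, label))
        start_frame += n_frames
    return segments
```

For example, `frames_to_segments(["buildup"] * 4 + ["drop"] * 2 + ["breakdown"] * 4, 0.5)` returns `[(0.0, 2.0, 'buildup'), (2.0, 3.0, 'drop'), (3.0, 5.0, 'breakdown')]`, so section labeling and boundary detection come from the same frame-wise output.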
Findings
Improved boundary detection on EDM tracks compared with existing models.
Enhanced section labeling, especially for drops and buildups.
Effective combination of learned representations with genre-specific data and structural priors.
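Boundary detection is commonly scored with a hit-rate F-measure: an estimated boundary counts as correct if it lies within a tolerance window (often 0.5 s or 3 s) of a reference boundary, and each boundary can be matched at most once. This summary does not state which metrics the paper reports, so the following is only a generic, minimal sketch. The function name is hypothetical, and it matches pairs greedily (closest first), whereas reference implementations such as `mir_eval.segment.detection` use maximum bipartite matching and can score slightly higher in ambiguous cases.

```python
from typing import Sequence, Tuple


def boundary_f_measure(
    reference: Sequence[float], estimated: Sequence[float], window: float = 0.5
) -> Tuple[float, float, float]:
    """Return (precision, recall, f_measure) for boundary times in seconds."""
    ref = sorted(reference)
    est = sorted(estimated)
    if not ref or not est:
        return 0.0, 0.0, 0.0

    # All (distance, ref_index, est_index) pairs that fall inside the tolerance window.
    candidates = sorted(
        (abs(r - e), i, j)
        for i, r in enumerate(ref)
        for j, e in enumerate(est)
        if abs(r - e) <= window
    )

    # Greedy one-to-one matching: closest pairs first, each boundary used once.
    used_ref, used_est = set(), set()
    for _, i, j in candidates:
        if i not in used_ref and j not in used_est:
            used_ref.add(i)
            used_est.add(j)

    hits = len(used_ref)
    if hits == 0:
        return 0.0, 0.0, 0.0
    precision = hits / len(est)
    recall = hits / len(ref)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

With reference boundaries `[0, 30, 60, 90]` s and estimates `[0.2, 31, 60.4, 75]` s, the 0.5 s window gives precision, recall, and F-measure of 0.5, while a 3 s window also accepts the estimate at 31 s and raises all three to 0.75.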
Abstract
Music structure segmentation is a key task in audio analysis, but existing models perform poorly on Electronic Dance Music (EDM). This is because most approaches rely on lyrical or harmonic similarity, which works well for pop music but not for EDM. EDM structure is instead defined by changes in energy, rhythm, and timbre, with characteristic sections such as the buildup, drop, and breakdown. We introduce EDMFormer, a transformer model that combines self-supervised audio embeddings with an EDM-specific dataset and taxonomy. We release this dataset as EDM-98: a collection of 98 professionally annotated EDM tracks. EDMFormer improves boundary detection and section labeling compared to existing models, particularly for drops and buildups. The results suggest that combining learned representations with genre-specific data and structural priors is effective for EDM and could be applied to…
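The abstract notes that EDM sections are defined by changes in energy, rhythm, and timbre. To make the energy part concrete, here is a hand-crafted, energy-only novelty baseline. It is an illustrative sketch, not EDMFormer's method, and it ignores rhythm and timbre entirely; the function names, RMS framing, context length, and threshold are all hypothetical assumptions.

```python
import numpy as np


def rms_energy(signal, frame_length=2048, hop_length=1024):
    """Frame-wise root-mean-square energy of a mono signal."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + max(0, (len(signal) - frame_length) // hop_length)
    frames = np.stack([
        signal[i * hop_length : i * hop_length + frame_length]
        for i in range(n_frames)
    ])
    return np.sqrt(np.mean(frames ** 2, axis=1))


def energy_novelty_boundaries(signal, sr, frame_length=2048, hop_length=1024,
                              context_frames=8, threshold=0.5):
    """Return candidate boundary times (s) where mean energy changes sharply.

    Novelty at frame t is |mean(E[t:t+c]) - mean(E[t-c:t])|, scaled to [0, 1];
    local maxima at or above `threshold` are reported as boundaries.
    """
    energy = rms_energy(signal, frame_length, hop_length)
    c = context_frames
    novelty = np.zeros_like(energy)
    # Compare the mean energy just after frame t with the mean just before it.
    for t in range(c, len(energy) - c + 1):
        novelty[t] = abs(energy[t:t + c].mean() - energy[t - c:t].mean())
    if novelty.max() > 0:
        novelty /= novelty.max()
    # Simple peak picking: local maxima above the threshold.
    peaks = [
        t for t in range(1, len(novelty) - 1)
        if novelty[t] >= threshold
        and novelty[t] > novelty[t - 1]
        and novelty[t] >= novelty[t + 1]
    ]
    return [t * hop_length / sr for t in peaks]
```

On a synthetic 8 s sine tone that jumps from quiet to loud at 4 s (sampled at 8 kHz), this reports a single boundary shortly before 4 s, offset by the frame hop. A buildup that keeps its loudness but changes rhythm or timbre would go unnoticed, which is why a single energy curve is not enough on its own.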
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
