Motif-Centric Representation Learning for Symbolic Music
Yuxuan Wu, Roger B. Dannenberg, Gus Xia

TL;DR
This paper introduces a novel motif-centric representation learning approach for symbolic music using Siamese networks, improving motif retrieval accuracy and visualizing music structure.
Contribution
It presents a new method combining VICReg pretraining and contrastive fine-tuning to better capture implicit motif relationships in music.
Findings
12.6% improvement on a retrieval-based task when VICReg pretraining and contrastive fine-tuning are combined
Effective visualization of motif representations
Advances computational modeling of music motifs
Abstract
Music motif, as a conceptual building block of composition, is crucial for music structure analysis and automatic composition. While human listeners can identify motifs easily, existing computational models fall short in representing motifs and their developments. The reason is that the nature of motifs is implicit, and the diversity of motif variations extends beyond simple repetitions and modulations. In this study, we aim to learn the implicit relationship between motifs and their variations via representation learning, using the Siamese network architecture and a pretraining and fine-tuning pipeline. A regularization-based method, VICReg, is adopted for pretraining, while contrastive learning is used for fine-tuning. Experimental results on a retrieval-based task show that these two methods complement each other, yielding an improvement of 12.6% in the area under the…
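The two training objectives named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the loss details, so `vicreg_loss` follows the general VICReg recipe (invariance, variance, and covariance terms) in a simplified form, and `info_nce_loss` is one common contrastive loss assumed here for the fine-tuning stage. The function names, the weights (25, 25, 1), and the temperature are illustrative assumptions, and `z_a` and `z_b` stand for the two Siamese branch outputs for a batch of motif/variation pairs.

```python
import numpy as np

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """Simplified VICReg-style loss for two (batch, dim) embedding arrays.

    Assumption: weights and eps are illustrative defaults, not values
    taken from the paper summarized here.
    """
    n, d = z_a.shape

    # Invariance: paired embeddings (motif, variation) should be close.
    invariance = np.mean((z_a - z_b) ** 2)

    def variance_term(z):
        # Hinge keeps the std of every dimension near or above 1,
        # which discourages collapse to a constant vector.
        std = np.sqrt(z.var(axis=0, ddof=1) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))

    def covariance_term(z):
        # Penalize off-diagonal covariance so dimensions decorrelate.
        zc = z - z.mean(axis=0)
        cov = zc.T @ zc / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    variance = variance_term(z_a) + variance_term(z_b)
    covariance = covariance_term(z_a) + covariance_term(z_b)
    return sim_w * invariance + var_w * variance + cov_w * covariance

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style contrastive loss (assumed stand-in for fine-tuning).

    Row i of z_a and row i of z_b form a positive pair; every other row
    of z_b in the batch acts as a negative for row i of z_a.
    """
    a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    motifs = rng.normal(size=(32, 16))                       # stand-in embeddings
    variations = motifs + 0.1 * rng.normal(size=(32, 16))    # perturbed copies
    print("VICReg-style loss:", vicreg_loss(motifs, variations))
    print("InfoNCE-style loss:", info_nce_loss(motifs, variations))
```

The regularization-based objective needs no negative pairs, while the contrastive objective depends on them, which is one plausible reading of why the abstract describes the two stages as complementary.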
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
Methods: Contrastive Learning · Siamese Network
