Self-Supervised Hierarchical Metrical Structure Modeling
Junyan Jiang, Gus Xia

TL;DR
This paper introduces a self-supervised method for modeling hierarchical metrical structure in music. It handles both symbolic music and audio signals, needs no hierarchical labels beyond beat annotations, and performs comparably to supervised baselines.
Contribution
A self-supervised method that models hierarchical metrical structure with minimal domain knowledge and applies to both symbolic music and audio signals.
Findings
Achieves performance comparable to supervised baselines
Works on both symbolic music and audio signals
Requires no hierarchical labels beyond beat annotations
Abstract
We propose a novel method to model hierarchical metrical structures for both symbolic music and audio signals in a self-supervised manner with minimal domain knowledge. The model is trained and run on beat-aligned music signals and predicts an 8-layer hierarchical metrical tree spanning the beat and measure levels up to the section level. Training requires no hierarchical metrical labels other than beats; it relies purely on metrical regularity and inter-voice consistency as inductive biases. Our experiments show that the method achieves performance comparable to supervised baselines on multiple metrical structure analysis tasks, for both symbolic music and audio signals. All demos, source code and pre-trained models are publicly available on GitHub.
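To make the idea of a layered metrical tree over beats concrete, here is a toy sketch. It is not the authors' model or their output format. It assumes a perfectly regular binary hierarchy and encodes the tree as one integer level per beat, where layer 0 is the beat layer and higher layers are coarser groupings. The function names and the per-beat encoding are hypothetical, chosen only for illustration.

```python
from typing import List

NUM_LAYERS = 8  # the abstract describes an 8-layer tree


def regular_binary_levels(num_beats: int, num_layers: int = NUM_LAYERS) -> List[int]:
    """Assign each beat the highest layer at which it starts a group.

    Assumption: every group at layer k contains exactly two groups of
    layer k-1 (an idealised, perfectly regular binary metre).
    """
    levels = []
    for beat in range(num_beats):
        if beat == 0:
            # The first beat opens a group at every layer.
            levels.append(num_layers - 1)
            continue
        level = 0
        remaining = beat
        # Count how many times the beat index divides by 2, capped at the top layer.
        while remaining % 2 == 0 and level < num_layers - 1:
            remaining //= 2
            level += 1
        levels.append(level)
    return levels


def boundaries_at_layer(levels: List[int], layer: int) -> List[int]:
    """Beat indices that start a group at the given layer."""
    return [i for i, lv in enumerate(levels) if lv >= layer]


levels = regular_binary_levels(8)
print(levels)                          # per-beat levels
print(boundaries_at_layer(levels, 1))  # group starts one layer above the beat
```

In this encoding, reading off the boundaries at successive layers gives progressively coarser segmentations of the same beat sequence, which is the nesting property a metrical tree expresses. Real music departs from this idealised regularity.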
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
