Learning long-term music representations via hierarchical contextual constraints
Shiqi Wei, Gus Xia

TL;DR
This paper introduces a novel hierarchical contextual constraint method for learning stable, long-term symbolic music representations, improving reconstruction and disentanglement over previous models.
Contribution
The paper proposes a two-stage approach, contrastive pre-training followed by hierarchical fine-tuning, that stabilizes training and improves the quality of long-term music representations.
Findings
Stabilizes training of long-term music representations
Improves reconstruction accuracy of hierarchical music segments
Enhances disentanglement of musical features
Abstract
Learning symbolic music representations, especially disentangled representations with probabilistic interpretations, has been shown to benefit both music understanding and generation. However, most models are only applicable to short-term music, while learning long-term music representations remains a challenging task. We have seen several studies attempting to learn hierarchical representations directly in an end-to-end manner, but these models have not been able to achieve the desired results and the training process is not stable. In this paper, we propose a novel approach to learn long-term symbolic music representations through contextual constraints. First, we use contrastive learning to pre-train a long-term representation by constraining its difference from the short-term representation (extracted by an off-the-shelf model). Then, we fine-tune the long-term representation by a…
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Music Technology and Sound Studies
Methods: Contrastive Learning
