Temporal Adaptation of Pre-trained Foundation Models for Music Structure Analysis

Yixiao Zhang; Haonan Chen; Ju-Chiang Wang; Jitong Chen

arXiv:2507.13572 · cs.SD · July 21, 2025

TL;DR

This paper introduces a temporal adaptation method for fine-tuning pre-trained music foundation models. It enables efficient music structure analysis of full-length songs and improves both boundary detection and structural function prediction.

Contribution

The paper proposes a temporal adaptation approach that enables efficient analysis of full-length songs in a single forward pass by combining audio window extension with low-resolution adaptation.

Findings

01. Improved boundary detection accuracy

02. Enhanced structural function prediction

03. Maintained inference speed and memory efficiency

Abstract

Audio-based music structure analysis (MSA) is an essential task in Music Information Retrieval that remains challenging due to the complexity and variability of musical form. Recent advances highlight the potential of fine-tuning pre-trained music foundation models for MSA tasks. However, these models are typically trained with high temporal feature resolution and short audio windows, which limits their efficiency and introduces bias when applied to long-form audio. This paper presents a temporal adaptation approach for fine-tuning music foundation models tailored to MSA. Our method enables efficient analysis of full-length songs in a single forward pass by incorporating two key strategies: (1) audio window extension and (2) low-resolution adaptation. Experiments on the Harmonix Set and RWC-Pop datasets show that our method significantly improves both boundary detection and structural…
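
To make the efficiency argument concrete, the sketch below shows how sequence length scales with window duration and feature frame rate, and how lowering the temporal resolution shortens the sequence for a full-length song. This is only an illustration under stated assumptions: the abstract does not say how the resolution is reduced, so average pooling is a stand-in chosen here, and the durations, frame rates, and the helper names `num_frames` and `average_pool` are hypothetical rather than taken from the paper.

```python
from typing import List


def num_frames(duration_s: float, frame_rate_hz: float) -> int:
    """Number of feature frames produced for a clip at a given frame rate."""
    return int(round(duration_s * frame_rate_hz))


def average_pool(frames: List[List[float]], factor: int) -> List[List[float]]:
    """Average every `factor` consecutive frames into one frame.

    A trailing group shorter than `factor` is averaged over its own length.
    Assumption: average pooling is used here only as a simple stand-in for
    lowering temporal resolution; the paper may do this differently.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    pooled = []
    for start in range(0, len(frames), factor):
        group = frames[start:start + factor]
        dim = len(group[0])
        pooled.append([sum(f[d] for f in group) / len(group) for d in range(dim)])
    return pooled


# Hypothetical numbers, for illustration only.
short_window = num_frames(30.0, 75.0)        # short window, high resolution
full_song = num_frames(240.0, 75.0)          # extended window, high resolution
full_song_low_res = num_frames(240.0, 5.0)   # extended window, low resolution

# Toy feature sequence: 10 frames with 2 dimensions each, pooled by a factor of 5.
frames = [[float(i), 1.0] for i in range(10)]
pooled = average_pool(frames, 5)
```

With these assumed values, extending the window from 30 s to 240 s at the same frame rate multiplies the sequence length by eight, while lowering the frame rate brings the full-song sequence below the length of the original short window. That is the kind of trade-off that makes a single forward pass over a whole song practical.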

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception