A Knowledge-Driven Approach to Music Segmentation, Music Source Separation and Cinematic Audio Source Separation

Chun-wei Ho; Sabato Marco Siniscalchi; Kai Li; Chin-Hui Lee

arXiv:2602.21476·eess.AS·February 26, 2026

TL;DR

This paper introduces a knowledge-driven, model-based framework for audio segmentation and source separation that leverages music scores and does not require pre-segmented training data, demonstrating improved results in music and cinematic audio separation.

Contribution

The proposed approach uniquely combines knowledge sources like music scores with model-based methods, eliminating the need for annotated training data for audio segmentation and separation.

Findings

01 Score-guided learning achieves high-quality music segmentation.

02 Utilizing sound category knowledge improves cinematic audio source separation.

03 Method outperforms data-driven techniques without prior annotations.

Abstract

We propose a knowledge-driven, model-based approach to segmenting audio into single-category and mixed-category chunks, with applications to source separation. "Knowledge" here denotes information associated with the data, such as music scores. "Model" here refers to tools that can be used for audio segmentation and recognition, such as hidden Markov models. Conventional learning often relies on annotated data, with given segment categories and their corresponding boundaries, to guide the learning process. In contrast, the proposed framework does not depend on any pre-segmented training data: it learns directly from the input audio and its related knowledge sources to build all necessary models autonomously. Evaluation on simulation data shows that score-guided learning achieves very good music segmentation and separation results. Tested on movie track data for cinematic audio source…
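
To make the abstract's mention of hidden Markov models concrete, here is a minimal sketch of HMM-style segmentation by Viterbi decoding over per-frame category scores. It is not the authors' implementation. The function names `viterbi_segment` and `to_segments`, the state labels "music" and "mixed", the `stay_prob` parameter, and the log-likelihood values are all hypothetical, and the score-guided model building described in the paper is not reproduced here.

```python
import math


def viterbi_segment(frame_loglik, stay_prob=0.9):
    """Return the most likely state label for each frame.

    frame_loglik: list of dicts mapping state name -> log-likelihood of
    that frame under the state's model (hypothetical inputs; needs >= 2 states).
    stay_prob: assumed probability of remaining in the same state.
    """
    states = list(frame_loglik[0])
    n = len(states)
    log_stay = math.log(stay_prob)
    log_move = math.log((1.0 - stay_prob) / (n - 1))

    def trans(prev, cur):
        # "Sticky" transition model: staying is cheaper than switching.
        return log_stay if prev == cur else log_move

    # Uniform initial state distribution.
    score = {s: -math.log(n) + frame_loglik[0][s] for s in states}
    backpointers = []
    for obs in frame_loglik[1:]:
        new_score, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: score[p] + trans(p, s))
            new_score[s] = score[best_prev] + trans(best_prev, s) + obs[s]
            ptr[s] = best_prev
        backpointers.append(ptr)
        score = new_score

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


def to_segments(path):
    """Collapse a frame-level path into (label, start, end) chunks, end exclusive."""
    segments, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            segments.append((path[start], start, i))
            start = i
    return segments


# Toy input: four music-like frames (one of them noisy), then three mixed frames.
music = {"music": -1.0, "mixed": -5.0}
noisy = {"music": -3.0, "mixed": -2.5}
mixed = {"music": -5.0, "mixed": -1.0}
frames = [music, music, noisy, music, mixed, mixed, mixed]
segments = to_segments(viterbi_segment(frames))
print(segments)
```

The sticky transition penalty is the design choice that turns noisy frame-level scores into contiguous chunks: an isolated frame that slightly favors another category is absorbed into its neighbors, while a sustained change produces a boundary.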

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis