To catch a chorus, verse, intro, or anything else: Analyzing a song with structural functions

Ju-Chiang Wang; Yun-Ning Hung; Jordan B. L. Smith

arXiv:2205.14700 · eess.AS · May 31, 2022 · 1 citation

TL;DR

This paper presents a multi-task deep learning framework that uses a spectral-temporal Transformer model (SpecTNT) to identify musical structural functions such as verse and chorus directly from audio, and reports outperforming state-of-the-art chorus detection methods.

Contribution

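It proposes a 7-class taxonomy for song segments (intro, verse, chorus, bridge, outro, instrumental, and silence), provides rules to consolidate annotations from four disparate datasets, and uses a Transformer-based model, SpecTNT, that can be trained with an additional connectionist temporal localization (CTL) loss.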

Findings

01. Outperforms state-of-the-art chorus detection methods.

02. Achieves strong boundary detection results.

03. Effective cross-dataset generalization.

Abstract

Conventional music structure analysis algorithms aim to divide a song into segments and to group them with abstract labels (e.g., 'A', 'B', and 'C'). However, explicitly identifying the function of each segment (e.g., 'verse' or 'chorus') is rarely attempted, but has many applications. We introduce a multi-task deep learning framework to model these structural semantic labels directly from audio by estimating "verseness," "chorusness," and so forth, as a function of time. We propose a 7-class taxonomy (i.e., intro, verse, chorus, bridge, outro, instrumental, and silence) and provide rules to consolidate annotations from four disparate datasets. We also propose to use a spectral-temporal Transformer-based model, called SpecTNT, which can be trained with an additional connectionist temporal localization (CTL) loss. In cross-dataset evaluations using four public datasets, we demonstrate…
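The idea of modeling "verseness" or "chorusness" as a function of time can be illustrated with a minimal Python sketch that turns segment annotations into frame-wise activation targets over the seven classes. Only the class names come from the abstract. The label mapping `RAW_TO_CLASS`, the helper names, and the frame hop are hypothetical and chosen for illustration; they are not the paper's actual consolidation rules or training setup.

```python
import math

# The seven classes named in the abstract.
CLASSES = ["intro", "verse", "chorus", "bridge", "outro", "instrumental", "silence"]

# Hypothetical consolidation rules (raw annotation label -> taxonomy class).
# These mappings are assumptions for illustration, not the paper's rules.
RAW_TO_CLASS = {
    "intro": "intro",
    "verse": "verse",
    "pre-chorus": "verse",
    "chorus": "chorus",
    "refrain": "chorus",
    "bridge": "bridge",
    "outro": "outro",
    "solo": "instrumental",
    "interlude": "instrumental",
    "silence": "silence",
}


def consolidate(raw_label):
    """Map a raw label to one of the seven classes, or None if unmapped."""
    return RAW_TO_CLASS.get(raw_label.strip().lower())


def activation_targets(segments, duration, hop=1.0):
    """Build one 0/1 activation curve per class, sampled at frame centers.

    segments: list of (start_sec, end_sec, raw_label) tuples.
    duration: total length of the song in seconds.
    hop: frame spacing in seconds (an assumed value).
    """
    n_frames = math.ceil(duration / hop)
    targets = {name: [0.0] * n_frames for name in CLASSES}
    for start, end, raw_label in segments:
        name = consolidate(raw_label)
        if name is None:
            continue  # unmapped labels leave every curve at zero
        for i in range(n_frames):
            center = (i + 0.5) * hop
            if start <= center < end:
                targets[name][i] = 1.0
    return targets


# Toy annotation for a 6-second excerpt.
segments = [(0.0, 2.0, "Intro"), (2.0, 4.0, "Verse"), (4.0, 6.0, "Refrain")]
targets = activation_targets(segments, duration=6.0, hop=1.0)
print(targets["chorus"])  # [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```

A model trained against curves of this kind would output one activation per class per frame, and the predicted function at any time could be read off as the class with the highest activation.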

Code & Models

Datasets

ASLP-lab/SongFormBench
dataset · 1.3k downloads

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies