TONet: Tone-Octave Network for Singing Melody Extraction from Polyphonic Music

Ke Chen; Shuai Yu; Cheng-i Wang; Wei Li; Taylor Berg-Kirkpatrick; Shlomo Dubnov

arXiv:2202.00951 · eess.AS · February 3, 2022

TL;DR

TONet is a novel neural network architecture that enhances singing melody extraction by explicitly modeling tone and octave information through a specialized input representation and fusion mechanism, outperforming existing methods.

Contribution

The paper introduces TONet, a plug-and-play model with a new input representation and fusion mechanism that significantly improves tone and octave perception in singing melody extraction.

Findings

01. Substantial improvements in octave and tone accuracy across datasets.

02. Effective use of Tone-CFP input representation for harmonic grouping.

03. Enhanced melody extraction performance with various backbone models.

Abstract

Singing melody extraction is an important problem in the field of music information retrieval. Existing methods typically rely on frequency-domain representations to estimate the sung frequencies. However, this design does not lead to human-level performance in the perception of melody information for both tone (pitch-class) and octave. In this paper, we propose TONet, a plug-and-play model that improves both tone and octave perceptions by leveraging a novel input representation and a novel network architecture. First, we present an improved input representation, the Tone-CFP, that explicitly groups harmonics via a rearrangement of frequency-bins. Second, we introduce an encoder-decoder architecture that is designed to obtain a salience feature map, a tone feature map, and an octave feature map. Third, we propose a tone-octave fusion mechanism to improve the final salience feature map.…
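The abstract describes Tone-CFP as a rearrangement of frequency bins and treats pitch as a tone (pitch-class) plus an octave. The sketch below illustrates that idea on a toy frame. It is not the authors' implementation: it assumes log-spaced bins stored octave by octave with a fixed number of bins per octave, and the function names (`tone_major_order`, `rearrange_frame`, `hz_to_tone_octave`) are hypothetical. Under that assumption, bins that share a tone across octaves, which is where octave-related partials of a note fall, end up next to each other.

```python
import math


def tone_major_order(n_octaves: int, bins_per_octave: int) -> list[int]:
    """Permutation that puts bins of the same tone (pitch class) side by side.

    Assumption: input bins are octave-major, i.e.
    index = octave * bins_per_octave + tone.
    """
    return [
        octave * bins_per_octave + tone
        for tone in range(bins_per_octave)
        for octave in range(n_octaves)
    ]


def rearrange_frame(frame: list[float], n_octaves: int, bins_per_octave: int) -> list[float]:
    """Reorder one spectral frame from octave-major to tone-major layout."""
    if len(frame) != n_octaves * bins_per_octave:
        raise ValueError("frame length must equal n_octaves * bins_per_octave")
    return [frame[i] for i in tone_major_order(n_octaves, bins_per_octave)]


def hz_to_tone_octave(freq_hz: float, a4_hz: float = 440.0) -> tuple[int, int]:
    """Split a frequency into (tone, octave) using 12-tone equal temperament.

    Tone is the pitch class 0..11 (0 = C); octave follows the convention
    in which MIDI note 60 is C4.
    """
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi % 12, midi // 12 - 1


# Toy frame: 3 octaves x 4 bins per octave, energy at tone 1 in every octave.
frame = [0.0] * 12
for idx in (1, 5, 9):
    frame[idx] = 1.0

print(rearrange_frame(frame, n_octaves=3, bins_per_octave=4))
# -> [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

print(hz_to_tone_octave(440.0))  # -> (9, 4), i.e. A4
print(hz_to_tone_octave(880.0))  # -> (9, 5), same tone, one octave up
```

In the toy frame the three active bins are four positions apart before the reordering and contiguous after it, so a small convolution kernel can cover all of them at once. The actual bin resolution and layout used by Tone-CFP are given in the paper and the official repository.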

Code & Models

Repositories

retrocirce/tonet
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies