Vocal melody extraction using patch-based CNN

Li Su

arXiv:1804.09202 · cs.SD · April 26, 2018 · 1 citation


TL;DR

This paper introduces a patch-based CNN model for vocal melody extraction that uses a novel time-frequency representation, achieving efficient training and competitive accuracy in polyphonic music analysis.

Contribution

The paper presents a new CNN architecture and data representation for vocal melody extraction, inspired by object detection techniques in image processing.

Findings

01 Achieves high speed in melody extraction

02 Demonstrates competitive accuracy with limited labeled data

03 Effective in polyphonic music environments

Abstract

This paper presents a patch-based convolutional neural network (CNN) model for vocal melody extraction in polyphonic music, inspired by object detection in image processing. The input to the model is a novel time-frequency representation that enhances the pitch contours and suppresses the harmonic components of a signal. This succinct data representation and the patch-based CNN model enable an efficient training process with limited labeled data. Experiments on various datasets show excellent speed and competitive accuracy compared to other deep learning approaches.
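
The abstract describes the patch-based idea only at a high level. As a rough illustration, and assuming (this is not taken from the paper) that candidate patches are cropped around the strongest bins of each frame of a time-frequency representation, the cropping step might be sketched as below. The function name `extract_patches`, the patch size, and the peak-picking rule are all hypothetical, and the CNN that would classify each patch as vocal melody or not is omitted.

```python
from typing import List, Tuple

# Rows are frequency bins, columns are time frames.
Matrix = List[List[float]]


def extract_patches(
    tfr: Matrix, patch_size: int = 5, peaks_per_frame: int = 1
) -> List[Tuple[int, int, Matrix]]:
    """Crop square patches centered on the strongest bins of each frame.

    Returns (freq_bin, frame, patch) triples. Positions whose patch would
    cross the border of the representation are skipped.
    NOTE: this peak picker is an assumption for illustration only.
    """
    half = patch_size // 2
    n_freq = len(tfr)
    n_time = len(tfr[0]) if tfr else 0
    patches = []
    for t in range(half, n_time - half):
        # Rank the in-bounds frequency bins of this frame by magnitude.
        candidates = sorted(
            range(half, n_freq - half), key=lambda f: tfr[f][t], reverse=True
        )
        for f in candidates[:peaks_per_frame]:
            rows = tfr[f - half : f + half + 1]
            patch = [row[t - half : t + half + 1] for row in rows]
            patches.append((f, t, patch))
    return patches


# Toy input: a 9x9 representation with a flat "pitch contour" at bin 4.
toy = [[0.0] * 9 for _ in range(9)]
for frame in range(9):
    toy[4][frame] = 1.0

for f, t, patch in extract_patches(toy, patch_size=5):
    print(f"candidate at bin {f}, frame {t}, patch {len(patch)}x{len(patch[0])}")
```

In a setup like this, each cropped patch would be passed to a small classifier, much as region proposals are scored in image object detection; only the candidates need to be evaluated rather than every bin of the full representation.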


Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies