Deep convolutional networks on the pitch spiral for musical instrument recognition

Vincent Lostanlen; Carmine-Emanuele Cella

arXiv:1605.06644·cs.SD·January 11, 2017·33 cites

TL;DR

This paper explores convolutional neural network architectures for musical instrument recognition, focusing on invariance to pitch and expressive variations, and introduces a hybrid approach that combines multiple weight sharing strategies for improved accuracy.

Contribution

It benchmarks three novel weight sharing strategies in deep CNNs for instrument recognition and demonstrates that hybridizing these strategies yields the best classification performance.

Findings

01 Hybrid architecture outperforms individual strategies.

02 Shepard pitch spiral inspired kernel combination improves invariance.

03 Deep learning models can effectively classify instruments with limited data.

Abstract

Musical performance combines a wide range of pitches, nuances, and expressive techniques. Audio-based classification of musical instruments thus requires building signal representations that are invariant to such transformations. This article investigates the construction of learned convolutional architectures for instrument recognition, given a limited amount of annotated training data. In this context, we benchmark three different weight sharing strategies for deep convolutional networks in the time-frequency domain: temporal kernels; time-frequency kernels; and a linear combination of time-frequency kernels which are one octave apart, akin to a Shepard pitch spiral. We provide an acoustical interpretation of these strategies within the source-filter framework of quasi-harmonic sounds with a fixed spectral envelope, which are archetypal of musical notes. The best classification…
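
The three weight sharing strategies named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a log-frequency (constant-Q style) spectrogram stored as a (frequency, time) array with a fixed number of bins per octave, and it models the pitch-spiral layer as a sum of time-frequency kernel responses taken at bins exactly one octave apart. All function names, kernel shapes, and the `bins_per_octave` parameter are hypothetical choices made for illustration.

```python
import numpy as np


def correlate2d_valid(x, kernel):
    """'Valid' 2-D cross-correlation of x (freq, time) with a small kernel."""
    kf, kt = kernel.shape
    out_f = x.shape[0] - kf + 1
    out_t = x.shape[1] - kt + 1
    out = np.zeros((out_f, out_t))
    for i in range(kf):
        for j in range(kt):
            out += kernel[i, j] * x[i:i + out_f, j:j + out_t]
    return out


def temporal_layer(x, k_time):
    """Temporal kernel: the same 1-D filter slides along time in every frequency bin."""
    return correlate2d_valid(x, k_time[np.newaxis, :])


def time_frequency_layer(x, k_tf):
    """Time-frequency kernel: one 2-D filter shared across time and log-frequency."""
    return correlate2d_valid(x, k_tf)


def spiral_layer(x, kernels, bins_per_octave):
    """Pitch-spiral sketch (assumed form): sum of time-frequency responses
    whose frequency origins are 0, 1, 2, ... octaves apart.

    kernels has shape (n_octaves, kf, kt); kernels[o] is applied to the
    spectrogram shifted up by o * bins_per_octave bins.
    """
    n_octaves = kernels.shape[0]
    responses = [correlate2d_valid(x, k) for k in kernels]
    out_f = responses[0].shape[0] - (n_octaves - 1) * bins_per_octave
    out = np.zeros((out_f, responses[0].shape[1]))
    for o, resp in enumerate(responses):
        start = o * bins_per_octave
        out += resp[start:start + out_f]
    return out


# Toy input: 4 octaves of 12 bins each, 20 time frames (random values).
rng = np.random.default_rng(0)
spectrogram = rng.standard_normal((48, 20))

y_time = temporal_layer(spectrogram, rng.standard_normal(3))
y_tf = time_frequency_layer(spectrogram, rng.standard_normal((3, 3)))
y_spiral = spiral_layer(spectrogram, rng.standard_normal((2, 3, 3)), bins_per_octave=12)
```

In this sketch the temporal layer shares weights only along time, the time-frequency layer also shares them along log-frequency (so a pitch shift moves the response rather than changing it), and the spiral layer additionally ties together partials that fall in the same pitch class in adjacent octaves.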

Code & Models

Repositories

lostanlen/ismir2016 (official)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing