Deep convolutional networks on the pitch spiral for musical instrument recognition
Vincent Lostanlen, Carmine-Emanuele Cella

TL;DR
This paper explores convolutional neural network architectures for musical instrument recognition, focusing on invariance to pitch and expressive variations, and introduces a hybrid approach that combines multiple weight sharing strategies for improved accuracy.
Contribution
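The paper benchmarks three weight sharing strategies for deep CNNs in the time-frequency domain (temporal kernels, time-frequency kernels, and linear combinations of time-frequency kernels one octave apart) and reports that hybridizing these strategies yields the best classification performance.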
Findings
Hybrid architecture outperforms individual strategies.
Combining kernels one octave apart, in the manner of a Shepard pitch spiral, improves invariance.
Deep convolutional networks can classify instruments effectively even with a limited amount of annotated training data.
Abstract
Musical performance combines a wide range of pitches, nuances, and expressive techniques. Audio-based classification of musical instruments thus requires building signal representations that are invariant to such transformations. This article investigates the construction of learned convolutional architectures for instrument recognition, given a limited amount of annotated training data. In this context, we benchmark three different weight sharing strategies for deep convolutional networks in the time-frequency domain: temporal kernels; time-frequency kernels; and a linear combination of time-frequency kernels which are one octave apart, akin to a Shepard pitch spiral. We provide an acoustical interpretation of these strategies within the source-filter framework of quasi-harmonic sounds with a fixed spectral envelope, which are archetypal of musical notes. The best classification…
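The three weight sharing strategies named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the kernel sizes, and the choice of 12 bins per octave are assumptions made for illustration, and the input is assumed to be a log-frequency (constant-Q style) spectrogram of shape (frequency bins, time frames), so that a shift of `bins_per_octave` rows corresponds to one octave.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate2d_valid(x, kernel):
    """'Valid' 2-D cross-correlation of x (freq, time) with kernel (kf, kt)."""
    windows = sliding_window_view(x, kernel.shape)  # (F', T', kf, kt)
    return np.einsum("ftij,ij->ft", windows, kernel)


def temporal_conv(x, kernel_t):
    """Temporal kernel: one 1-D filter along time, shared by every frequency bin."""
    return correlate2d_valid(x, kernel_t[np.newaxis, :])


def time_frequency_conv(x, kernel_tf):
    """Time-frequency kernel: one 2-D filter shared across log-frequency and time."""
    return correlate2d_valid(x, kernel_tf)


def spiral_conv(x, kernels, bins_per_octave=12):
    """Sum of time-frequency kernels applied one octave apart.

    kernels has shape (n_octaves, kf, kt); kernels[j] is applied to the
    spectrogram shifted upward by j octaves (hypothetical layout).
    """
    n_octaves, kf, kt = kernels.shape
    span = (n_octaves - 1) * bins_per_octave
    n_out = x.shape[0] - span - kf + 1
    if n_out <= 0:
        raise ValueError("spectrogram has too few frequency bins for these kernels")
    out = np.zeros((n_out, x.shape[1] - kt + 1))
    for j in range(n_octaves):
        start = j * bins_per_octave
        shifted = x[start:start + n_out + kf - 1]  # rows j octaves above
        out += correlate2d_valid(shifted, kernels[j])
    return out
```

In this sketch, the spiral variant reduces to the plain time-frequency variant when every kernel except the first is zero, which is one way to see it as a linear combination of octave-spaced time-frequency kernels. A trained network would learn the kernel weights and stack several such layers; those details are not given in the excerpt above.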
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
