Invariances and Data Augmentation for Supervised Music Transcription

John Thickstun; Zaid Harchaoui; Dean Foster; Sham M. Kakade

arXiv:1711.04845·stat.ML·November 15, 2017

TL;DR

This paper studies invariances and data augmentation for supervised, frame-based music transcription. Its translation-invariant neural network reaches state-of-the-art results on human recordings.

Contribution

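The paper introduces a translation-invariant model that combines a traditional filterbank with a convolutional neural network. The model shares parameters in the log-frequency domain to exploit the frequency invariance of music, and it is trained with label-preserving pitch-shift augmentation.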
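As a rough illustration of why sharing parameters along the log-frequency axis helps, the sketch below slides one filter across a toy log-frequency frame using NumPy. It is not the authors' architecture: the bin count, the filter width, and the names `log_freq_conv`, `n_bins`, and `kernel` are all assumptions chosen for the example. It checks that transposing the input by a few bins shifts the response by the same amount, and compares the parameter count of the shared filter with that of a dense layer producing the same number of outputs.

```python
import numpy as np

def log_freq_conv(spectrum, kernel):
    """Slide one shared kernel along the log-frequency axis (valid cross-correlation)."""
    return np.correlate(spectrum, kernel, mode="valid")

rng = np.random.default_rng(0)
n_bins = 96                        # hypothetical: 8 octaves at 12 bins per octave
kernel = rng.standard_normal(13)   # hypothetical filter spanning about one octave

# Toy log-frequency frame: energy at one bin and at the bin one octave above it.
frame = np.zeros(n_bins)
frame[30] = 1.0
frame[42] = 0.5

# Transpose the frame up by 5 bins. np.roll is safe here because the toy
# frame has no energy near the edges, so nothing wraps around.
shift = 5
shifted = np.roll(frame, shift)

out = log_freq_conv(frame, kernel)
out_shifted = log_freq_conv(shifted, kernel)

# Shifting the input shifts the response by the same number of bins.
equivariant = np.allclose(out_shifted[shift:], out[:-shift])

# Parameter counts: one shared filter versus a dense map to the same outputs.
shared_params = kernel.size
dense_params = n_bins * out.size

print(equivariant, shared_params, dense_params)
```

Because the same weights are reused at every position, a note pattern learned at one pitch is recognized at any other pitch without extra parameters, which is the intuition behind the reduced parameter count.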

Findings

01. Top-performing model in 2017 MIREX evaluation

02. Reduced model parameters through frequency invariance

03. Effective use of pitch-shift data augmentation

Abstract

This paper explores a variety of models for frame-based music transcription, with an emphasis on the methods needed to reach state-of-the-art on human recordings. The translation-invariant network discussed in this paper, which combines a traditional filterbank with a convolutional neural network, was the top-performing model in the 2017 MIREX Multiple Fundamental Frequency Estimation evaluation. This class of models shares parameters in the log-frequency domain, which exploits the frequency invariance of music to reduce the number of model parameters and avoid overfitting to the training data. All models in this paper were trained with supervision by labeled data from the MusicNet dataset, augmented by random label-preserving pitch-shift transformations.
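The pitch-shift augmentation mentioned at the end of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it shifts pitch by naive resampling with linear interpolation (which also changes the clip duration), and it applies only a small random jitter so that the note labels stay valid unchanged. The function names, the `max_jitter` value of 0.1 semitones, and the 128-entry label vector are hypothetical choices for the example.

```python
import numpy as np

def pitch_shift_resample(audio, semitones):
    """Shift pitch by resampling with linear interpolation.

    Naive method: the output duration changes by the same factor as the pitch.
    """
    factor = 2.0 ** (semitones / 12.0)
    n_out = int(round(len(audio) / factor))
    positions = np.arange(n_out) * factor
    return np.interp(positions, np.arange(len(audio)), audio)

def augment(audio, labels, rng, max_jitter=0.1):
    """Apply a small random pitch jitter (in semitones); labels are returned unchanged.

    max_jitter is a hypothetical setting: a shift well under half a semitone
    does not move any note to a different pitch class.
    """
    jitter = rng.uniform(-max_jitter, max_jitter)
    return pitch_shift_resample(audio, jitter), labels, jitter

# Toy example: one second of a 440 Hz tone labeled as MIDI note 69 (A4).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440.0 * t)
labels = np.zeros(128, dtype=np.int8)
labels[69] = 1

rng = np.random.default_rng(0)
aug_audio, aug_labels, jitter = augment(audio, labels, rng)
print(len(aug_audio), jitter)
```

Larger shifts, such as whole semitones, would move notes to different pitches, so the labels would have to be shifted together with the audio rather than kept as they are.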

Code & Models

Repositories

jthickstun/thickstun2018invariances
tf · Official
