Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription

Frank Cwitkowitz; Kin Wai Cheuk; Woosung Choi; Marco A. Martínez-Ramírez; Keisuke Toyama; Wei-Hsiang Liao; Yuki Mitsufuji

arXiv:2309.15717 · eess.AS · January 25, 2024

Open Access

TL;DR

Timbre-Trap is a low-resource, instrument-agnostic music transcription framework that leverages pitch-timbre separation via an autoencoder, achieving competitive results with minimal annotated data.

Contribution

It introduces a novel autoencoder-based framework that unifies music transcription and audio reconstruction, effectively handling low-resource, multi-instrument scenarios.
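One way to picture how a single model can serve both transcription and reconstruction is a joint training objective. The sketch below is an assumption for illustration, not the authors' published loss: it uses mean squared error on complex spectral coefficients for reconstruction and binary cross-entropy on pitch salience for transcription, and it adds the transcription term only for examples that carry annotations, which is one way unannotated audio could still contribute in a low-resource setting. The function names and the `weight` parameter are hypothetical.

```python
import math
from typing import Optional, Sequence


def reconstruction_loss(pred: Sequence[complex], target: Sequence[complex]) -> float:
    """Mean squared error between predicted and target complex coefficients.

    abs(p - t) ** 2 equals the squared real-part error plus the squared imaginary-part error.
    """
    return sum(abs(p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def transcription_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Binary cross-entropy between predicted salience in [0, 1] and target salience."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never receives 0
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(target)


def joint_loss(
    recon_pred: Sequence[complex],
    recon_target: Sequence[complex],
    sal_pred: Sequence[float],
    sal_target: Optional[Sequence[float]] = None,
    weight: float = 1.0,
) -> float:
    """Hypothetical combined objective: reconstruction always, transcription only when labels exist."""
    loss = reconstruction_loss(recon_pred, recon_target)
    if sal_target is not None:  # annotated example: add the weighted transcription term
        loss += weight * transcription_loss(sal_pred, sal_target)
    return loss
```

With `sal_target=None` the function reduces to the reconstruction term, so an unannotated clip still produces a training signal in this sketch.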

Findings

01. Achieves performance comparable to state-of-the-art methods
02. Requires significantly less annotated data
03. Successfully separates pitch from timbre in complex audio

Abstract

In recent years, research on music transcription has focused mainly on architecture design and instrument-specific data acquisition. With the lack of availability of diverse datasets, progress is often limited to solo-instrument tasks such as piano transcription. Several works have explored multi-instrument transcription as a means to bolster the performance of models on low-resource tasks, but these methods face the same data availability issues. We propose Timbre-Trap, a novel framework which unifies music transcription and audio reconstruction by exploiting the strong separability between pitch and timbre. We train a single autoencoder to simultaneously estimate pitch salience and reconstruct complex spectral coefficients, selecting between either output during the decoding stage via a simple switch mechanism. In this way, the model learns to produce coefficients corresponding to…
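The switch mechanism mentioned in the abstract can be illustrated with a toy decoder. This is a minimal sketch under stated assumptions, not the Timbre-Trap implementation: the hypothetical `toy_decode` appends a one-bit flag to the latent code, applies a single shared linear layer, and then reads the outputs either as (real, imaginary) pairs of spectral coefficients or, after a sigmoid on the first value of each pair, as pitch salience. The linear layer, the flag encoding, and the pairing scheme are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Union


def toy_decode(
    latent: Sequence[float],
    weights: Sequence[Sequence[float]],
    transcribe: bool,
) -> Union[List[float], List[complex]]:
    """Hypothetical single decoder whose output is selected by a one-bit switch.

    Each row of `weights` has len(latent) + 1 entries; the last entry multiplies
    the switch flag. The number of rows must be even because outputs are read in pairs.
    """
    z = list(latent) + [1.0 if transcribe else 0.0]  # append the switch flag to the latent code
    out = [sum(w * x for w, x in zip(row, z)) for row in weights]  # shared linear layer
    pairs = [(out[i], out[i + 1]) for i in range(0, len(out), 2)]
    if transcribe:
        # Salience mode: squash the first value of each pair into [0, 1].
        return [1.0 / (1.0 + math.exp(-a)) for a, _ in pairs]
    # Reconstruction mode: read each pair as the (real, imaginary) parts of a coefficient.
    return [complex(a, b) for a, b in pairs]
```

With the same `weights`, flipping `transcribe` changes both the decoder input (through the flag column) and how the output is read, so one parameter set serves both tasks in this toy setting.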

Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies · Music Technology and Sound Studies

Methods: Max Pooling · Concatenated Skip Connection · Convolution · U-Net
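The Methods tags name standard U-Net ingredients. The following pure-Python sketch shows two of them on a one-channel 1-D signal: non-overlapping max pooling on the way down and a concatenated skip connection on the way up. It illustrates those generic operations only: it leaves out convolution entirely, uses nearest-neighbour upsampling as an assumed stand-in for a learned decoder layer, and all function names are hypothetical rather than taken from the Timbre-Trap code.

```python
from typing import List

Channels = List[List[float]]  # list of channels, each a list of values over time


def max_pool1d(x: List[float], size: int = 2) -> List[float]:
    """Non-overlapping 1-D max pooling (stride equal to the window size)."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def upsample1d(x: List[float], factor: int = 2) -> List[float]:
    """Nearest-neighbour upsampling that restores the pooled length."""
    return [v for v in x for _ in range(factor)]


def concat_skip(decoder: Channels, encoder: Channels) -> Channels:
    """Concatenated skip connection: stack encoder channels after decoder channels."""
    if any(len(ch) != len(encoder[0]) for ch in decoder):
        raise ValueError("skip connection requires matching time lengths")
    return decoder + encoder


def tiny_unet_pass(signal: List[float]) -> Channels:
    skip = [signal]                              # encoder features saved for the skip path
    bottleneck = [max_pool1d(signal)]            # half resolution after pooling
    up = [upsample1d(ch) for ch in bottleneck]   # back to full resolution
    return concat_skip(up, skip)                 # two channels at full resolution
```

For example, `tiny_unet_pass([1.0, 3.0, 2.0, 5.0])` returns `[[3.0, 3.0, 5.0, 5.0], [1.0, 3.0, 2.0, 5.0]]`. Concatenation (rather than addition) keeps the fine-resolution encoder detail as separate channels next to the coarse decoder features; odd-length inputs raise `ValueError` here because pooling drops the last sample.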