Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription
Frank Cwitkowitz, Kin Wai Cheuk, Woosung Choi, Marco A. Martínez-Ramírez, Keisuke Toyama, Wei-Hsiang Liao, Yuki Mitsufuji

TL;DR
Timbre-Trap is a low-resource, instrument-agnostic music transcription framework that leverages pitch-timbre separation via an autoencoder, achieving competitive results with minimal annotated data.
Contribution
It introduces a novel autoencoder-based framework that unifies music transcription and audio reconstruction, effectively handling low-resource, multi-instrument scenarios.
Findings
Achieves performance comparable to state-of-the-art methods
Requires significantly less annotated data
Successfully separates pitch from timbre in complex audio
Abstract
In recent years, research on music transcription has focused mainly on architecture design and instrument-specific data acquisition. With the lack of availability of diverse datasets, progress is often limited to solo-instrument tasks such as piano transcription. Several works have explored multi-instrument transcription as a means to bolster the performance of models on low-resource tasks, but these methods face the same data availability issues. We propose Timbre-Trap, a novel framework which unifies music transcription and audio reconstruction by exploiting the strong separability between pitch and timbre. We train a single autoencoder to simultaneously estimate pitch salience and reconstruct complex spectral coefficients, selecting between either output during the decoding stage via a simple switch mechanism. In this way, the model learns to produce coefficients corresponding to…
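The switch mechanism mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the layer shapes, the random weights, the names `encode` and `decode`, the choice to append the switch as an extra decoder input, and the sigmoid used to read out salience are all assumptions made purely for illustration. It only shows the general idea of one shared encoder whose decoder output is selected by a flag.

```python
import numpy as np

rng = np.random.default_rng(0)

N_BINS = 16   # hypothetical number of frequency bins per frame
LATENT = 8    # hypothetical latent width

# Randomly initialised weights stand in for trained parameters (assumption).
W_enc = rng.standard_normal((2 * N_BINS, LATENT)) * 0.1
W_dec = rng.standard_normal((LATENT + 1, 2 * N_BINS)) * 0.1


def encode(coeffs):
    """Map complex coefficients (frames, N_BINS) to a real latent (frames, LATENT)."""
    x = np.concatenate([coeffs.real, coeffs.imag], axis=-1)
    return np.tanh(x @ W_enc)


def decode(latent, transcribe):
    """Decode the latent; the switch flag is appended as one extra input feature."""
    flag = np.full((latent.shape[0], 1), 1.0 if transcribe else 0.0)
    y = np.concatenate([latent, flag], axis=-1) @ W_dec
    real, imag = y[:, :N_BINS], y[:, N_BINS:]
    if transcribe:
        # Assumed readout: squash to [0, 1] so each bin reads as a salience value.
        return 1.0 / (1.0 + np.exp(-real))
    # Reconstruction mode: return complex spectral coefficients.
    return real + 1j * imag


# Four frames of random complex "spectral coefficients" as placeholder input.
frames = rng.standard_normal((4, N_BINS)) + 1j * rng.standard_normal((4, N_BINS))
z = encode(frames)
salience = decode(z, transcribe=True)
reconstruction = decode(z, transcribe=False)
```

Both outputs come from the same latent `z`; only the flag passed to `decode` differs, which is the sense in which a single autoencoder serves both tasks in this sketch.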
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Music Technology and Sound Studies
Methods: Max Pooling · Concatenated Skip Connection · Convolution · U-Net
