Use of speaker recognition approaches for learning and evaluating embedding representations of musical instrument sounds

Xuan Shi; Erica Cooper; Junichi Yamagishi

arXiv:2107.11506 · eess.AS · December 28, 2021

TL;DR

This paper adapts speaker recognition techniques to learn and evaluate embedding spaces for musical instrument sounds, enabling recognition of unseen instruments for music synthesis applications.

Contribution

The paper applies ASV architectures and evaluation methods to musical instrument sound embeddings and demonstrates their effectiveness on the NSynth and RWC datasets.

Findings

01

Unseen instruments are recognized effectively, as measured by equal error rate (EER)

02

Data augmentation and angular softmax improve embedding quality

03

Multi-task learning with instrument family labels enhances embedding structure
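
The EER named in the first finding can be illustrated with a minimal sketch. It assumes that each verification trial has already been given a similarity score, higher meaning "more likely the same instrument"; how the paper produces those scores is not stated on this page. The function name `equal_error_rate` is hypothetical and is not taken from the official repository.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER from trial scores (illustrative sketch only).

    target_scores: scores for trials where both sounds share an instrument.
    nontarget_scores: scores for trials with different instruments.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    # Try every observed score as a decision threshold.
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # False rejection rate: target trials scored below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: non-target trials scored at or above it.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # The EER is the operating point where the two error rates meet.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)
```

A lower EER means the embedding separates same-instrument pairs from different-instrument pairs more cleanly. Because the trials can be built from instruments that were held out of training, the metric applies directly to unseen instruments.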

Abstract

Constructing an embedding space for musical instrument sounds that can meaningfully represent new and unseen instruments is important for downstream music generation tasks such as multi-instrument synthesis and timbre transfer. The framework of Automatic Speaker Verification (ASV) provides us with architectures and evaluation methodologies for verifying the identities of unseen speakers, and these can be repurposed for the task of learning and evaluating a musical instrument sound embedding space that can support unseen instruments. Borrowing from state-of-the-art ASV techniques, we construct a musical instrument recognition model that uses a SincNet front-end, a ResNet architecture, and an angular softmax objective function. Experiments on the NSynth and RWC datasets show our model's effectiveness in terms of equal error rate (EER) for unseen instruments, and ablation studies show the…
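
The angular softmax objective mentioned in the abstract can be sketched as follows. This is a NumPy illustration of the SphereFace-style A-Softmax formulation, written under the assumption that the paper's variant is similar; the margin `m=4` and the function names `a_softmax_logits` and `cross_entropy` are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def a_softmax_logits(embeddings, weights, labels, m=4):
    """Logits with a multiplicative angular margin on the true class.

    embeddings: (N, D) array, weights: (C, D) array, labels: (N,) int array.
    Assumes no embedding is the zero vector.
    """
    # Normalize class weights so each logit depends only on angle and norm.
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    x_norm = np.linalg.norm(embeddings, axis=1, keepdims=True)
    cos = np.clip(embeddings @ w.T / x_norm, -1.0, 1.0)
    logits = x_norm * cos
    rows = np.arange(len(labels))
    theta = np.arccos(cos[rows, labels])
    # psi(theta) = (-1)^k cos(m*theta) - 2k on [k*pi/m, (k+1)*pi/m],
    # a monotonically decreasing replacement for cos(theta).
    k = np.minimum(np.floor(theta * m / np.pi), m - 1)
    sign = np.where(k % 2 == 0, 1.0, -1.0)
    psi = sign * np.cos(m * theta) - 2.0 * k
    logits[rows, labels] = x_norm[:, 0] * psi
    return logits

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed stably."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

With `m=1` the margin disappears and the logits reduce to an ordinary softmax over weight-normalized classes. Larger `m` lowers the true-class logit, so training has to pull embeddings of the same instrument into a tighter angular cluster.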

Code & Models

Repositories

Alexuan/musical_instrument_embedding
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies

Methods: Average Pooling · 1x1 Convolution · Residual Connection · Convolution · Batch Normalization · Global Average Pooling · Max Pooling · Bottleneck Residual Block · Kaiming Initialization