Deep Conditional Representation Learning for Drum Sample Retrieval by Vocalisation

Alejandro Delgado, Charalampos Saitis, Emmanouil Benetos, and Mark Sandler

arXiv:2204.04651·cs.SD·April 12, 2022

TL;DR

This paper explores the use of conditional autoencoders to improve drum sample retrieval through vocal imitation, evaluating their effectiveness with acoustic and perceptual metrics and analyzing individual differences in vocal styles.

Contribution

It introduces a novel approach using conditioned autoencoders for drum sample retrieval by vocalisation, demonstrating improved embedding quality and highlighting user-specific variations.

Findings

01

Models conditioned on both sound-type and drum-type labels learn more informative embeddings.

02

Embeddings correlate with both acoustic and perceptual similarity measures.

03

Significant individual differences in vocal imitation styles affect retrieval performance.

Abstract

Imitating musical instruments with the human voice is an efficient way of communicating ideas between music producers, from sketching melody lines to clarifying desired sonorities. For this reason, there is an increasing interest in building applications that allow artists to efficiently pick target samples from big sound libraries just by imitating them vocally. In this study, we investigated the potential of conditional autoencoder models to learn informative features for Drum Sample Retrieval by Vocalisation (DSRV). We assessed the usefulness of their embeddings using four evaluation metrics, two of them relative to their acoustic properties and two of them relative to their perceptual properties via human listeners' similarity ratings. Results suggest that models conditioned on both sound-type labels (drum vs imitation) and drum-type labels (kick vs snare vs closed hi-hat vs opened…
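The label conditioning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear layers, the random untrained weights, the feature size, the latent size, and the names `build_condition` and `TinyConditionalAutoencoder` are all assumptions made for illustration. The number of drum-type classes is left as a parameter because the class list in the abstract above is truncated. The sketch only shows one common way to condition an autoencoder, which is to concatenate one-hot label vectors to the encoder input and to the latent code.

```python
import numpy as np


def one_hot(index, num_classes):
    """Return a one-hot float vector of length num_classes."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[index] = 1.0
    return vec


def build_condition(sound_type, drum_type, n_sound_types=2, n_drum_types=4):
    """Concatenate a sound-type one-hot (drum vs imitation) with a
    drum-type one-hot. Both class counts are assumptions of this sketch."""
    return np.concatenate(
        [one_hot(sound_type, n_sound_types), one_hot(drum_type, n_drum_types)]
    )


class TinyConditionalAutoencoder:
    """Hypothetical single-layer conditional autoencoder (untrained)."""

    def __init__(self, n_features, n_cond, n_latent, seed=0):
        rng = np.random.default_rng(seed)
        # The encoder sees features + condition; the decoder sees latent + condition.
        self.w_enc = rng.normal(0.0, 0.1, (n_features + n_cond, n_latent))
        self.w_dec = rng.normal(0.0, 0.1, (n_latent + n_cond, n_features))

    def encode(self, x, cond):
        # The embedding is computed from the input joined with its labels.
        return np.tanh(np.concatenate([x, cond]) @ self.w_enc)

    def decode(self, z, cond):
        # The reconstruction is computed from the embedding joined with the same labels.
        return np.concatenate([z, cond]) @ self.w_dec


# Placeholder feature vector standing in for an audio representation.
features = np.random.default_rng(1).normal(size=64)
cond = build_condition(sound_type=1, drum_type=2)
model = TinyConditionalAutoencoder(n_features=64, n_cond=cond.size, n_latent=8)
embedding = model.encode(features, cond)
reconstruction = model.decode(embedding, cond)
```

In a trained model of this kind, `embedding` would be the vector used to compare a vocal imitation against drum samples; here the weights are random, so only the data flow and the shapes are meaningful.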

Code & Models

Repositories

alejandrodl/drum-sample-retrieval-vocalisation
tf · Official

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing