Deep Conditional Representation Learning for Drum Sample Retrieval by Vocalisation
Alejandro Delgado, Charalampos Saitis, Emmanouil Benetos, and Mark Sandler

TL;DR
This paper explores conditional autoencoders for drum sample retrieval by vocal imitation. It evaluates the learned embeddings with acoustic and perceptual metrics and analyses individual differences in vocal imitation style.
Contribution
The paper introduces conditional autoencoders for drum sample retrieval by vocalisation. It shows that conditioning on sound-type and drum-type labels yields more informative embeddings, and it highlights user-specific variation in imitation style.
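Once an encoder maps both drum samples and vocal imitations into a shared embedding space, retrieval reduces to ranking library samples by how close they are to the imitation's embedding. The sketch below assumes cosine similarity as the ranking criterion and uses made-up three-dimensional embeddings with hypothetical sample names; the paper's actual distance measure and embedding size are not stated on this page.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_samples(
    query: Sequence[float], library: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (sample_name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Hypothetical embeddings: a trained encoder would produce these vectors.
library = {
    "kick_01": [0.9, 0.1, 0.0],
    "snare_04": [0.1, 0.8, 0.3],
    "hihat_closed_02": [0.0, 0.2, 0.9],
}
imitation = [0.8, 0.2, 0.1]  # embedding of a hypothetical vocal imitation
print(rank_samples(imitation, library)[0][0])  # kick_01
```

Cosine similarity ignores vector magnitude, which is one reason it is a common default for comparing learned embeddings; a Euclidean distance would slot into `rank_samples` just as easily.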
Findings
Models conditioned on both sound-type and drum-type labels learn more informative embeddings.
Embeddings correlate with both acoustic and perceptual similarity measures.
Significant individual differences in vocal imitation styles affect retrieval performance.
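One way to check whether embeddings agree with human judgements is a rank correlation between pairwise embedding distances and listeners' ratings for the same pairs. The sketch below uses Spearman's rank correlation with tie-averaged ranks; the choice of Spearman, the hypothetical distance values, and the hypothetical dissimilarity scores are all assumptions for illustration, not the paper's reported procedure or data. With similarity (rather than dissimilarity) ratings, good agreement would instead appear as a negative correlation.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_a * var_b)


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on ranks."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences of at least 2 items")
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical data for four imitation/drum pairs.
embedding_distances = [0.1, 0.4, 0.35, 0.8]
listener_dissimilarity = [1, 2, 3, 4]
print(round(spearman(embedding_distances, listener_dissimilarity), 3))  # 0.8
```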
Abstract
Imitating musical instruments with the human voice is an efficient way of communicating ideas between music producers, from sketching melody lines to clarifying desired sonorities. For this reason, there is an increasing interest in building applications that allow artists to efficiently pick target samples from big sound libraries just by imitating them vocally. In this study, we investigated the potential of conditional autoencoder models to learn informative features for Drum Sample Retrieval by Vocalisation (DSRV). We assessed the usefulness of their embeddings using four evaluation metrics, two of them relative to their acoustic properties and two of them relative to their perceptual properties via human listeners' similarity ratings. Results suggest that models conditioned on both sound-type labels (drum vs imitation) and drum-type labels (kick vs snare vs closed hi-hat vs opened…
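A common way to condition an autoencoder on categorical labels is to append one-hot label vectors to the inputs of the encoder and/or decoder. The sketch below only shows how such a conditioning vector could be built for the two label types named in the abstract. Whether the paper concatenates labels this way, and at which layers, is an assumption; `conditioned_input` is a hypothetical helper, and the drum-type vocabulary is an illustrative subset because the full list is truncated in the abstract.

```python
from typing import List, Sequence

SOUND_TYPES = ["drum", "imitation"]
# Illustrative subset only: the abstract's full drum-type list is truncated.
DRUM_TYPES = ["kick", "snare", "closed hi-hat"]


def one_hot(label: str, vocabulary: Sequence[str]) -> List[float]:
    """Encode `label` as a one-hot vector over `vocabulary`."""
    if label not in vocabulary:
        raise ValueError(f"unknown label {label!r}")
    return [1.0 if item == label else 0.0 for item in vocabulary]


def conditioned_input(
    features: Sequence[float], sound_type: str, drum_type: str
) -> List[float]:
    """Append one-hot sound-type and drum-type codes to a feature vector.

    Hypothetical helper showing label concatenation, one standard
    conditioning scheme; not necessarily the scheme used in the paper.
    """
    return (
        list(features)
        + one_hot(sound_type, SOUND_TYPES)
        + one_hot(drum_type, DRUM_TYPES)
    )


x = conditioned_input([0.3, -1.2, 0.5], sound_type="imitation", drum_type="snare")
print(x)  # [0.3, -1.2, 0.5, 0.0, 1.0, 0.0, 1.0, 0.0]
```

In this scheme the network sees the same label codes during training and reconstruction, so the input size grows by the number of sound types plus the number of drum types.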
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
