Deep Embeddings for Robust User-Based Amateur Vocal Percussion Classification
Alejandro Delgado, Emir Demirel, Vinod Subramanian, Charalampos Saitis, and Mark Sandler

TL;DR
This paper investigates deep learning strategies to improve vocal percussion classification by learning informative feature embeddings, demonstrating that syllable-level supervision yields the most effective representations for amateur users.
Contribution
The paper introduces a deep supervised learning approach trained with label sets at several levels of abstraction, and shows that syllable-level supervision produces the most informative embeddings for vocal percussion classification.
Findings
Convolutional neural networks supervised with syllable-level labels outperform the baseline methods, which include feature selection approaches and a speech recognition engine.
Syllable-level supervision yields the most informative feature embeddings.
Saliency maps reveal spectrogram regions crucial for feature learning.
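As a rough illustration of what a saliency map measures, the sketch below approximates how strongly a class score reacts to each spectrogram bin, using finite differences instead of backpropagation. This summary does not say which saliency method the paper uses, so the approach here is an assumption, and `saliency_map`, `toy_score`, and the toy spectrogram are hypothetical names and values.

```python
from typing import Callable, List

Spectrogram = List[List[float]]  # rows = frequency bins, columns = time frames


def saliency_map(score: Callable[[Spectrogram], float],
                 spec: Spectrogram,
                 eps: float = 1e-4) -> Spectrogram:
    """Approximate |d score / d spec[f][t]| with central finite differences."""
    saliency = [[0.0] * len(row) for row in spec]
    for f, row in enumerate(spec):
        for t in range(len(row)):
            original = row[t]
            row[t] = original + eps
            up = score(spec)
            row[t] = original - eps
            down = score(spec)
            row[t] = original  # restore the input before moving on
            saliency[f][t] = abs(up - down) / (2 * eps)
    return saliency


def toy_score(spec: Spectrogram) -> float:
    """Hypothetical class score that only depends on the first time frame."""
    return 2.0 * spec[0][0] + 0.5 * spec[1][0]


# Toy 2x3 "spectrogram" (assumed values, not data from the paper).
toy_spec = [[0.9, 0.1, 0.0],
            [0.4, 0.2, 0.1]]
toy_saliency = saliency_map(toy_score, toy_spec)
```

Bins the score ignores receive zero saliency, while bins it depends on receive a value proportional to their influence. With a trained network, the same quantity would normally be obtained from gradients rather than by perturbing each bin.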
Abstract
Vocal Percussion Transcription (VPT) is concerned with the automatic detection and classification of vocal percussion sound events, allowing music creators and producers to sketch drum lines on the fly. Classifier algorithms in VPT systems learn best from small user-specific datasets, which usually restrict modelling to small input feature sets to avoid data overfitting. This study explores several deep supervised learning strategies to obtain informative feature sets for amateur vocal percussion classification. We evaluated the performance of these sets on regular vocal percussion classification tasks and compared them with several baseline approaches including feature selection methods and a speech recognition engine. These proposed learning models were supervised with several label sets containing information from four different levels of abstraction: instrument-level,…
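The user-specific setting described in the abstract, a classifier fitted to a handful of examples from one user over a compact feature set, can be sketched as follows. This is a minimal sketch that assumes the embeddings have already been extracted and assumes a k-nearest-neighbour classifier, which the summary does not name. The embedding values, the labels, and the names `user_examples` and `knn_predict` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (embedding vector, instrument label)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train: List[Example], query: Sequence[float], k: int = 3) -> str:
    """Return the majority label among the k training examples nearest to query."""
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# A small per-user training set: two 2-D embeddings per instrument (assumed values).
user_examples: List[Example] = [
    ([0.1, 0.2], "kick"), ([0.2, 0.1], "kick"),
    ([0.9, 0.8], "snare"), ([0.8, 0.9], "snare"),
    ([0.1, 0.9], "hihat"), ([0.2, 0.8], "hihat"),
]

prediction_a = knn_predict(user_examples, [0.15, 0.15])
prediction_b = knn_predict(user_examples, [0.85, 0.85])
```

With so few examples per user, a low-dimensional embedding keeps distance-based decisions stable, which is the motivation the abstract gives for restricting the input feature set.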
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: Feature Selection
