Feature-informed Embedding Space Regularization For Audio Classification
Yun-Ning Hung, Alexander Lerch

TL;DR
This paper introduces two regularization methods that combine task-specific and pre-trained features to improve audio classification performance while reducing inference complexity.
Contribution
The paper proposes regularization techniques that draw on both detailed task-specific features and generic pre-trained features to improve audio classification accuracy.
Findings
The proposed methods outperform the baseline models.
Combining pre-trained features yields better results than using either feature alone.
The methods improve on state-of-the-art performance for multiple audio tasks.
Abstract
Feature representations derived from models pre-trained on large-scale datasets have shown their generalizability on a variety of audio analysis tasks. Despite this generalizability, however, task-specific features can outperform if sufficient training data is available, as specific task-relevant properties can be learned. Furthermore, the complex pre-trained models bring considerable computational burdens during inference. We propose to leverage both detailed task-specific features from spectrogram input and generic pre-trained features by introducing two regularization methods that integrate the information of both feature classes. The workload is kept low during inference as the pre-trained features are only necessary for training. In experiments with the pre-trained features VGGish, OpenL3, and a combination of both, we show that the proposed methods not only outperform baseline…
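The idea in the abstract, a classification loss plus a regularization term that pulls the task-specific embedding space toward a pre-trained one, can be illustrated with a minimal sketch. This is not the authors' formulation: the abstract does not spell out the two regularizers, so the batch-wise pairwise-distance matching used here, the function names, and the `weight` parameter are all assumptions chosen for illustration. The sketch assumes the pre-trained embeddings (for example from VGGish or OpenL3) have been computed offline for each training clip.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between logits (batch, classes) and integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def pairwise_cosine_distance(x, eps=1e-8):
    """Cosine distance between every pair of rows of x (batch, dim)."""
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    return 1.0 - x @ x.T


def embedding_regularizer(task_emb, pretrained_emb):
    """Hypothetical regularizer (an assumption, not taken from the paper):
    penalize differences between the pairwise-distance structure of the
    task-specific embeddings and that of the pre-trained embeddings.
    The two embedding dimensions do not need to match."""
    d_task = pairwise_cosine_distance(task_emb)
    d_pre = pairwise_cosine_distance(pretrained_emb)
    return np.mean((d_task - d_pre) ** 2)


def training_loss(logits, labels, task_emb, pretrained_emb, weight=0.5):
    """Classification loss plus the weighted regularization term.
    `weight` is an illustrative hyperparameter, not a value from the paper."""
    return softmax_cross_entropy(logits, labels) + weight * embedding_regularizer(
        task_emb, pretrained_emb
    )
```

Because `pretrained_emb` appears only in `training_loss`, a model trained this way would need only the spectrogram-based network at inference time, which matches the abstract's point that the pre-trained features are necessary only for training.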
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies
