Representation Learning for the Automatic Indexing of Sound Effects Libraries
Alison B. Ma, Alexander Lerch

TL;DR
This paper proposes a dataset-independent, taxonomy-agnostic representation learning method for sound effects libraries. The learned embeddings improve search and categorization despite inconsistent metadata and limited data, and they outperform established methods such as OpenL3.
Contribution
It introduces a generalized embedding approach for sound effects that copes with dataset limitations (class imbalance, inconsistent class labels, insufficient dataset size) and does not depend on any single taxonomy, which eases sound library management.
Findings
Dataset-independent embeddings outperform OpenL3
Metric learning improves representation quality
Cross-dataset training enhances generalization
Abstract
Labeling and maintaining a commercial sound effects library is a time-consuming task exacerbated by databases that continually grow in size and undergo taxonomy updates. Moreover, sound search and taxonomy creation are complicated by non-uniform metadata, an unrelenting problem even with the introduction of a new industry standard, the Universal Category System. To address these problems and overcome dataset-dependent limitations that inhibit the successful training of deep learning models, we pursue representation learning to train generalized embeddings that can be used for a wide variety of sound effects libraries and are a taxonomy-agnostic representation of sound. We show that a task-specific but dataset-independent representation can successfully address data issues such as class imbalance, inconsistent class labels, and insufficient dataset size, outperforming established…
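As a rough illustration of how taxonomy-agnostic embeddings could support sound search, the sketch below ranks library items by cosine similarity to a query embedding. Everything in it is an assumption made for illustration: the file names, the three-dimensional toy vectors, and the choice of cosine similarity are hypothetical and are not taken from the paper, whose embeddings would come from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def search(query, library, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real system would use model outputs.
library = {
    "door_slam.wav": [0.9, 0.1, 0.0],
    "door_creak.wav": [0.8, 0.3, 0.1],
    "rain_light.wav": [0.0, 0.2, 0.9],
}

query = [1.0, 0.2, 0.0]  # embedding of a hypothetical query sound
results = search(query, library, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Because the ranking depends only on distances in the embedding space, a search of this kind needs no category labels, which is the sense in which such a representation can be independent of any particular taxonomy.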
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Music Technology and Sound Studies
Methods: Lib
