Self-supervised Graphs for Audio Representation Learning with Limited Labeled Data
Amir Shirian, Krishna Somandepalli, Tanaya Guha

TL;DR
This paper introduces a self-supervised, graph-based framework for learning audio representations from limited labeled data. The resulting semi-supervised model performs on par with or better than fully supervised models on acoustic event detection and speech emotion recognition.
Contribution
The paper proposes a subgraph-based self-supervised learning framework in which each audio sample is a graph node, with novel self-supervision tasks that exploit the relationships between labeled and unlabeled samples during training.
Findings
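Outperforms several competitive existing models on two tasks (acoustic event detection and speech emotion recognition) across three benchmark audio databases.
Performs on par with or better than fully supervised models while using only limited labeled data.
The model is compact, with 240k parameters.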
Abstract
Large-scale databases with high-quality manual annotations are scarce in the audio domain. We thus explore a self-supervised graph approach to learning audio representations from highly limited labelled data. Considering each audio sample as a graph node, we propose a subgraph-based framework with novel self-supervision tasks that can learn effective audio representations. During training, subgraphs are constructed by sampling the entire pool of available training data to exploit the relationship between the labelled and unlabelled audio samples. During inference, we use random edges to alleviate the overhead of graph construction. We evaluate our model on three benchmark audio databases and two tasks: acoustic event detection and speech emotion recognition. Our semi-supervised model performs better than or on par with fully supervised models and outperforms several competitive existing models.…
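The graph construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (sample_subgraph, knn_edges, random_edges), the toy feature vectors, and the subgraph size are all hypothetical. In particular, the abstract does not say how edges are formed during training, so the cosine-similarity k-nearest-neighbour rule below is an assumption made only for illustration. The random-edge rule for inference follows the abstract's description.

```python
import math
import random


def sample_subgraph(pool_size, num_nodes, rng):
    """Draw node indices from the whole training pool (labelled and unlabelled)."""
    return rng.sample(range(pool_size), num_nodes)


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def knn_edges(feats, k):
    """ASSUMED training-time rule: link each node to its k most similar nodes."""
    n = len(feats)
    edges = set()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: cosine(feats[i], feats[j]), reverse=True)
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # store undirected edges once
    return edges


def random_edges(n, k, rng):
    """Inference-time rule: link each node to k random nodes, no similarity search."""
    edges = set()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j in rng.sample(others, k):
            edges.add((min(i, j), max(i, j)))
    return edges


rng = random.Random(0)
# Toy pool: 100 samples with 8-dim features; only the first 10 carry a label.
pool = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(100)]
labels = [rng.randrange(4) if i < 10 else None for i in range(100)]

node_ids = sample_subgraph(len(pool), 16, rng)
train_edges = knn_edges([pool[i] for i in node_ids], k=3)
test_edges = random_edges(len(node_ids), 3, rng)
```

Random edges cost nothing beyond the sampling itself, whereas the similarity-based rule compares every pair of nodes in the subgraph. That difference is the kind of graph-construction overhead the abstract says random edges alleviate at inference.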
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
