Self-supervised Graphs for Audio Representation Learning with Limited Labeled Data

Amir Shirian; Krishna Somandepalli; Tanaya Guha

arXiv:2202.00097 · cs.LG · November 23, 2022 · 1 citation

TL;DR

This paper introduces a self-supervised graph-based framework for audio representation learning that makes effective use of limited labeled data. The resulting semi-supervised model performs better than or on par with fully supervised models and holds up across two tasks: acoustic event detection and speech emotion recognition.

Contribution

The paper proposes a novel subgraph-based self-supervised learning approach for audio, leveraging relationships between labeled and unlabeled data to improve representation quality.

Findings

1. Outperforms several competitive existing models on benchmark tasks.
2. Produces robust audio representations from limited labeled data.
3. The model is compact, with 240k parameters.

Abstract

Large-scale databases with high-quality manual annotations are scarce in the audio domain. We thus explore a self-supervised graph approach to learning audio representations from highly limited labelled data. Considering each audio sample as a graph node, we propose a subgraph-based framework with novel self-supervision tasks that can learn effective audio representations. During training, subgraphs are constructed by sampling the entire pool of available training data to exploit the relationship between the labelled and unlabelled audio samples. During inference, we use random edges to alleviate the overhead of graph construction. We evaluate our model on three benchmark audio databases and two tasks: acoustic event detection and speech emotion recognition. Our semi-supervised model performs better than or on par with fully supervised models and outperforms several competitive existing models.…
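The abstract describes two graph-construction regimes but does not state their exact rules. The following is a minimal sketch under stated assumptions, not the authors' implementation. During training, each subgraph mixes labelled and unlabelled samples, and nodes are connected to their k nearest neighbours by cosine similarity of their embeddings (the k-NN rule is our assumption). At inference, edges are drawn at random, which skips the pairwise similarity search. All function names (`sample_subgraph_nodes`, `knn_edges`, `random_edges`) and parameters (`k`, subgraph sizes) are hypothetical. The self-supervision tasks and the graph neural network itself are not shown.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]  # undirected edge stored as (smaller_id, larger_id)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def sample_subgraph_nodes(labeled: List[int], unlabeled: List[int],
                          n_labeled: int, n_unlabeled: int,
                          rng: random.Random) -> List[int]:
    """Hypothetical sampler: mix labelled and unlabelled sample ids in one subgraph."""
    return rng.sample(labeled, n_labeled) + rng.sample(unlabeled, n_unlabeled)


def knn_edges(nodes: List[int], feats: Dict[int, Sequence[float]], k: int) -> List[Edge]:
    """Training-time edges (assumed rule): link each node to its k most cosine-similar nodes."""
    edges = set()
    for i in nodes:
        ranked = sorted(((cosine(feats[i], feats[j]), j) for j in nodes if j != i),
                        reverse=True)
        for _, j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def random_edges(nodes: List[int], k: int, rng: random.Random) -> List[Edge]:
    """Inference-time edges: link each node to k random others, with no similarity search."""
    edges = set()
    for i in nodes:
        others = [j for j in nodes if j != i]
        for j in rng.sample(others, min(k, len(others))):
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# Toy usage: 10 hypothetical "audio clips" with random 4-d embeddings, 3 of them labelled.
rng = random.Random(0)
feats = {i: [rng.random() for _ in range(4)] for i in range(10)}
labeled, unlabeled = [0, 1, 2], list(range(3, 10))
nodes = sample_subgraph_nodes(labeled, unlabeled, 2, 4, rng)
train_edges = knn_edges(nodes, feats, k=2)
test_edges = random_edges(nodes, k=2, rng=rng)
print("nodes:", nodes)
print("train edges:", train_edges)
print("inference edges:", test_edges)
```

In this sketch the k-NN step computes all pairwise similarities within a subgraph, which is quadratic in its size. The random variant only draws k neighbours per node. This is one plausible reading of why random edges reduce the cost of graph construction at inference time.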

Code & Models

Repositories

AmirSh15/SSL_graph_audio
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis