BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, and Kunio Kashino

arXiv:2103.06695 · eess.AS · April 22, 2021

TL;DR

This paper introduces BYOL-A, a self-supervised learning method for general-purpose audio representation that learns from single audio segments without relying on segment relationships, achieving state-of-the-art results.

Contribution

It presents a novel BYOL-based approach for audio that does not depend on segment relationships, expanding self-supervised learning to broader audio applications.

Findings

01 Achieves state-of-the-art results on various audio tasks.

02 Learns effectively from a single audio segment, without relying on relationships between segments.

03 Ablation studies clarify the contribution of each component of the method.

Abstract

Inspired by the recent progress in self-supervised learning for computer vision that generates supervision using data augmentations, we explore a new general-purpose audio representation learning approach. We propose learning general-purpose audio representation from a single audio segment without expecting relationships between different time segments of audio samples. To implement this principle, we introduce Bootstrap Your Own Latent (BYOL) for Audio (BYOL-A, pronounced "viola"), an audio self-supervised learning method based on BYOL for learning general-purpose audio representation. Unlike most previous audio self-supervised learning methods that rely on agreement of vicinity audio segments or disagreement of remote ones, BYOL-A creates contrasts in an augmented audio segment pair derived from a single audio segment. With a combination of normalization and augmentation techniques,…
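The core idea in the abstract, two augmented views derived from one segment and compared with a BYOL-style objective, can be illustrated with a toy sketch. This is not the authors' implementation: the `augment` function is a hypothetical placeholder (random gain plus noise), the "encoders" are omitted so the loss is computed directly on the views, and the names `byol_loss` and `ema_update` are chosen here for illustration. The loss and the moving-average update follow the general BYOL formulation (normalized mean squared error, and a target network updated as an exponential moving average of the online network).

```python
import math
import random


def augment(segment, rng):
    """Hypothetical placeholder augmentation: random gain plus small noise."""
    gain = rng.uniform(0.8, 1.2)
    return [gain * x + rng.gauss(0.0, 0.01) for x in segment]


def byol_loss(online_prediction, target_projection):
    """Normalized MSE between two vectors, equal to 2 - 2 * cosine similarity."""
    dot = sum(q * z for q, z in zip(online_prediction, target_projection))
    norm_q = math.sqrt(sum(q * q for q in online_prediction))
    norm_z = math.sqrt(sum(z * z for z in target_projection))
    return 2.0 - 2.0 * dot / (norm_q * norm_z)


def ema_update(target_weights, online_weights, tau=0.99):
    """Move target weights toward online weights: xi <- tau*xi + (1-tau)*theta."""
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_weights, online_weights)]


rng = random.Random(0)

# Toy stand-in for a single audio segment (assumption: any 1-D feature vector).
segment = [math.sin(0.1 * i) for i in range(64)]

# Both views come from the same segment; no neighboring or distant
# segments are involved.
view_a = augment(segment, rng)
view_b = augment(segment, rng)

# With real networks, view_a would pass through the online encoder and
# predictor, and view_b through the target encoder. Here both are identity.
loss = byol_loss(view_a, view_b)
```

In BYOL the loss is usually symmetrized by also swapping the roles of the two views, and only the online network receives gradients; the target network changes solely through the moving-average update.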

Taxonomy

Methods: Mixup · Bootstrap Your Own Latent