Autodecompose: A generative self-supervised model for semantic decomposition

Mohammad Reza Bonyadi

arXiv:2302.03124·cs.LG·February 14, 2023

TL;DR

Autodecompose is a self-supervised generative model that disentangles data into semantic properties like sound source and content, enabling high-accuracy speaker recognition with minimal labeled data.

Contribution

The paper introduces a self-supervised approach to semantic decomposition that requires no labels, built on two complementary augmentations and two encoders, and reports strong performance in audio source recognition.

Findings

01

Achieves 97.6% F1 in speaker recognition with only 10 seconds of labeled data.

02

Pre-trained on small datasets, it surpasses supervised models and resists overfitting.

03

The context encoder embeds content information separately and ignores the sound source.

Abstract

We introduce Autodecompose, a novel self-supervised generative model that decomposes data into two semantically independent properties: the desired property, which captures a specific aspect of the data (e.g. the voice in an audio signal), and the context property, which aggregates all other information (e.g. the content of the audio signal), without any labels given. Autodecompose uses two complementary augmentations, one that manipulates the context while preserving the desired property and the other that manipulates the desired property while preserving the context. The augmented variants of the data are encoded by two encoders and reconstructed by a decoder. We prove that one of the encoders embeds the desired property while the other embeds the context property. We apply Autodecompose to audio signals to encode sound source (human voice) and content. We pre-trained the model on…
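
The data flow described in the abstract can be sketched roughly as follows. This is a toy illustration only, not the author's implementation: the linear encoders and decoder, the two placeholder augmentations (`augment_context`, `augment_desired`), the embedding sizes, and the squared-error loss are all assumptions made here for illustration, as is the choice of which augmented variant feeds which encoder.

```python
import random

def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def rand_matrix(rows, cols, rng):
    """Small random weights; stands in for a trained network (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def augment_context(x, rng):
    # Hypothetical placeholder: circular shift, meant to alter the context
    # while leaving the desired property untouched.
    k = rng.randrange(len(x))
    return x[k:] + x[:k]

def augment_desired(x, rng):
    # Hypothetical placeholder: random gain, meant to alter the desired
    # property while leaving the context untouched.
    gain = rng.uniform(0.5, 1.5)
    return [gain * v for v in x]

def make_params(input_dim, desired_dim, context_dim, rng):
    return {
        "enc_desired": rand_matrix(desired_dim, input_dim, rng),
        "enc_context": rand_matrix(context_dim, input_dim, rng),
        "dec": rand_matrix(input_dim, desired_dim + context_dim, rng),
    }

def forward(x, params, rng):
    # Assumed wiring: each encoder sees the variant in which "its" property
    # was preserved and the other property was manipulated.
    z_desired = matvec(params["enc_desired"], augment_context(x, rng))
    z_context = matvec(params["enc_context"], augment_desired(x, rng))
    # A single decoder reconstructs the original input from both embeddings.
    x_hat = matvec(params["dec"], z_desired + z_context)
    # Mean squared reconstruction error (assumed loss).
    loss = sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)
    return z_desired, z_context, x_hat, loss

rng = random.Random(0)
params = make_params(input_dim=8, desired_dim=2, context_dim=3, rng=rng)
x = [rng.uniform(-1.0, 1.0) for _ in range(8)]
z_desired, z_context, x_hat, loss = forward(x, params, rng)
```

In this sketch, minimizing `loss` would push the desired-property encoder to keep whatever survives the context augmentation, and the context encoder to keep whatever survives the desired-property augmentation, which is the intuition the abstract describes.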

Code & Models

Repositories

rezabonyadi/autodecompose (Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing