Disentangled Representation Learning for Environment-agnostic Speaker Recognition

KiHyun Nam; Hee-Soo Heo; Jee-weon Jung; Joon Son Chung

arXiv:2406.14559 · cs.SD · June 21, 2024

TL;DR

This paper introduces a feature disentanglement framework using auto-encoders to produce speaker embeddings that are invariant to environmental noise, improving recognition accuracy across benchmarks.

Contribution

It proposes a versatile auto-encoder-based disentanglement method that enhances existing speaker embedding extractors without structural changes.

Findings

01 Up to 16% performance improvement on benchmarks

02 Compatible with any existing speaker embedding extractor

03 Effective in isolating speaker characteristics from environmental factors

Abstract

This work presents a framework based on feature disentanglement to learn speaker embeddings that are robust to environmental variations. Our framework utilises an auto-encoder as a disentangler, dividing the input speaker embedding into components related to the speaker and other residual information. We employ a group of objective functions to ensure that the auto-encoder's code representation - used as the refined embedding - condenses only the speaker characteristics. We show the versatility of our framework through its compatibility with any existing speaker embedding extractor, requiring no structural modifications or adaptations for integration. We validate the effectiveness of our framework by incorporating it into two popularly used embedding extractors and conducting experiments across various benchmarks. The results show a performance improvement of up to 16%. We release our…
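The disentangling step described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper or its repository: the `Disentangler` class, the single linear encoder and decoder, the dimensions, the split of the code into a leading speaker part and a trailing residual part, and the mean-squared reconstruction loss. The abstract mentions a group of objective functions but does not list them, so only a reconstruction term is shown and no training is performed.

```python
import random

def linear(weights, bias, x):
    """Apply y = W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def init_layer(n_out, n_in, rng):
    """Small random weights and zero bias (illustrative initialisation only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class Disentangler:
    """Hypothetical auto-encoder over a precomputed speaker embedding."""

    def __init__(self, emb_dim=8, spk_dim=4, res_dim=2, seed=0):
        rng = random.Random(seed)
        self.spk_dim = spk_dim
        code_dim = spk_dim + res_dim
        self.enc_w, self.enc_b = init_layer(code_dim, emb_dim, rng)
        self.dec_w, self.dec_b = init_layer(emb_dim, code_dim, rng)

    def encode(self, embedding):
        # Assumption: the first spk_dim code units hold speaker information,
        # the remaining units hold the residual (e.g. environment) information.
        code = linear(self.enc_w, self.enc_b, embedding)
        return code[:self.spk_dim], code[self.spk_dim:]

    def decode(self, spk_code, res_code):
        # The decoder sees both parts, so the speaker part alone is not
        # forced to carry everything needed for reconstruction.
        return linear(self.dec_w, self.dec_b, spk_code + res_code)

def reconstruction_loss(x, x_hat):
    """Mean squared error between the input embedding and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

model = Disentangler()
rng = random.Random(1)
# Stand-in for an embedding produced by any existing extractor.
embedding = [rng.gauss(0.0, 1.0) for _ in range(8)]
spk_code, res_code = model.encode(embedding)
recon = model.decode(spk_code, res_code)
loss = reconstruction_loss(embedding, recon)
print(len(spk_code), len(res_code), len(recon))  # prints: 4 2 8
```

In this sketch `spk_code` plays the role of the refined embedding. Because the module takes an embedding vector as input and returns a vector, it can sit behind an extractor without changing the extractor itself, which is the compatibility property the abstract claims.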

Code & Models

Repositories

kaistmm/voxceleb-disentangler (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing