SEED: Speaker Embedding Enhancement Diffusion Model

KiHyun Nam; Jungwoo Heo; Jee-weon Jung; Gangin Park; Chaeyoung Jung; Ha-Jin Yu; Joon Son Chung

arXiv:2505.16798 · eess.AS · May 23, 2025

TL;DR

This paper introduces SEED, a diffusion-based approach that refines speaker embeddings to improve recognition accuracy under environmental mismatch without altering existing systems.

Contribution

SEED is a novel diffusion model that enhances speaker embeddings for robust recognition without requiring speaker labels or pipeline modifications.

Findings

01. Improves recognition accuracy by up to 19.6% under environmental mismatch.

02. Retains performance on conventional scenarios.

03. Does not require speaker labels or pipeline changes.

Abstract

A primary challenge when deploying speaker recognition systems in real-world applications is performance degradation caused by environmental mismatch. We propose a diffusion-based method that takes speaker embeddings extracted from a pre-trained speaker recognition model and generates refined embeddings. For training, our approach progressively adds Gaussian noise to both clean and noisy speaker embeddings, extracted from clean and noisy speech respectively, via the forward process of a diffusion model, and then reconstructs them as clean embeddings in the reverse process. At inference time, all embeddings are regenerated via the diffusion process. Our method requires neither speaker labels nor any modification to the existing speaker recognition pipeline. Experiments on evaluation sets simulating environmental mismatch scenarios show that our method can improve recognition accuracy by up to 19.6%…
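The training setup described in the abstract can be illustrated with a minimal sketch of the standard DDPM forward (noising) step applied to embedding vectors. This is not the authors' implementation: the linear beta schedule, the 50/50 choice between clean and noisy starting embeddings, the 192-dimensional embedding size, and all function names are assumptions made for illustration. The abstract also does not say whether the network predicts the clean embedding directly or the added noise; the sketch simply returns the clean embedding as the reconstruction target.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_noise(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_t) * x0, (1 - a_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    a_t = alpha_bar[t]
    x_t = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps
    return x_t, eps

def make_training_pair(clean_emb, noisy_emb, t, alpha_bar, rng):
    """Noise either the clean or the noisy embedding; the target is always clean.

    The 50/50 sampling between the two sources is a hypothetical choice.
    """
    source = clean_emb if rng.random() < 0.5 else noisy_emb
    x_t, _ = forward_noise(source, t, alpha_bar, rng)
    return x_t, clean_emb

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = make_alpha_bar()
    # Stand-ins for embeddings from a pre-trained speaker model (dimension assumed).
    clean = rng.standard_normal(192)
    noisy = clean + 0.3 * rng.standard_normal(192)
    x_t, target = make_training_pair(clean, noisy, t=500, alpha_bar=alpha_bar, rng=rng)
    print(x_t.shape, target.shape)
```

A denoising network would then be trained to map `x_t` (and the step `t`) back toward `target`, so that both clean and noisy inputs are pulled toward clean embeddings in the reverse process.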

Code & Models

Repositories

kaistmm/seed-pytorch (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Face recognition and analysis

Methods: Diffusion