DELULU: Discriminative Embedding Learning Using Latent Units for Speaker-Aware Self-Trained Speech Foundational Model
Massa Baali, Rita Singh, Bhiksha Raj

TL;DR
DELULU is a novel self-supervised speech model that incorporates speaker information into training, significantly improving speaker verification and profiling tasks without requiring task-specific fine-tuning.
Contribution
DELULU introduces speaker-aware pseudo-labeling: frame-level embeddings from a speaker verification model (ReDimNet) guide the k-means clustering used to generate pseudo-labels, which strengthens speaker-discriminative features in self-supervised speech representations.
Findings
Up to 62% relative improvement in speaker verification EER
Consistent gains in zero-shot profiling tasks
Surpasses teacher model on zero-shot evaluations
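For readers unfamiliar with the metric behind the headline number, the sketch below shows one common way to approximate an equal error rate (EER) from trial scores and how a relative improvement is computed from two EER values. It is a minimal illustration, not the paper's evaluation code: the function names, the threshold sweep, and all scores and EER values are made up for the example.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Returns the mean of the false-acceptance and false-rejection rates at the
    threshold where the two are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def relative_improvement(baseline_eer, new_eer):
    """Fraction by which new_eer reduces baseline_eer (0.62 means 62%)."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return (baseline_eer - new_eer) / baseline_eer


# Hypothetical similarity scores, chosen only to exercise the functions.
same_speaker = [0.9, 0.8, 0.7, 0.4]
different_speaker = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(same_speaker, different_speaker))

# Hypothetical EER values (in percent); not results from the paper.
print(relative_improvement(10.0, 3.8))
```

Because the improvement is relative, the same 62% figure can correspond to very different absolute EER reductions depending on the baseline.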
Abstract
Self-supervised speech models have achieved remarkable success on content-driven tasks, yet they remain limited in capturing speaker-discriminative features critical for verification, diarization, and profiling applications. We introduce DELULU, a speaker-aware self-trained foundational model that addresses this limitation by incorporating speaker-informed structure into pseudo-label generation. DELULU leverages frame-level embeddings from ReDimNet, a state-of-the-art speaker verification model, to guide k-means clustering during pre-training, introducing a speaker-discriminative inductive bias that aligns representation learning with speaker identity. DELULU significantly outperforms prior SSL models across a range of speaker-centric tasks, achieving up to 62% relative improvement in equal error rate (EER) for speaker verification and consistent gains on zero-shot…
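The clustering step described in the abstract can be sketched as follows: frame-level embeddings are grouped with k-means, and each frame's cluster ID serves as its pseudo-label. This is a toy sketch under stated assumptions, not the authors' pipeline. The 2-D vectors stand in for ReDimNet frame embeddings, the function names and the cluster count are hypothetical, and a plain-Python Lloyd's algorithm is used only to keep the example self-contained.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(frame, centroids):
    """Index of the centroid closest to the given frame embedding."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(frame, centroids[k]))


def kmeans_pseudo_labels(frames, num_clusters, num_iters=20, seed=0):
    """Run Lloyd's k-means and return (per-frame cluster IDs, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids from randomly chosen frames.
    centroids = [list(f) for f in rng.sample(frames, num_clusters)]
    for _ in range(num_iters):
        labels = [nearest(f, centroids) for f in frames]
        for k in range(num_clusters):
            members = [f for f, lab in zip(frames, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    # Final assignment against the updated centroids.
    labels = [nearest(f, centroids) for f in frames]
    return labels, centroids


# Toy stand-in for frame-level speaker embeddings: two well-separated groups,
# as if the frames came from two different speakers (an assumption).
frames = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
          [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]]
pseudo_labels, _ = kmeans_pseudo_labels(frames, num_clusters=2)
print(pseudo_labels)
```

Because the embeddings come from a speaker verification model, frames that sound like the same speaker tend to fall in the same cluster, so the resulting pseudo-labels carry speaker structure rather than purely phonetic structure. Which cluster receives which ID depends on the random initialisation, so only the grouping is meaningful.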
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Authorship Attribution and Profiling
