Triage knowledge distillation for speaker verification

Ju-ho Kim; Youngmoon Jung; Joon-Young Yang; Jaeyoung Roh; Chang Woo Han; Hoon-Young Cho

arXiv:2601.14699 · eess.AS · January 22, 2026

Open Access

TL;DR

This paper introduces Triage Knowledge Distillation (TRKD), a distillation method for speaker verification that selectively transfers relational information from teacher to student according to per-example difficulty. It targets compact models for deployment on resource-constrained devices.

Contribution

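TRKD operationalizes an assess-prioritize-focus scheme for knowledge distillation. A cumulative-probability cutoff $τ$ assesses per-example difficulty and partitions the teacher posterior into the target class, a confusion-set of high-probability non-targets, and a background-set, so that distillation focuses on confusable classes and progressively refines learning. The method outperforms existing KD methods.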

Findings

01 TRKD achieves the lowest EER across all protocols on VoxCeleb1.

02 TRKD outperforms recent KD variants in speaker verification tasks.

03 TRKD effectively handles large-class settings with long-tail distributions.

Abstract

Deploying speaker verification on resource-constrained devices remains challenging due to the computational cost of high-capacity models; knowledge distillation (KD) offers a remedy. Classical KD entangles target confidence with non-target structure in a Kullback-Leibler term, limiting the transfer of relational information. Decoupled KD separates these signals into target and non-target terms, yet treats non-targets uniformly and remains vulnerable to the long tail of low-probability classes in large-class settings. We introduce Triage KD (TRKD), a distillation scheme that operationalizes assess-prioritize-focus. TRKD introduces a cumulative-probability cutoff $τ$ to assess per-example difficulty and partition the teacher posterior into three groups: the target class, a high-probability non-target confusion-set, and a background-set. To prioritize informative signals, TRKD distills…
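
The three-way partition described above can be illustrated with a minimal sketch. The abstract is truncated and does not give the exact rule, so the following is an assumption rather than the authors' implementation: the target's probability counts toward the cutoff, and non-target classes are added in descending order of teacher probability until the cumulative mass reaches $τ$. The function name `triage_partition` and its arguments are hypothetical. The sketch covers only the partition step, not what is distilled from each group.

```python
from typing import List, Tuple


def triage_partition(
    teacher_probs: List[float], target: int, tau: float
) -> Tuple[List[int], List[int]]:
    """Split non-target classes into (confusion_set, background_set).

    ASSUMPTION: a top-p style rule. The target mass is counted first, then
    non-targets are added from most to least probable until the cumulative
    probability reaches tau. The source abstract does not confirm this rule.
    """
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")

    # Non-target class indices, most probable first (ties keep index order).
    non_targets = sorted(
        (k for k in range(len(teacher_probs)) if k != target),
        key=lambda k: teacher_probs[k],
        reverse=True,
    )

    cumulative = teacher_probs[target]  # assumed: target counts toward tau
    confusion: List[int] = []
    for k in non_targets:
        if cumulative >= tau:
            break
        confusion.append(k)
        cumulative += teacher_probs[k]

    # Everything past the cutoff is treated as the low-probability background.
    background = non_targets[len(confusion):]
    return confusion, background


# Hard example: the teacher spreads mass over two confusable speakers.
hard = triage_partition([0.5, 0.3, 0.15, 0.03, 0.02], target=0, tau=0.9)
# hard == ([1, 2], [3, 4])

# Easy example: the teacher is already confident, so the confusion set is empty.
easy = triage_partition([0.95, 0.02, 0.01, 0.01, 0.01], target=0, tau=0.9)
# easy == ([], [1, 2, 3, 4])
```

Under this assumed rule, the size of the confusion set acts as a per-example difficulty signal: confident teacher predictions yield an empty or small confusion set, while ambiguous ones yield a larger set of confusable speakers.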

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Machine Learning and Algorithms