Layer-aware TDNN: Speaker Recognition Using Multi-Layer Features from Pre-Trained Models

Jin Sob Kim; Hyun Joon Park; Wooseok Shin; Juan Yun; Sung Won Han

arXiv:2409.07770 · eess.AS · December 16, 2025

TL;DR

This paper introduces L-TDNN, a novel layer-aware neural network that leverages multi-layer features from pre-trained speech models to improve speaker verification accuracy, efficiency, and model compactness.

Contribution

The paper proposes a layer-aware processing method for multi-layer SSL encoder outputs, enhancing speaker verification performance and model efficiency.

Findings

1. L-TDNN achieves the lowest error rates across multiple datasets.

2. L-TDNN demonstrates robustness and efficiency comparable to existing systems.

3. The approach improves utilization of multi-layer features from pre-trained models.

Abstract

Recent advances in self-supervised learning (SSL) on Transformers have significantly improved speaker verification (SV) by providing domain-general speech representations. However, existing approaches have underutilized the multi-layered nature of SSL encoders. To address this limitation, we propose the layer-aware time-delay neural network (L-TDNN), which directly performs layer/frame-wise processing on the layer-wise hidden state outputs from pre-trained models, extracting fixed-size speaker vectors. L-TDNN comprises a layer-aware convolutional network, a frame-adaptive layer aggregation, and attentive statistics pooling, explicitly modeling the recognition and processing of the previously overlooked layer dimension. We evaluated L-TDNN across multiple speech SSL Transformers and diverse speech-speaker corpora against other approaches for leveraging pre-trained encoders. L-TDNN…
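
The abstract names two stages that turn layer-wise hidden states into a fixed-size speaker vector: a frame-adaptive layer aggregation and attentive statistics pooling. The NumPy sketch below illustrates the general idea only and is not the authors' implementation. It omits the layer-aware convolutional network entirely. The scoring vectors `w_score` and `w_att`, the single-vector attention, and the shapes (13 layers, 50 frames, 8 dimensions) are all assumptions made for illustration; the paper's actual parameterization may differ.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def frame_adaptive_layer_aggregation(hidden, w_score):
    """Fuse layers with weights that vary per frame (illustrative assumption).

    hidden:  (L, T, D) hidden states from L encoder layers over T frames.
    w_score: (D,) hypothetical scoring vector, not taken from the paper.
    Returns the fused (T, D) features and the (L, T) layer weights.
    """
    scores = hidden @ w_score                 # (L, T): one score per layer and frame
    weights = softmax(scores, axis=0)         # normalize over layers, separately per frame
    fused = (weights[..., None] * hidden).sum(axis=0)
    return fused, weights


def attentive_statistics_pooling(frames, w_att):
    """Pool (T, D) frames into a (2D,) vector of attention-weighted mean and std.

    w_att: (D,) hypothetical attention vector, not taken from the paper.
    """
    alpha = softmax(frames @ w_att, axis=0)   # (T,): attention weight per frame
    mean = (alpha[:, None] * frames).sum(axis=0)
    var = (alpha[:, None] * frames ** 2).sum(axis=0) - mean ** 2
    std = np.sqrt(np.clip(var, 1e-8, None))   # clip guards against tiny negative variance
    return np.concatenate([mean, std])


rng = np.random.default_rng(0)
L, T, D = 13, 50, 8                           # assumed sizes for the demo
hidden = rng.standard_normal((L, T, D))
w_score = rng.standard_normal(D)
w_att = rng.standard_normal(D)

fused, weights = frame_adaptive_layer_aggregation(hidden, w_score)
embedding = attentive_statistics_pooling(fused, w_att)
print(fused.shape, weights.shape, embedding.shape)
```

Because the layer weights are normalized per frame and the pooling collapses the time axis, the output length depends only on `D`, so utterances of different durations map to vectors of the same size.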

Code & Models

Repositories

sadpororo/unipool-sv · PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques