Layer-aware TDNN: Speaker Recognition Using Multi-Layer Features from Pre-Trained Models
Jin Sob Kim, Hyun Joon Park, Wooseok Shin, Juan Yun, Sung Won Han

TL;DR
This paper introduces L-TDNN, a novel layer-aware neural network that leverages multi-layer features from pre-trained speech models to improve speaker verification accuracy, efficiency, and model compactness.
Contribution
The paper proposes a layer-aware processing method for multi-layer SSL encoder outputs, enhancing speaker verification performance and model efficiency.
Findings
L-TDNN achieves the lowest error rates across multiple datasets.
L-TDNN demonstrates robustness and efficiency comparable to existing systems.
The approach improves utilization of multi-layer features from pre-trained models.
Abstract
Recent advances in self-supervised learning (SSL) on Transformers have significantly improved speaker verification (SV) by providing domain-general speech representations. However, existing approaches have underutilized the multi-layered nature of SSL encoders. To address this limitation, we propose the layer-aware time-delay neural network (L-TDNN), which directly performs layer/frame-wise processing on the layer-wise hidden state outputs from pre-trained models, extracting fixed-size speaker vectors. L-TDNN comprises a layer-aware convolutional network, frame-adaptive layer aggregation, and attentive statistics pooling, explicitly modeling the recognition and processing of the previously overlooked layer dimension. We evaluated L-TDNN across multiple speech SSL Transformers and diverse speech-speaker corpora against other approaches for leveraging pre-trained encoders. L-TDNN…
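To make the data flow in the abstract concrete, the following is a minimal NumPy sketch of two of the named stages: a per-frame weighting over encoder layers, followed by attentive statistics pooling into a fixed-size vector. It is an illustration under stated assumptions, not the authors' implementation: the function names, the single scoring vectors `w_layer` and `w_frame`, and the tensor sizes are all hypothetical, and the layer-aware convolutional network is omitted entirely.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def frame_adaptive_layer_aggregation(hidden, w_layer):
    # hidden:  (L, T, D) hidden states from L encoder layers, T frames, D dims.
    # w_layer: (D,) hypothetical scoring vector (an assumption of this sketch).
    scores = hidden @ w_layer              # (L, T): one score per layer and frame
    weights = softmax(scores, axis=0)      # each frame gets its own layer weights
    frames = (weights[..., None] * hidden).sum(axis=0)  # (T, D)
    return frames, weights

def attentive_statistics_pooling(frames, w_frame, eps=1e-8):
    # frames:  (T, D) frame-level features.
    # w_frame: (D,) hypothetical attention scoring vector (assumption).
    alpha = softmax(frames @ w_frame, axis=0)            # (T,) frame weights
    mean = (alpha[:, None] * frames).sum(axis=0)         # weighted mean, (D,)
    var = (alpha[:, None] * (frames - mean) ** 2).sum(axis=0)
    std = np.sqrt(var + eps)                             # weighted std, (D,)
    return np.concatenate([mean, std])                   # fixed size (2D,)

rng = np.random.default_rng(0)
L, T, D = 13, 50, 8                        # illustrative sizes only
w_layer = rng.standard_normal(D)
w_frame = rng.standard_normal(D)

hidden = rng.standard_normal((L, T, D))
frames, layer_weights = frame_adaptive_layer_aggregation(hidden, w_layer)
embedding = attentive_statistics_pooling(frames, w_frame)

# A longer utterance yields a vector of the same size.
longer = rng.standard_normal((L, 120, D))
embedding_longer = attentive_statistics_pooling(
    frame_adaptive_layer_aggregation(longer, w_layer)[0], w_frame
)
```

In this sketch the layer weights are recomputed for every frame, which is one plausible reading of "frame-adaptive"; a single learned weight per layer shared across all frames would be the simpler, non-adaptive alternative. The output length depends only on `D`, so utterances of different durations map to vectors of the same size.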
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques
