L-Vector: Neural Label Embedding for Domain Adaptation
Zhong Meng, Hu Hu, Jinyu Li, Changliang Liu, Yan Huang, Yifan Gong, Chin-Hui Lee

TL;DR
This paper introduces a neural label embedding method for domain adaptation of acoustic models. It transfers knowledge from a source-domain model to a target-domain model without requiring paired data and yields significant word error rate (WER) improvements.
Contribution
It presents a novel label embedding scheme that distills source-model knowledge into per-senone label vectors (l-vectors), enabling domain adaptation for speech recognition with unpaired source and target data.
Findings
Achieved up to 14.1% relative WER reduction
Effective without paired target-source data
Applicable to large-scale multi-conditional models
Abstract
We propose a novel neural label embedding (NLE) scheme for the domain adaptation of a deep neural network (DNN) acoustic model with unpaired data samples from the source and target domains. With the NLE method, we distill the knowledge from a powerful source-domain DNN into a dictionary of label embeddings, or l-vectors, one for each senone class. Each l-vector represents the senone-specific output distribution of the source-domain DNN and is learned to minimize the average L2, Kullback-Leibler (KL), or symmetric KL distance to the output vectors with the same label, through simple averaging or standard back-propagation. During adaptation, the l-vectors serve as the soft targets to train the target-domain model with cross-entropy loss. Without the parallel-data constraint of teacher-student learning, NLE is especially suited to the situation where the paired target-domain data…
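The two steps the abstract describes can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: it covers only the L2 case, where the vector minimizing the average squared distance to a set of outputs is their mean, and it assumes the source-domain DNN outputs are already available as an array of posteriors with one senone label per frame. The function names `learn_l_vectors` and `soft_target_cross_entropy` and the toy data are hypothetical.

```python
import numpy as np

def learn_l_vectors(posteriors, labels, num_classes):
    """Average the source-model output vectors that share a senone label.

    posteriors: (num_frames, num_classes) outputs of the source-domain model
    labels:     (num_frames,) integer senone label per frame
    Returns a (num_classes, num_classes) dictionary, one l-vector per row.
    Classes with no frames get an all-zero row (an assumption of this sketch).
    """
    sums = np.zeros((num_classes, posteriors.shape[1]))
    counts = np.zeros(num_classes)
    np.add.at(sums, labels, posteriors)   # accumulate outputs per label
    np.add.at(counts, labels, 1)          # count frames per label
    counts = np.maximum(counts, 1)        # avoid division by zero
    return sums / counts[:, None]

def soft_target_cross_entropy(logits, labels, l_vectors):
    """Cross-entropy between target-model outputs and l-vector soft targets."""
    targets = l_vectors[labels]           # look up the soft target per frame
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())

# Toy example with 2 senone classes and 3 source-domain frames.
posteriors = np.array([[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]])
labels = np.array([0, 0, 1])
l_vectors = learn_l_vectors(posteriors, labels, num_classes=2)

# Target-domain frames need only their own labels, not paired source frames.
target_logits = np.zeros((3, 2))
loss = soft_target_cross_entropy(target_logits, labels, l_vectors)
```

Because the soft target is looked up by label alone, the target-domain frames never have to be aligned with source-domain frames, which is the property the abstract contrasts with teacher-student learning. The KL and symmetric KL variants mentioned in the abstract would replace the averaging step with gradient-based optimization and are not shown here.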
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
