Unsupervised Model-based speaker adaptation of end-to-end lattice-free MMI model for speech recognition

Xurong Xie; Xunying Liu; Hui Chen; Hongan Wang

arXiv:2211.09313 · eess.AS · January 9, 2023

TL;DR

This paper introduces an unsupervised speaker adaptation method for end-to-end lattice-free MMI speech recognition models using LHUC and BLHUC techniques, significantly reducing word error rates on the Switchboard dataset.

Contribution

It proposes a novel unsupervised model-based adaptation framework for E2E LF-MMI models employing LHUC/BLHUC, with systematic regularization and confidence-based data selection.

Findings

01. BLHUC adaptation reduces word error rate (WER) by up to 14.7% relative on the Switchboard task.

02. The proposed method achieves WERs comparable to state-of-the-art hybrid and Conformer systems.

03. Confidence score-based data selection improves adaptation effectiveness.
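
As a rough illustration of confidence-based adaptation data selection, the sketch below keeps only those first-pass hypotheses whose confidence score meets a threshold, so that unreliable transcripts are not used as supervision. The function name, the dictionary fields, and the 0.7 threshold are assumptions made for illustration; the paper estimates confidence from lattices, and its exact selection rule is not given on this page.

```python
def select_adaptation_data(hypotheses, threshold=0.7):
    """Keep first-pass hypotheses whose confidence meets the threshold.

    `hypotheses` is a list of dicts with hypothetical keys:
      "utt_id"     - utterance identifier
      "text"       - first-pass recognition output used as supervision
      "confidence" - a score in [0, 1], e.g. derived from a lattice
    """
    return [h for h in hypotheses if h["confidence"] >= threshold]


def selected_ids(hypotheses, threshold=0.7):
    """Convenience helper returning only the retained utterance ids."""
    return [h["utt_id"] for h in select_adaptation_data(hypotheses, threshold)]
```

Raising the threshold trades adaptation data quantity for label quality; a suitable value would normally be tuned on held-out data.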

Abstract

Modeling speaker variability is a key challenge for automatic speech recognition (ASR) systems. In this paper, learning hidden unit contributions (LHUC) based adaptation techniques with compact speaker-dependent (SD) parameters are used to facilitate both speaker adaptive training (SAT) and unsupervised test-time speaker adaptation for end-to-end (E2E) lattice-free MMI (LF-MMI) models. An unsupervised model-based adaptation framework is proposed to estimate the SD parameters in the E2E paradigm using the LF-MMI and cross-entropy (CE) criteria. Various regularization methods for standard LHUC adaptation, e.g., Bayesian LHUC (BLHUC) adaptation, are systematically investigated on E2E LF-MMI CNN-TDNN and CNN-TDNN-BLSTM models to mitigate the risk of overfitting. Lattice-based confidence score estimation is used for adaptation data selection to reduce the supervision label…
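
The LHUC idea named in the abstract can be sketched in a few lines: each speaker gets a small vector of SD parameters that rescales the hidden-unit activations of a layer element-wise. The sketch below is a minimal pure-Python illustration, not the authors' implementation. It assumes the commonly used 2 x sigmoid activation for the scaling factor, and for the Bayesian variant it assumes a Gaussian posterior over the SD parameters with a standard normal prior; all function names are hypothetical.

```python
import math
import random


def lhuc_scale(r):
    """Map an unconstrained SD parameter to a scaling factor in (0, 2).

    Assumption: the 2 * sigmoid(r) activation; other choices are possible.
    """
    return 2.0 / (1.0 + math.exp(-r))


def apply_lhuc(hidden, r_s):
    """Rescale one hidden activation vector element-wise for speaker s."""
    return [lhuc_scale(r) * h for r, h in zip(r_s, hidden)]


def sample_blhuc(mu, sigma, rng):
    """Draw SD parameters r_s ~ N(mu, sigma^2) by reparameterisation."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def kl_to_standard_normal(mu, sigma):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over dimensions.

    Assumption: a standard normal prior; this term acts as the regulariser
    that discourages overfitting when adaptation data is limited.
    """
    return sum(
        0.5 * (s * s + m * m - 1.0) - math.log(s) for m, s in zip(mu, sigma)
    )
```

With all SD parameters at zero the scaling factor is exactly 1, so an unadapted speaker leaves the speaker-independent model unchanged. In the Bayesian variant, the KL term would be added to the adaptation loss (LF-MMI or CE in the paper) so that the posterior stays close to the prior when a speaker has little data.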

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing