Teaching the Teachers: Boosting unsupervised domain adaptation in speech recognition by ensemble update
Rehan Ahmad, Muhammad Umar Farooq, Qihang Feng, Thomas Hain

TL;DR
This paper introduces a joint update strategy for teacher-student speech recognition models that enhances unsupervised domain adaptation, reducing word error rates more effectively than previous multi-stage methods.
Contribution
It proposes a simultaneous ensemble and student model update approach that eliminates the need for sequential training, improving adaptation efficiency.
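To make the idea of a joint update concrete, here is a minimal toy sketch. It is not the authors' implementation: it assumes, purely for illustration, that each teacher and the student are linear softmax classifiers over fixed feature vectors (standing in for full speech recognition models), that the ensemble output is the plain average of teacher posteriors, and that every model takes one gradient step on the same ensemble pseudo-labels in each iteration. The names `joint_update`, `sgd_step`, and all hyperparameters are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sgd_step(W, x, targets, lr):
    """One cross-entropy gradient step for a linear softmax classifier."""
    probs = softmax(x @ W)
    onehot = np.eye(W.shape[1])[targets]
    grad = x.T @ (probs - onehot) / len(x)
    return W - lr * grad

def joint_update(teachers, student, x_target, lr=0.1):
    """Pseudo-label an unlabelled batch with the teacher ensemble, then
    update all teachers and the student on it in the same step
    (an assumed reading of 'joint update', not the paper's exact recipe)."""
    ensemble_probs = np.mean([softmax(x_target @ W) for W in teachers], axis=0)
    pseudo = ensemble_probs.argmax(axis=1)
    new_teachers = [sgd_step(W, x_target, pseudo, lr) for W in teachers]
    new_student = sgd_step(student, x_target, pseudo, lr)
    return new_teachers, new_student, pseudo

# Toy setup: three "source-domain" teachers, one student, random target features.
rng = np.random.default_rng(0)
n_feat, n_cls = 8, 3
teachers = [rng.normal(size=(n_feat, n_cls)) for _ in range(3)]
student = rng.normal(size=(n_feat, n_cls))
x_target = rng.normal(size=(32, n_feat))

for _ in range(5):
    teachers, student, pseudo = joint_update(teachers, student, x_target)
```

In a sequential (multi-stage) scheme the teachers would be frozen while the student trains, and a new stage would start from the trained student. In this sketch the teachers change inside the same loop, so the pseudo-labels are regenerated from updated teachers at every iteration.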
Findings
Achieved a 4.6% WER reduction on the Switchboard eval00 test set.
Outperformed existing multi-stage and iterative training methods.
Validated with three labelled source datasets (AMI, WSJ, LS360) and one unlabelled target domain (SwitchBoard).
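For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The sketch below uses the standard definition; it is not taken from the paper's evaluation code. The summary does not say whether the 4.6% figure is an absolute or a relative reduction, so the hypothetical helper `wer_reductions` returns both for comparison.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)

def wer_reductions(baseline: float, adapted: float) -> tuple:
    """Return (absolute, relative) WER reduction between two systems."""
    absolute = baseline - adapted
    return absolute, absolute / baseline

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because WER counts insertions as errors, it can exceed 1.0 when the hypothesis is much longer than the reference.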
Abstract
Speech recognition systems often struggle with data domains that were not included in training. To address this, unsupervised domain adaptation has been explored, with ensemble and multi-stage teacher-student training methods reducing the word error rate (WER). Despite these improvements, the error rate remains much higher than that achieved with supervised in-domain training. This work proposes a more efficient strategy that simultaneously updates the ensemble of teacher models along with the single student model, eliminating the need for sequential model training. The joint update improves the word error rate of the student model, benefiting the progressively enhanced teacher models. Experiments are conducted with three labelled source datasets, namely AMI, WSJ, and LS360, and one unlabelled target domain, SwitchBoard. The results show that the proposed method improves the WER by 4.6% on…
