Semi-Supervised Speech Recognition via Local Prior Matching
Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Hannun

TL;DR
This paper introduces local prior matching (LPM), a semi-supervised learning method that leverages strong prior models like language models to improve speech recognition accuracy using unlabeled data.
Contribution
The paper proposes a novel semi-supervised objective, LPM, which distills knowledge from structured priors to enhance speech recognition models trained on limited labeled data.
Findings
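LPM outperforms existing knowledge distillation techniques under comparable settings.
Starting from a baseline trained on 100 hours of labeled speech, with an additional 360 hours of unlabeled speech, LPM recovers 54% and 73% of the word error rate (WER) on clean and noisy test sets, relative to a fully supervised model trained on the same data.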
LPM effectively utilizes unlabeled speech data to improve recognition accuracy.
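One way to read the recovery percentages above is as the fraction of the WER gap between the labeled-only baseline and the fully supervised model that the semi-supervised model closes. The sketch below assumes that reading; the function name `wer_recovery` and the WER values are hypothetical and are not taken from the paper.

```python
def wer_recovery(baseline_wer: float, semi_wer: float, supervised_wer: float) -> float:
    """Fraction of the baseline-to-supervised WER gap closed by the semi-supervised model.

    Assumed definition: (baseline - semi) / (baseline - supervised).
    """
    gap = baseline_wer - supervised_wer
    if gap <= 0:
        raise ValueError("supervised WER must be lower than baseline WER")
    return (baseline_wer - semi_wer) / gap


# Hypothetical WERs in percent, chosen only to illustrate the arithmetic.
print(f"{wer_recovery(10.0, 7.0, 5.0):.0%}")  # prints 60%
```

Under this reading, 100% recovery would mean the semi-supervised model matches the fully supervised one, and 0% would mean no improvement over the baseline.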
Abstract
For sequence transduction tasks like speech recognition, a strong structured prior model encodes rich information about the target space, implicitly ruling out invalid sequences by assigning them low probability. In this work, we propose local prior matching (LPM), a semi-supervised objective that distills knowledge from a strong prior (e.g. a language model) to provide learning signal to a discriminative model trained on unlabeled speech. We demonstrate that LPM is theoretically well-motivated, simple to implement, and superior to existing knowledge distillation techniques under comparable settings. Starting from a baseline trained on 100 hours of labeled speech, with an additional 360 hours of unlabeled data, LPM recovers 54% and 73% of the word error rate on clean and noisy test sets relative to a fully supervised model on the same data.
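The idea in the abstract can be illustrated with a small numerical sketch. It assumes that, for one unlabeled utterance, a set of candidate transcripts (for example from beam search) has already been scored by both the language model and the speech model, that the "local prior" is the language model distribution renormalized over those candidates, and that the loss is the cross-entropy from that prior to the model's renormalized distribution. These details, and the names `log_softmax` and `lpm_loss`, are assumptions for illustration and may differ from the paper's exact formulation.

```python
import numpy as np


def log_softmax(scores):
    """Numerically stable log-softmax over a 1-D array of log-scores."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()
    return shifted - np.log(np.exp(shifted).sum())


def lpm_loss(lm_log_probs, model_log_probs):
    """Cross-entropy from the local prior to the model over one hypothesis set.

    lm_log_probs[i]    : language model log-probability of hypothesis i
    model_log_probs[i] : speech model log-probability of hypothesis i given the audio
    Both are renormalized over the hypothesis set (an assumption of this sketch).
    """
    local_prior = np.exp(log_softmax(lm_log_probs))   # target distribution
    model_log_post = log_softmax(model_log_probs)     # distribution being trained
    return float(-(local_prior * model_log_post).sum())


# Hypothetical scores for three candidate transcripts of one utterance.
lm_scores = [-1.0, -5.0, -9.0]
agreeing_model = [-1.0, -5.0, -9.0]
disagreeing_model = [-9.0, -5.0, -1.0]
print(lpm_loss(lm_scores, agreeing_model) < lpm_loss(lm_scores, disagreeing_model))  # prints True
```

In a real training loop the model scores would be differentiable outputs of the recognizer and the loss would be minimized by gradient descent; NumPy is used here only to show the quantity being computed.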
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
Methods: Test · Knowledge Distillation · Local Prior Matching
