Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition
Leyuan Qu, Cornelius Weber, Stefan Wermter

TL;DR
This paper introduces an acoustic-level approach to recognizing out-of-vocabulary (OOV) words in end-to-end speech recognition: OOV words are synthesized with text-to-speech and training losses are rescaled to emphasize them, which improves OOV recall without a significant increase in word error rate (WER).
Contribution
It proposes a new method combining loss rescaling and regularization to enhance OOV word recognition and support continual learning in end-to-end ASR systems.
Findings
Loss rescaling improves OOV recall rates.
Word-level rescaling outperforms utterance-level rescaling.
Combined methods enable continual learning with minimal WER increase.
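The regularization named under Contribution is tagged in this page's taxonomy as Elastic Weight Consolidation (EWC). As a rough illustration of the general idea, the sketch below adds the standard EWC quadratic penalty to a task loss, so that parameters important for previously learned data are discouraged from drifting while new words are learned. The function names, the `strength` parameter, and the flat-list parameter layout are assumptions for illustration and are not taken from the paper.

```python
def ewc_penalty(params, anchor_params, fisher, strength=1.0):
    """Standard EWC penalty: 0.5 * strength * sum_i F_i * (theta_i - theta*_i)^2.

    params        -- current parameter values (flat list of floats)
    anchor_params -- parameter values after training on the earlier data
    fisher        -- per-parameter importance estimates (diagonal Fisher)
    strength      -- hypothetical hyperparameter weighting the penalty
    """
    if not (len(params) == len(anchor_params) == len(fisher)):
        raise ValueError("params, anchor_params and fisher must have equal length")
    return 0.5 * strength * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


def total_loss(task_loss, params, anchor_params, fisher, strength=1.0):
    """Task loss on the new data plus the EWC penalty."""
    return task_loss + ewc_penalty(params, anchor_params, fisher, strength)
```

With `params=[1.0, 2.0]`, `anchor_params=[0.0, 2.0]`, and `fisher=[2.0, 5.0]`, only the first parameter has moved, so the penalty is `0.5 * 2.0 * 1.0**2 = 1.0`.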
Abstract
Due to the dynamic nature of human language, automatic speech recognition (ASR) systems need to continuously acquire new vocabulary. Out-Of-Vocabulary (OOV) words, such as trending words and new named entities, pose problems to modern ASR systems that require long training times to adapt their large numbers of parameters. Different from most previous research focusing on language model post-processing, we tackle this problem on an earlier processing level and eliminate the bias in acoustic modeling to recognize OOV words acoustically. We propose to generate OOV words using text-to-speech systems and to rescale losses to encourage neural networks to pay more attention to OOV words. Specifically, we enlarge the classification loss used for training neural networks' parameters of utterances containing OOV words (sentence-level), or rescale the gradient used for back-propagation for OOV…
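The sentence-level variant described in the abstract can be pictured with a minimal sketch: the loss of every utterance that contains an OOV word is multiplied by a factor before the batch is averaged. The function names, the default `scale` value, and the boolean `contains_oov` flags are assumptions made for illustration; the paper's actual scaling factor and implementation are not given in this excerpt, and the word-level variant is not covered here.

```python
def rescale_utterance_losses(losses, contains_oov, scale=2.0):
    """Enlarge the loss of utterances flagged as containing an OOV word.

    losses       -- per-utterance loss values (list of floats)
    contains_oov -- one boolean per utterance, True if it contains an OOV word
    scale        -- hypothetical multiplier applied to flagged utterances
    """
    if len(losses) != len(contains_oov):
        raise ValueError("losses and contains_oov must have equal length")
    return [
        loss * scale if flag else loss
        for loss, flag in zip(losses, contains_oov)
    ]


def batch_loss(losses, contains_oov, scale=2.0):
    """Mean of the rescaled per-utterance losses."""
    rescaled = rescale_utterance_losses(losses, contains_oov, scale)
    return sum(rescaled) / len(rescaled)
```

For example, with losses `[1.0, 2.0, 3.0]` and only the second utterance flagged, a scale of 2.0 gives rescaled losses `[1.0, 4.0, 3.0]`, so the flagged utterance contributes twice as much to the batch loss and to its gradient.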
Taxonomy
Methods: Elastic Weight Consolidation
