Spell my name: keyword boosted speech recognition
Namkyu Jung, Geonmin Kim, Joon Son Chung

TL;DR
This paper introduces a simple, training-free decoding method that enhances recognition of uncommon words like names and technical terms in speech recognition systems, improving accuracy without sacrificing overall performance.
Contribution
The paper presents a keyword boosting technique for ASR decoding that improves recognition of rare words without requiring additional training.
Findings
Significant boost in keyword accuracy on LibriSpeech and real-world data.
Maintains overall word recognition accuracy.
Applicable to other tasks like machine translation.
Abstract
Recognition of uncommon words such as names and technical terminology is important to understanding conversations in context. However, the ability to recognise such words remains a challenge in modern automatic speech recognition (ASR) systems. In this paper, we propose a simple but powerful ASR decoding method that can better recognise these uncommon keywords, which in turn enables better readability of the results. The method boosts the probabilities of given keywords in a beam search based on acoustic model predictions. The method does not require any training in advance. We demonstrate the effectiveness of our method on the LibriSpeech test sets and also on internal data of real-world conversations. Our method significantly boosts keyword accuracy on the test sets while maintaining the accuracy of the other words, and it also provides significant qualitative improvements.…
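To make the idea of boosting keywords inside a beam search concrete, here is a minimal character-level sketch. It is not the authors' implementation: the additive per-character bonus, the function names, the boost value, and the toy per-step distributions are all assumptions chosen for illustration, and a real system would take its scores from an acoustic model rather than a hand-written table.

```python
import math


def keyword_bonus(text, keywords, boost):
    """Hypothetical bonus: reward completed keywords and a partial match at the end.

    Completed keywords keep their reward; a partial match is only rewarded
    while it is still the suffix of the hypothesis, so a failed match loses it.
    """
    completed = sum(text.count(kw) * len(kw) for kw in keywords)
    # Proper prefixes only, so a finished keyword is not counted twice.
    proper_prefixes = {kw[:i] for kw in keywords for i in range(1, len(kw))}
    partial = 0
    for n in range(len(text), 0, -1):
        if text[-n:] in proper_prefixes:
            partial = n
            break
    return boost * (completed + partial)


def boosted_beam_search(step_log_probs, keywords, beam_size=3, boost=1.0):
    """Beam search over per-step token log-probabilities with keyword boosting."""
    beams = [("", 0.0)]  # (hypothesis text, accumulated model log-probability)
    for log_probs in step_log_probs:
        candidates = [
            (text + token, score + lp)
            for text, score in beams
            for token, lp in log_probs.items()
        ]
        # Rank by model score plus the keyword bonus, then prune to the beam size.
        candidates.sort(
            key=lambda c: c[1] + keyword_bonus(c[0], keywords, boost),
            reverse=True,
        )
        beams = candidates[:beam_size]
    return beams[0][0]


# Toy distributions in which the model slightly prefers "jong" over "jung".
steps = [
    {"j": math.log(0.9), "y": math.log(0.1)},
    {"o": math.log(0.6), "u": math.log(0.4)},
    {"n": math.log(0.9), "m": math.log(0.1)},
    {"g": math.log(0.9), "k": math.log(0.1)},
]

print(boosted_beam_search(steps, ["jung"], boost=0.0))  # prints "jong"
print(boosted_beam_search(steps, ["jung"], boost=1.0))  # prints "jung"
```

With the boost set to zero the search returns the model's own best hypothesis; with a positive boost the hypothesis that spells the keyword is ranked first. No model parameters change, which is the sense in which such a method needs no training.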
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Music and Audio Processing
Methods: Test
