Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval
Nikolaos Flemotomos, Roger Hsiao, Pawel Swietojanski, Takaaki Hori, Dogan Can, Xiaodan Zhuang

TL;DR
This paper introduces a vector quantization-based approximation for cross-attention in neural speech recognition, enabling efficient use of large biasing catalogues and significantly improving accuracy and computational efficiency.
Contribution
It proposes a novel approximation method for cross-attention using vector quantization, allowing large-scale biasing catalogues to be used efficiently in speech recognition.
Findings
Up to 71% relative error rate reduction in personal entity recognition.
20% reduction in compute time for large biasing lists.
85-95% reduction in memory usage with the proposed method.
Abstract
Neural contextual biasing allows speech recognition models to leverage contextually relevant information, leading to improved transcription accuracy. However, the biasing mechanism is typically based on a cross-attention module between the audio and a catalogue of biasing entries, which means computational complexity can pose severe practical limitations on the size of the biasing catalogue and consequently on accuracy improvements. This work proposes an approximation to cross-attention scoring based on vector quantization and enables compute- and memory-efficient use of large biasing catalogues. We propose to use this technique jointly with a retrieval-based contextual biasing approach. First, we use an efficient quantized retrieval module to shortlist biasing entries by grounding them on audio. Then we use retrieved entries for biasing. Since the proposed approach is agnostic to the…
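The two-stage idea in the abstract (score the catalogue cheaply against a quantized codebook, then bias on the shortlist only) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`build_codebook`, `shortlist`, `bias_with_shortlist`), the use of plain k-means as the quantizer, the max-pooling over audio frames, and the scaled dot-product attention in the second stage are all assumptions made for illustration.

```python
import numpy as np


def build_codebook(keys, num_codes, iters=10, seed=0):
    """Quantize entry keys with plain k-means (an assumed quantizer).

    Returns the codebook (num_codes, D) and one code index per entry (N,).
    """
    rng = np.random.default_rng(seed)
    init = rng.choice(len(keys), size=num_codes, replace=False)
    codebook = keys[init].copy()
    for _ in range(iters):
        # Squared distances between every key and every centroid: (N, K).
        dist = ((keys[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        codes = dist.argmin(axis=1)
        for c in range(num_codes):
            members = keys[codes == c]
            if len(members):
                codebook[c] = members.mean(axis=0)
    dist = ((keys[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return codebook, dist.argmin(axis=1)


def shortlist(queries, codebook, codes, top_k):
    """Approximate q.k by q.codebook[code(k)] and keep the top_k entries."""
    # Dot products against K codes instead of N entries: (T, K).
    code_scores = queries @ codebook.T
    # Pool over audio frames (assumption: max-pooling), then look up
    # each entry's score through its code index: (N,).
    pooled = code_scores.max(axis=0)[codes]
    return np.argsort(-pooled)[:top_k]


def bias_with_shortlist(queries, keys, values, idx):
    """Exact scaled dot-product cross-attention over shortlisted entries."""
    k, v = keys[idx], values[idx]
    logits = queries @ k.T / np.sqrt(keys.shape[1])
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v  # (T, D_v)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    keys = rng.normal(size=(1000, 16))     # hypothetical entry key embeddings
    values = rng.normal(size=(1000, 16))   # hypothetical entry value embeddings
    queries = rng.normal(size=(20, 16))    # hypothetical audio frame queries
    codebook, codes = build_codebook(keys, num_codes=32)
    idx = shortlist(queries, codebook, codes, top_k=50)
    print(bias_with_shortlist(queries, keys, values, idx).shape)
```

In this sketch the retrieval stage computes T x K dot products rather than T x N, and only K pooled scores plus N integer code indices are held for scoring, which is the kind of saving that makes a large catalogue tractable. The exact attention in the second stage then runs over `top_k` entries instead of the whole catalogue.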
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Concatenated Skip Connection · Softmax
