Contextual Biasing for Streaming ASR via CTC-based Word Spotting
Kai-Chen Tsai, Tien-Hong Lo, Yun-Ting Sun, and Berlin Chen

TL;DR
This paper introduces a novel streaming extension of CTC-based word spotting for real-time contextual biasing in ASR, enabling detection of domain-specific words with low latency without retraining models.
Contribution
It proposes a stateful token passing algorithm and an incremental commitment mechanism that adapt offline CTC-WS to streaming ASR, improving real-time keyword detection and recognition accuracy.
Findings
Reduces overall WER in streaming ASR
Improves keyword F-score for domain-specific words
Maintains low latency with incremental emission strategy
Abstract
Contextual biasing is essential to improving the recognition of rare and domain-specific words in an automatic speech recognition (ASR) system. While numerous methods have been proposed in recent years, most of them focus on offline settings and do not explicitly address the challenges of streaming ASR. For example, CTC-based word spotting (CTC-WS) has demonstrated strong performance by directly detecting keywords from CTC log-probabilities, but it is limited to offline processing and requires access to the full utterance. In this work, we present a streaming extension of CTC-WS for real-time contextual biasing. Our method maintains active keyword paths across audio chunks using a stateful token passing algorithm, enabling the detection of keywords that span multiple chunks. To ensure low latency and stable output, we introduce an incremental commitment mechanism that only emits…
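The idea of keeping keyword paths alive across chunk boundaries can be illustrated with a small sketch. This is not the authors' implementation: the class name `StreamingKeywordSpotter`, the `threshold` parameter, the blank index 0, the length-normalized score, and the reset-after-detection rule are all assumptions made for illustration, and the incremental commitment mechanism is left out. The sketch runs Viterbi-style token passing over a blank-interleaved keyword sequence and stores the per-state tokens on the object, so a keyword that begins in one chunk can finish in the next.

```python
import math

NEG_INF = float("-inf")
BLANK = 0  # assumed index of the CTC blank symbol


class StreamingKeywordSpotter:
    """Hypothetical sketch: token passing whose state survives across chunks."""

    def __init__(self, keywords, threshold=-1.0):
        self.keywords = keywords    # name -> list of token ids (no blanks)
        self.threshold = threshold  # assumed minimum average log-prob per frame
        self.frame = 0              # global frame index across all chunks
        # One (score, start_frame) token per state of t0, blank, t1, ..., t_{L-1}.
        self.tokens = {
            name: [(NEG_INF, -1)] * (2 * len(ids) - 1)
            for name, ids in keywords.items()
        }

    def process_chunk(self, log_probs):
        """log_probs: list of frames, each a list of per-token log-probabilities."""
        detections = []
        for frame_lp in log_probs:
            for name, ids in self.keywords.items():
                old = self.tokens[name]
                new = []
                for s in range(len(old)):
                    cands = [old[s]]                   # stay in the same state
                    if s >= 1:
                        cands.append(old[s - 1])       # advance by one state
                    if s >= 2 and s % 2 == 0 and ids[s // 2] != ids[s // 2 - 1]:
                        cands.append(old[s - 2])       # skip the optional blank
                    if s == 0:
                        cands.append((0.0, self.frame))  # a fresh path starts here
                    score, start = max(cands)
                    label = ids[s // 2] if s % 2 == 0 else BLANK
                    new.append((score + frame_lp[label], start))
                final_score, start = new[-1]
                length = self.frame - start + 1
                if final_score > NEG_INF and final_score / length >= self.threshold:
                    detections.append((name, start, self.frame))
                    new = [(NEG_INF, -1)] * len(new)   # assumed: reset after a hit
                self.tokens[name] = new
            self.frame += 1
        return detections


def frame(best, vocab=4, p=0.9):
    """Toy log-prob frame that puts probability p on token `best`."""
    rest = math.log((1 - p) / (vocab - 1))
    return [math.log(p) if i == best else rest for i in range(vocab)]


spotter = StreamingKeywordSpotter({"ab": [1, 2]})
first = spotter.process_chunk([frame(0), frame(1), frame(1)])  # token 1 arrives
second = spotter.process_chunk([frame(2), frame(0)])           # token 2 arrives
print(first, second)
```

In this toy run the first chunk yields no detection, while the second reports the keyword with a start frame that lies inside the first chunk, which is the behavior a stateless per-chunk spotter would miss.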
