Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional Context for Continuous Speech Recognition
Piyush Behre, Sharman Tan, Padma Varadharajan, Shuangyu Chang

TL;DR
This paper introduces a streaming punctuation method for continuous speech recognition that uses dynamic decoding windows to improve segmentation and punctuation accuracy without sacrificing real-time performance.
Contribution
It presents a novel streaming approach leveraging dynamic decoding windows and bidirectional context to enhance punctuation and segmentation in real-time ASR systems.
Findings
Improved segmentation F0.5-score by 13.9%.
Achieved an average BLEU score improvement of 0.66 on the downstream machine translation task.
Reduced over-segmentation of continuous speech.
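The F0.5 score above is the F-beta measure with beta = 0.5: F_β = (1 + β²)·P·R / (β²·P + R), where P is precision and R is recall. Because β < 1 weights precision more heavily than recall, spurious segment boundaries (over-segmentation) cost more than missed ones. The sketch below is a minimal illustration that assumes segmentation is scored by comparing predicted and reference sentence-boundary positions. The exact scoring protocol is not given here, and `boundary_f_beta` is a hypothetical name.

```python
from typing import AbstractSet


def boundary_f_beta(predicted: AbstractSet[int], reference: AbstractSet[int],
                    beta: float = 0.5) -> float:
    """Hypothetical helper: F-beta over sentence-boundary positions
    (word indices after which a segment ends)."""
    true_pos = len(predicted & reference)
    if true_pos == 0:
        return 0.0  # also covers empty prediction or reference sets
    precision = true_pos / len(predicted)
    recall = true_pos / len(reference)
    b2 = beta * beta
    # beta < 1 weights precision more, so extra (spurious) boundaries hurt more
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Over-segmentation example: 4 predicted boundaries, only 2 are correct.
pred, ref = {3, 5, 8, 12}, {5, 12}
print(round(boundary_f_beta(pred, ref), 3))            # 0.556 (F0.5)
print(round(boundary_f_beta(pred, ref, beta=1.0), 3))  # 0.667 (F1)
```

With perfect recall but 50% precision, F0.5 (0.556) falls below F1 (0.667), which is why an F0.5 metric rewards a system that avoids over-segmentation.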
Abstract
While speech recognition Word Error Rate (WER) has reached human parity for English, continuous speech recognition scenarios such as voice typing and meeting transcription still suffer from segmentation and punctuation problems caused by irregular pausing patterns or slow speakers. Transformer sequence tagging models are effective at capturing long bidirectional context, which is crucial for automatic punctuation. Automatic Speech Recognition (ASR) production systems, however, are constrained by real-time requirements, making it hard to incorporate the right context when making punctuation decisions. Context within the segments produced by ASR decoders can help, but it limits overall punctuation performance over a continuous speech session. In this paper, we propose a streaming approach for punctuation or re-punctuation of ASR output using dynamic decoding windows and…
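The abstract stops before describing the algorithm, so the following is only a minimal sketch, under our own assumptions, of how a dynamic decoding window could give a punctuation tagger bidirectional context across ASR segment boundaries. It is not the authors' implementation. Assumptions: incoming ASR segments are appended to a window that is re-tagged as a whole; a sentence boundary is committed only if it has at least one word of right context; the unfinished tail carries over to the next window; and a hypothetical `max_words` cap forces a flush to bound latency. `stream_punctuate`, `toy_tagger`, and `render` are hypothetical names, and `toy_tagger` is a rule-based stand-in for a transformer sequence tagger.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical tagger interface: one label per word, "" (none), ",", "." or "?".
Tagger = Callable[[List[str]], List[str]]
SENTENCE_END = {".", "?"}


def render(words: List[str], labels: List[str]) -> str:
    """Attach each predicted punctuation label to its word."""
    return " ".join(w + lab for w, lab in zip(words, labels))


def stream_punctuate(segments: Iterable[List[str]], tagger: Tagger,
                     max_words: int = 50) -> Iterator[str]:
    """Re-punctuate ASR segments over a growing window, committing only
    sentences whose final boundary has right context (assumed policy)."""
    window: List[str] = []
    for segment in segments:
        window.extend(segment)
        labels = tagger(window)
        # Ignore the window's last word: its label was predicted without right context.
        ends = [i for i in range(len(window) - 1) if labels[i] in SENTENCE_END]
        if not ends and len(window) < max_words:
            continue  # no trusted boundary yet; wait for the next segment
        cut = ends[-1] + 1 if ends else len(window)  # forced flush at max_words
        yield render(window[:cut], labels[:cut])
        window = window[cut:]  # words after the cut carry over to the next window
    if window:
        yield render(window, tagger(window))  # end of stream: flush the remainder


STARTERS = {"please", "thanks"}


def toy_tagger(words: List[str]) -> List[str]:
    """Stand-in tagger: end a sentence before a known starter word,
    and (naively) after the last word it sees."""
    return ["." if i == len(words) - 1 or words[i + 1] in STARTERS else ""
            for i in range(len(words))]


segments = [["i", "will", "send"], ["the", "report", "today"], ["please", "review", "it"]]
print([render(s, toy_tagger(s)) for s in segments])
# ['i will send.', 'the report today.', 'please review it.']
print(list(stream_punctuate(segments, toy_tagger)))
# ['i will send the report today.', 'please review it.']
```

Punctuating each segment in isolation splits the sentence at the pause after "send", which is the over-segmentation pattern described above. In this toy setup, deferring commitment until a boundary has right context recovers the single sentence. The trade-off is added latency, which the assumed `max_words` cap bounds.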
Taxonomy
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Linear Layer · Dropout · Softmax · Multi-Head Attention · Adam · Residual Connection · Label Smoothing
