Scalable Offline ASR for Command-Style Dictation in Courtrooms
Kumarmanas Nethil, Vaibhav Mishra, Kriti Anandan, Kavya Manohar

TL;DR
This paper introduces an open-source, scalable offline ASR framework for command-style dictation in courtrooms. It uses Voice Activity Detection (VAD) to segment audio and transcribes the segments in parallel, reducing latency and improving compute efficiency in real-world legal settings.
Contribution
The paper presents a multiplexing approach that is compatible with most ASR architectures, including CTC-based models, and that improves compute utilization and latency in courtroom dictation systems.
Findings
The framework is deployed in around 15% of India's courtrooms, demonstrating practical impact.
On live data, latency is consistently reduced relative to sequential batch processing as user concurrency increases.
The framework is open-source and adaptable to multiple ASR models.
Abstract
We propose an open-source framework for Command-style dictation that addresses the gap between resource-intensive Online systems and high-latency Batch processing. Our approach uses Voice Activity Detection (VAD) to segment audio and transcribes these segments in parallel using Whisper models, enabling efficient multiplexing across audios. Unlike proprietary systems such as SuperWhisper, this framework is compatible with most ASR architectures, including widely used CTC-based models. Our multiplexing technique maximizes compute utilization in real-world settings, as demonstrated by its deployment in around 15% of India's courtrooms. Evaluations on live data show consistent latency reduction as user concurrency increases, compared to sequential batch processing. The live demonstration will showcase our open-sourced implementation and allow attendees to interact with it in real time.
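The segment-then-multiplex idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the energy-threshold VAD is a toy stand-in for a real VAD model, `fake_transcribe` is a hypothetical placeholder for a call into Whisper or a CTC-based model, and the names `energy_vad` and `multiplex_transcribe`, the frame length, and the threshold are all assumptions chosen for illustration. The point it shows is that segments from several users' audio are pooled into one job list and transcribed concurrently, then regrouped per user in their original order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def energy_vad(samples: List[float], frame_len: int = 4,
               threshold: float = 0.01) -> List[Tuple[int, int]]:
    """Toy energy-based VAD (assumption: stands in for a real VAD model).

    Returns (start, end) sample ranges whose mean frame energy is at or
    above the threshold.
    """
    regions: List[Tuple[int, int]] = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= threshold:
            if start is None:
                start = i  # speech begins at this frame
        elif start is not None:
            regions.append((start, i))  # speech ended at the previous frame
            start = None
    if start is not None:
        regions.append((start, len(samples)))
    return regions


def fake_transcribe(samples: List[float]) -> str:
    """Hypothetical placeholder for an ASR model call."""
    return f"<{len(samples)} samples>"


def multiplex_transcribe(audios: Dict[str, List[float]],
                         transcribe: Callable[[List[float]], str],
                         max_workers: int = 4) -> Dict[str, List[str]]:
    """Pool VAD segments from all users and transcribe them concurrently."""
    # One shared job list across users: (user_id, segment samples).
    jobs: List[Tuple[str, List[float]]] = []
    for user_id, samples in audios.items():
        for start, end in energy_vad(samples):
            jobs.append((user_id, samples[start:end]))

    # pool.map preserves input order, so per-user segment order is kept.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        texts = list(pool.map(lambda job: transcribe(job[1]), jobs))

    results: Dict[str, List[str]] = {user_id: [] for user_id in audios}
    for (user_id, _), text in zip(jobs, texts):
        results[user_id].append(text)
    return results
```

Calling `multiplex_transcribe({"user_a": samples_a, "user_b": samples_b}, fake_transcribe)` returns one list of segment transcripts per user. In a real deployment the thread pool would likely be replaced by batched inference on an accelerator, which is where sharing work across concurrent users would be expected to pay off.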
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
