Scalable Offline ASR for Command-Style Dictation in Courtrooms

Kumarmanas Nethil; Vaibhav Mishra; Kriti Anandan; Kavya Manohar

arXiv:2507.01021 · eess.AS · September 16, 2025

TL;DR

This paper introduces an open-source, scalable offline ASR framework for command-style dictation in courtrooms. It segments audio with Voice Activity Detection (VAD) and transcribes the segments in parallel, reducing latency relative to sequential batch processing in real-world legal settings.

Contribution

It presents a multiplexing approach that is compatible with most ASR architectures, including CTC-based models, and that improves compute utilization and latency in courtroom dictation systems.

Findings

01

Deployment in around 15% of India's courtrooms demonstrates practical impact.

02

On live data, latency is consistently lower than with sequential batch processing as user concurrency increases.

03

Framework is open-source and adaptable to multiple ASR models.

Abstract

We propose an open-source framework for command-style dictation that addresses the gap between resource-intensive online systems and high-latency batch processing. Our approach uses Voice Activity Detection (VAD) to segment audio and transcribes these segments in parallel using Whisper models, enabling efficient multiplexing across audio streams. Unlike proprietary systems such as SuperWhisper, this framework is also compatible with most ASR architectures, including widely used CTC-based models. Our multiplexing technique maximizes compute utilization in real-world settings, as demonstrated by its deployment in around 15% of India's courtrooms. Evaluations on live data show consistent latency reduction as user concurrency increases, compared to sequential batch processing. The live demonstration will showcase our open-sourced implementation and allow attendees to interact with it in real time.
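
The segment-then-multiplex idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the energy-threshold `vad_segments` is a toy stand-in for a real VAD, `fake_transcribe_batch` is a hypothetical stub where a Whisper or CTC model would go, and all names, frame sizes, and thresholds are assumptions made for illustration. It shows only the scheduling pattern: speech segments from several users' recordings are pooled, batched together, and the resulting text is reassembled per recording.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Segment:
    audio_id: str         # which recording the segment came from
    index: int            # position of the segment within that recording
    samples: List[float]  # raw samples of the speech span


def vad_segments(samples: List[float], frame: int = 4,
                 threshold: float = 0.1) -> List[Tuple[int, int]]:
    """Toy energy-based VAD (assumption): return (start, end) speech spans."""
    spans, start = [], None
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        energy = sum(x * x for x in chunk) / len(chunk)
        if energy >= threshold:
            if start is None:
                start = i          # speech begins
        elif start is not None:
            spans.append((start, i))  # speech ends at this silent frame
            start = None
    if start is not None:
        spans.append((start, len(samples)))
    return spans


def multiplex(audios: Dict[str, List[float]],
              batch_size: int) -> List[List[Segment]]:
    """Pool segments from all recordings and cut the pool into batches."""
    pool = [
        Segment(audio_id, k, samples[s:e])
        for audio_id, samples in audios.items()
        for k, (s, e) in enumerate(vad_segments(samples))
    ]
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]


def transcribe_all(audios: Dict[str, List[float]],
                   transcribe_batch: Callable[[List[List[float]]], List[str]],
                   batch_size: int = 4) -> Dict[str, str]:
    """Run batched transcription, then reassemble text per recording."""
    parts: Dict[str, List[Tuple[int, str]]] = {a: [] for a in audios}
    for batch in multiplex(audios, batch_size):
        texts = transcribe_batch([seg.samples for seg in batch])
        for seg, text in zip(batch, texts):
            parts[seg.audio_id].append((seg.index, text))
    return {a: " ".join(t for _, t in sorted(p)) for a, p in parts.items()}


def fake_transcribe_batch(batch: List[List[float]]) -> List[str]:
    """Hypothetical stub for an ASR model: reports each segment's length."""
    return [f"<{len(samples)}>" for samples in batch]
```

Because a batch may contain segments from different recordings, a single model call can serve several users at once, which is the property the abstract credits for better compute utilization under concurrency.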

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing