Symphony for Speech-to-Text: Supporting Real-Time Medical Voice Interfaces
Arne Nix, Robert James, Lasse Borgholt, Anna B. Ekner, Lana Krumm, Julius Severin, Dan Engel, Lars Maaløe, Jakob Havtorn

TL;DR
Symphony for Speech-to-Text is a medical-grade speech recognition system for real-time and batch clinical transcription. It splits transcription across specialized components, outperforming existing systems on clinical speech and matching or exceeding them on general-domain speech.
Contribution
It introduces a decomposed recognition system that enhances medical term recall and produces structured clinical text, with robust performance across diverse settings.
Findings
Outperforms state-of-the-art systems in clinical speech recognition
Matches or exceeds performance in general-domain speech tasks
Provides a new clinical benchmark dataset for validation
Abstract
After decades of use in dictation and, more recently, ambient documentation, speech is emerging as a primary modality for interacting with technology and AI in healthcare. Yet medical speech recognition remains difficult: systems must capture specialized terminology, resolve contextual ambiguity, and render measurements, abbreviations, and clinical shorthand precisely. Existing solutions are typically optimized either for general-purpose transcription or narrow dictation workflows, limiting their reliability in safety-critical settings and their usefulness for broader clinical workflows. We introduce Symphony for Speech-to-Text, a medical-grade speech recognition system for real-time streaming and batch file-based clinical use. Symphony decomposes the transcription process into specialized components for recognition, formatting, and contextual correction to optimize medical term recall…
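The decomposition named in the abstract (recognition, then formatting, then contextual correction) can be pictured as a chain of text-to-text stages. The following is a toy sketch only, not the authors' implementation: every function name, lookup table, and the pre-decoded token input are hypothetical placeholders chosen for illustration, and a real system would use learned models at each stage.

```python
from typing import Callable, Dict, List

# Hypothetical stage type: each post-recognition stage maps text to text.
Stage = Callable[[str], str]

# Toy lookup tables (assumptions for illustration, not from the paper).
NUMBER_WORDS: Dict[str, str] = {"five": "5", "ten": "10", "twenty": "20"}
UNIT_WORDS: Dict[str, str] = {"milligrams": "mg", "milliliters": "mL"}
TERM_CORRECTIONS: Dict[str, str] = {"hyper tension": "hypertension"}


def recognize(decoded_tokens: List[str]) -> str:
    """Stand-in for the recognition component: joins already-decoded tokens."""
    return " ".join(decoded_tokens)


def format_text(raw: str) -> str:
    """Stand-in for the formatting component: renders numbers and units."""
    words = [NUMBER_WORDS.get(w, UNIT_WORDS.get(w, w)) for w in raw.split()]
    return " ".join(words)


def correct_terms(text: str) -> str:
    """Stand-in for contextual correction: repairs split medical terms."""
    for wrong, right in TERM_CORRECTIONS.items():
        text = text.replace(wrong, right)
    return text


def transcribe(decoded_tokens: List[str]) -> str:
    """Run recognition, then each downstream stage in order."""
    stages: List[Stage] = [format_text, correct_terms]
    text = recognize(decoded_tokens)
    for stage in stages:
        text = stage(text)
    return text


if __name__ == "__main__":
    tokens = ["patient", "has", "hyper", "tension", "take", "five", "milligrams"]
    print(transcribe(tokens))
```

Keeping each stage as an independent function is what would let one component be swapped or evaluated without touching the others, which is the practical appeal of a decomposed design.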
