Development and multi-center evaluation of domain-adapted speech recognition for human-AI teaming in real-world gastrointestinal endoscopy
Ruijie Yang, Yan Zhu, Peiyao Fu, Te Luo, Zhihua Wang, Xian Yang, Quanlin Li, Pinghong Zhou, Shuo Wang

TL;DR
This paper introduces EndoASR, a domain-adapted speech recognition system optimized for real-time use in gastrointestinal endoscopy, demonstrating significant accuracy improvements and robustness across multiple clinical centers.
Contribution
The study develops a novel two-stage adaptation strategy for ASR tailored to endoscopy, achieving high accuracy, speed, and robustness suitable for clinical deployment.
Findings
Character error rate (CER) fell from 20.52% to 14.14% in retrospective evaluation.
Medical term accuracy (Med ACC) rose from 54.30% to 87.59% in retrospective evaluation.
Runs in real time with a real-time factor (RTF) of 0.005, faster than Whisper-large-v3.
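To make the metrics in these findings concrete, the sketch below computes CER and RTF using their standard textbook definitions: CER as character-level edit distance divided by reference length, and RTF as processing time divided by audio duration. It is illustrative only. The function names are hypothetical, the paper's exact scoring pipeline (text normalization, how Med ACC is matched) is not described on this page, and the timing values in the usage note are made-up inputs, not measurements from the study.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: values below 1.0 mean faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after` (assumes before > 0)."""
    return (before - after) / before
```

Under these definitions, an RTF of 0.005 corresponds to, for example, 0.05 s of compute for 10 s of audio (`rtf(0.05, 10.0)`), and the reported CER change from 20.52% to 14.14% is a relative reduction of about 31.1% (`relative_reduction(20.52, 14.14)`). The Med ACC change from 54.30% to 87.59% is a gain of 33.29 percentage points.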
Abstract
Automatic speech recognition (ASR) is a critical interface for human-AI interaction in gastrointestinal endoscopy, yet its reliability in real-world clinical settings is limited by domain-specific terminology and complex acoustic conditions. Here, we present EndoASR, a domain-adapted ASR system designed for real-time deployment in endoscopic workflows. We develop a two-stage adaptation strategy based on synthetic endoscopy reports, targeting domain-specific language modeling and noise robustness. In retrospective evaluation across six endoscopists, EndoASR substantially improves both transcription accuracy and clinical usability, reducing character error rate (CER) from 20.52% to 14.14% and increasing medical term accuracy (Med ACC) from 54.30% to 87.59%. In a prospective multi-center study spanning five independent endoscopy centers, EndoASR demonstrates consistent generalization under…
