Development and multi-center evaluation of domain-adapted speech recognition for human-AI teaming in real-world gastrointestinal endoscopy

Ruijie Yang; Yan Zhu; Peiyao Fu; Te Luo; Zhihua Wang; Xian Yang; Quanlin Li; Pinghong Zhou; Shuo Wang

arXiv:2604.01705 · cs.CL · April 3, 2026

TL;DR

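This paper introduces EndoASR, a domain-adapted speech recognition system built for real-time use in gastrointestinal endoscopy. In retrospective evaluation it lowers character error rate from 20.52% to 14.14% and raises medical term accuracy from 54.30% to 87.59%, and it generalizes consistently in a prospective study across five independent endoscopy centers.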

Contribution

The study develops a two-stage adaptation strategy for endoscopy ASR, based on synthetic endoscopy reports and targeting domain-specific language modeling and noise robustness, with accuracy and speed suited to real-time clinical deployment.

Findings

01

Character error rate (CER) fell from 20.52% to 14.14% in retrospective evaluation across six endoscopists.

02

Medical term accuracy (Med ACC) rose from 54.30% to 87.59% in the same retrospective evaluation.

03

Achieves real-time processing with a real-time factor (RTF) of 0.005, faster than Whisper-large-v3.
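
For readers unfamiliar with the metrics above, here is a minimal sketch of how CER and RTF are conventionally computed. It uses the standard textbook definitions (character-level edit distance divided by reference length, and processing time divided by audio duration). The function names are illustrative, and the paper's exact scoring pipeline, including any text normalization, is not described on this page, so it may differ.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: values below 1 mean faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds
```

Under this definition, an RTF of 0.005 corresponds to about 0.05 seconds of compute for a 10-second utterance.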

Abstract

Automatic speech recognition (ASR) is a critical interface for human-AI interaction in gastrointestinal endoscopy, yet its reliability in real-world clinical settings is limited by domain-specific terminology and complex acoustic conditions. Here, we present EndoASR, a domain-adapted ASR system designed for real-time deployment in endoscopic workflows. We develop a two-stage adaptation strategy based on synthetic endoscopy reports, targeting domain-specific language modeling and noise robustness. In retrospective evaluation across six endoscopists, EndoASR substantially improves both transcription accuracy and clinical usability, reducing character error rate (CER) from 20.52% to 14.14% and increasing medical term accuracy (Med ACC) from 54.30% to 87.59%. In a prospective multi-center study spanning five independent endoscopy centers, EndoASR demonstrates consistent generalization under…
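
The abstract excerpt does not define Med ACC. The sketch below shows one plausible reading, stated purely as an assumption: the fraction of medical-term occurrences in the reference transcripts that also appear in the recognized text. The `med_acc` function, the vocabulary, and the sample sentences are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, Sequence, Tuple


def med_acc(pairs: Iterable[Tuple[str, str]], vocabulary: Sequence[str]) -> float:
    """Assumed definition: share of reference term occurrences recovered in the hypothesis."""
    total = 0
    correct = 0
    for ref, hyp in pairs:
        for term in vocabulary:
            in_ref = ref.count(term)
            # Credit at most as many hits as the reference contains.
            correct += min(in_ref, hyp.count(term))
            total += in_ref
    if total == 0:
        raise ValueError("no vocabulary terms found in the references")
    return correct / total


# Hypothetical vocabulary and transcripts, for illustration only.
terms = ["polyp", "sigmoid colon"]
samples = [("polyp in the sigmoid colon", "pull up in the sigmoid colon")]
score = med_acc(samples, terms)
```

A term-level metric like this can move independently of CER: a transcript with few character errors can still miss most of the clinically important words, which is consistent with the large gap between the two baseline figures reported above.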
