Understanding Medical Conversations: Rich Transcription, Confidence Scores & Information Extraction
Hagen Soltau, Mingqiu Wang, Izhak Shafran, Laurent El Shafey

TL;DR
This paper describes a transformer-based RNN-T model for long-form medical conversations that produces rich transcriptions (speaker segmentation, speaker role labels, punctuation, and capitalization). The recognizer is paired with a separate confidence model and with components for extracting clinically relevant information.
Contribution
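The paper presents a transformer-based RNN-T model tailored for long-form medical audio that outputs transcription together with speaker segmentation, speaker role labeling, punctuation, and capitalization. The recognizer is paired with a confidence model that uses both acoustic and lexical features, and the system adds information extraction on top of the transcripts.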
Findings
About 20% word error rate (WER) on the ASR task with the transformer-based streaming model
About 6% word diarization error rate (WDER) on the diarization task
F-scores up to 0.90 for medication extraction
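For readers unfamiliar with the headline metric, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between the reference and the hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the paper's scoring pipeline. The function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace and compared verbatim,
    with no case folding or text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words to reach an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only, not taken from the paper's test set.
wer = word_error_rate(
    "the patient takes aspirin daily",
    "the patient takes aspirin",
)
print(f"WER = {wer:.2f}")
```

In this toy case one of five reference words is dropped, so the rate works out to one error per five words, which is the scale of error the Findings report for the streaming model.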
Abstract
In this paper, we describe novel components for extracting clinically relevant information from medical conversations which will be available as Google APIs. We describe a transformer-based Recurrent Neural Network Transducer (RNN-T) model tailored for long-form audio, which can produce rich transcriptions including speaker segmentation, speaker role labeling, punctuation and capitalization. On a representative test set, we compare performance of RNN-T models with different encoders, units and streaming constraints. Our transformer-based streaming model performs at about 20% WER on the ASR task, 6% WDER on the diarization task, 43% SER on periods, 52% SER on commas, 43% SER on question marks and 30% SER on capitalization. Our recognizer is paired with a confidence model that utilizes both acoustic and lexical features from the recognizer. The model performs at about 0.37 NCE. Finally,…
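The confidence result in the abstract is reported as NCE (normalized cross entropy). A common formulation, used in NIST-style evaluations, compares the cross entropy of the per-word confidence scores against a baseline that assigns every word the average accuracy. The sketch below follows that formulation; whether the paper uses exactly this variant is an assumption, and the function name `normalized_cross_entropy` and the clipping constant `eps` are hypothetical.

```python
import math
from typing import Sequence


def normalized_cross_entropy(
    confidences: Sequence[float],
    correct: Sequence[bool],
    eps: float = 1e-12,
) -> float:
    """NCE = (H_max - H_conf) / H_max, with entropies in bits.

    H_max is the cross entropy of a constant confidence equal to the
    fraction of correct words; H_conf uses the model's own scores.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(correct)
    n_correct = sum(1 for ok in correct if ok)
    if n_correct in (0, n):
        raise ValueError("baseline entropy is zero; NCE is undefined")

    p_c = n_correct / n
    h_max = -(n_correct * math.log2(p_c)
              + (n - n_correct) * math.log2(1.0 - p_c))

    h_conf = 0.0
    for p, ok in zip(confidences, correct):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        h_conf -= math.log2(p if ok else 1.0 - p)

    return (h_max - h_conf) / h_max
```

Under this definition a score of 0 means the confidences carry no more information than the average accuracy, and a score of 1 means every word is assigned a perfectly certain, correct confidence. Negative values are possible when the scores are worse than the constant baseline.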
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Linguistics and Terminology Studies · Topic Modeling
