Understanding Medical Conversations: Rich Transcription, Confidence Scores & Information Extraction

Hagen Soltau; Mingqiu Wang; Izhak Shafran; Laurent El Shafey

arXiv:2104.02219 · cs.LG · April 7, 2021 · 1 cites

Open Access

TL;DR

This paper describes a transformer-based RNN-T recognizer for long-form medical conversations that produces rich transcriptions (speaker segmentation, speaker role labels, punctuation, and capitalization) and is paired with a separate confidence model. The streaming model reaches about 20% WER and 6% WDER, and the confidence model about 0.37 NCE.

Contribution

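The paper presents a transformer-based RNN-T model tailored for long-form medical audio that produces transcriptions with speaker segmentation, speaker role labeling, punctuation, and capitalization. It pairs the recognizer with a confidence model that uses both acoustic and lexical features, and reports information extraction results such as medication extraction.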

Findings

01. About 20% word error rate (WER) on the ASR task with the transformer-based streaming model

02. About 6% word diarization error rate (WDER) on the diarization task

03. F-scores up to 0.90 for medication extraction
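
For readers unfamiliar with the first metric, WER is conventionally the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below illustrates that standard definition only. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and the paper's own scoring pipeline (text normalization, handling of punctuation) is not described on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting,
    which is an assumption, not the paper's scoring setup.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution and one deletion over six reference words.
wer = word_error_rate(
    "take two tablets of ibuprofen daily",
    "take two tablet of ibuprofen",
)
```

A WER of about 20% therefore means roughly one word-level error (substitution, deletion, or insertion) for every five reference words.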

Abstract

In this paper, we describe novel components for extracting clinically relevant information from medical conversations which will be available as Google APIs. We describe a transformer-based Recurrent Neural Network Transducer (RNN-T) model tailored for long-form audio, which can produce rich transcriptions including speaker segmentation, speaker role labeling, punctuation and capitalization. On a representative test set, we compare performance of RNN-T models with different encoders, units and streaming constraints. Our transformer-based streaming model performs at about 20% WER on the ASR task, 6% WDER on the diarization task, 43% SER on periods, 52% SER on commas, 43% SER on question marks and 30% SER on capitalization. Our recognizer is paired with a confidence model that utilizes both acoustic and lexical features from the recognizer. The model performs at about 0.37 NCE. Finally,…
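
The NCE figure quoted for the confidence model can be read against the commonly used normalized cross entropy definition from ASR confidence evaluation: the relative reduction in cross entropy that per-word confidences achieve over a baseline that assigns every word the overall fraction of correct words. The sketch below follows that standard definition. Whether the paper computes NCE in exactly this form is an assumption, and the function name `normalized_cross_entropy` and its inputs are hypothetical.

```python
import math


def normalized_cross_entropy(confidences, correct):
    """Standard NCE: (H_max - H_conf) / H_max, in bits.

    confidences: per-word confidence scores in [0, 1]
    correct:     per-word booleans, True if the word was recognized correctly
    Assumed definition; the paper's exact variant is not stated on this page.
    """
    n = len(confidences)
    n_correct = sum(1 for ok in correct if ok)
    if n == 0 or n_correct in (0, n):
        raise ValueError("need at least one correct and one incorrect word")

    # Baseline entropy: every word gets the same prior p_c = n_correct / n.
    p_c = n_correct / n
    h_max = -(n_correct * math.log2(p_c) + (n - n_correct) * math.log2(1 - p_c))

    # Cross entropy of the actual confidence scores (clipped to avoid log(0)).
    eps = 1e-12
    h_conf = 0.0
    for score, ok in zip(confidences, correct):
        score = min(max(score, eps), 1 - eps)
        h_conf -= math.log2(score) if ok else math.log2(1 - score)

    return (h_max - h_conf) / h_max


# Made-up labels: three correct words and one error.
labels = [True, True, True, False]
nce_uninformative = normalized_cross_entropy([0.75] * 4, labels)
nce_informative = normalized_cross_entropy([0.9, 0.9, 0.9, 0.1], labels)
```

Under this definition, 0 means the scores carry no more information than the average accuracy, 1 means perfect confidence, and negative values mean the scores are worse than the constant baseline.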

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Linguistics and Terminology Studies · Topic Modeling