Robust Prediction of Punctuation and Truecasing for Medical ASR
Monica Sunkara, Srikanth Ronanki, Kalpit Dixit, Sravan Bodapati, Katrin Kirchhoff

TL;DR
This paper introduces a joint modeling approach using pretrained language models for punctuation and truecasing in medical ASR, enhancing accuracy and robustness through domain adaptation and data augmentation.
Contribution
It presents a novel framework leveraging pretrained masked language models for joint punctuation and truecasing prediction in medical speech recognition, with domain-specific fine-tuning and robustness improvements.
Findings
Achieved approximately 5% absolute F1 improvement over baseline models on ground truth text.
Achieved approximately 10% absolute F1 improvement over baseline models on ASR outputs.
Demonstrated effectiveness on medical dictation and conversational corpora.
Abstract
Automatic speech recognition (ASR) systems in the medical domain that focus on transcribing clinical dictations and doctor-patient conversations often pose many challenges due to the complexity of the domain. ASR output typically undergoes automatic punctuation to enable users to speak naturally, without having to vocalise awkward and explicit punctuation commands, such as "period", "add comma" or "exclamation point", while truecasing enhances user readability and improves the performance of downstream NLP tasks. This paper proposes a conditional joint modeling framework for prediction of punctuation and truecasing using pretrained masked language models such as BERT, BioBERT and RoBERTa. We also present techniques for domain and task specific adaptation by fine-tuning masked language models with medical domain data. Finally, we improve the robustness of the model against common errors…
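To make the joint task concrete, the sketch below shows one way written text could be turned into per-token training targets: each lowercased, unpunctuated token (what an ASR system would emit) gets a punctuation label and a casing label. This is only an illustration. The label names, the `casing_label` and `make_labels` helpers, and the set of punctuation marks are assumptions made here, not the label scheme used in the paper, and the pretrained masked language model that would predict these labels is not shown.

```python
# Hypothetical label inventory; the paper's actual label set may differ.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def casing_label(word):
    """Return an illustrative casing class for a single word."""
    letters = [c for c in word if c.isalpha()]
    if not letters or word.islower():
        return "LOWER"        # also covers numbers such as "500"
    if word.isupper() and len(letters) > 1:
        return "ALL_CAPS"     # e.g. acronyms such as "COPD"
    if word[0].isupper() and word[1:] == word[1:].lower():
        return "CAPITALIZED"  # e.g. "Tylenol", or a one-letter "A"
    return "MIXED"            # e.g. "pH"


def make_labels(text):
    """Map written text to (asr_style_token, punct_label, case_label) triples.

    The punctuation label describes the mark that follows the token.
    """
    examples = []
    for raw in text.split():
        punct = "NONE"
        if raw[-1] in PUNCT_LABELS:
            punct = PUNCT_LABELS[raw[-1]]
            raw = raw[:-1]
        if not raw:
            continue  # skip a stand-alone punctuation mark
        examples.append((raw.lower(), punct, casing_label(raw)))
    return examples
```

For instance, `make_labels("Any history of COPD?")` yields the tokens `any`, `history`, `of`, `copd`, where `any` is tagged `CAPITALIZED` and `copd` is tagged `QUESTION` and `ALL_CAPS`. A joint model would predict both labels for every token from the lowercased input.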
Taxonomy
Methods: Linear Layer · Dropout · Multi-Head Attention · Residual Connection · Attention Is All You Need · RoBERTa · WordPiece · Layer Normalization · Attention Dropout · Weight Decay
