Empowering Healthcare Practitioners with Language Models: Structuring Speech Transcripts in Two Real-World Clinical Applications
Jean-Philippe Corbeil, Asma Ben Abacha, George Michalopoulos, Phillip Swazinna, Miguel Del-Agua, Jerome Tremblay, Akila Jeeson Daniel, Cari Bader, Yu-Cheng Cho, Pooja Krishnan, Nathan Bodenstab, Thomas Lin, Wenxuan Teng, Francois Beaulieu, Paul Vozila

TL;DR
This paper evaluates open- and closed-weight large language models on two clinical NLP tasks: structured tabular reporting from nurse dictations and medical order extraction from doctor-patient consultations. It addresses data scarcity and sensitivity with a data-generation pipeline and newly released datasets.
Contribution
It introduces an agentic pipeline for generating realistic nurse dictations and releases the first open-source datasets for clinical observation and order extraction.
Findings
LLMs perform well on the two clinical NLP tasks studied, but both open- and closed-weight models have limitations.
The proposed pipeline generates realistic, non-sensitive clinical data.
Open-source datasets facilitate further research in clinical NLP.
Abstract
Large language models (LLMs) such as GPT-4o and o1 have demonstrated strong performance on clinical natural language processing (NLP) tasks across multiple medical benchmarks. Nonetheless, two high-impact NLP tasks - structured tabular reporting from nurse dictations and medical order extraction from doctor-patient consultations - remain underexplored due to data scarcity and sensitivity, despite active industry efforts. Practical solutions to these real-world clinical tasks can significantly reduce the documentation burden on healthcare providers, allowing greater focus on patient care. In this paper, we investigate these two challenging tasks using private and open-source clinical datasets, evaluating the performance of both open- and closed-weight LLMs, and analyzing their respective strengths and limitations. Furthermore, we propose an agentic pipeline for generating realistic,…
Code & Models
- 🤗 microsoft/MediPhi (model) · 4.2k dl · ♡ 19
- 🤗 microsoft/MediPhi-PubMed (model) · 155 dl · ♡ 9
- 🤗 microsoft/MediPhi-MedWiki (model) · 35 dl · ♡ 3
- 🤗 microsoft/MediPhi-Instruct (model) · 4.8k dl · ♡ 61
- 🤗 microsoft/MediPhi-MedCode (model) · 74 dl · ♡ 6
- 🤗 microsoft/MediPhi-Clinical (model) · 418 dl · ♡ 12
- 🤗 microsoft/MediPhi-Guidelines (model) · 34 dl · ♡ 4
- 🤗 gabriellarson/MediPhi-Instruct-GGUF (model) · 34 dl · ♡ 2
- 🤗 Mungert/MediPhi-Instruct-GGUF (model) · 250 dl
- 🤗 prathamesh-chavan/MediPhi-MedCode-bnb-4bit (model)
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Natural Language Processing Techniques
