TL;DR
This study investigates how open-source large language models learn clinical information by analyzing their understanding of clinical jargon and their responses to unsupported medical claims, revealing that clinical jargon is underrepresented in pretraining data and that online sources shape model outputs.
Contribution
It introduces a new dataset, MedLingo, and provides insights into the relationship between pretraining data and clinical language understanding in LLMs.
Findings
Frequency of clinical jargon correlates with model performance.
Clinical jargon is often underrepresented in pretraining corpora.
Models can parrot unsupported medical claims from online sources.
Abstract
Large language models (LLMs) have performed well across various clinical natural language processing tasks, despite not being directly trained on electronic health record (EHR) data. In this work, we examine how popular open-source LLMs learn clinical information from large mined corpora through two crucial but understudied lenses: (1) their interpretation of clinical jargon, a foundational ability for understanding real-world clinical notes, and (2) their responses to unsupported medical claims. For both use cases, we investigate the frequency of relevant clinical information in their corresponding pretraining corpora, the relationship between pretraining data composition and model outputs, and the sources underlying this data. To isolate clinical jargon understanding, we evaluate LLMs on a new dataset, MedLingo. Unsurprisingly, we find that the frequency of clinical jargon mentions…
