MEDIC: Comprehensive Evaluation of Leading Indicators for LLM Safety and Utility in Clinical Applications
Praveenkumar Kanithi, Cl\'ement Christophe, Marco AF Pimentel, Tathagata Raha, Prateek Munjal, Nada Saadi, Hamza A Javed, Svetlana Maslenkova, Nasir Hayat, Ronnie Rajan, Shadab Khan

TL;DR
MEDIC introduces a comprehensive framework for evaluating LLMs in clinical settings, revealing critical performance gaps and safety trade-offs that inform more effective deployment strategies.
Contribution
We develop MEDIC, a novel evaluation framework with new protocols and benchmarks for assessing LLM safety and utility in clinical applications.
Findings
Knowledge-execution gap identified in LLM performance
Divergence between passive and active safety measures
No single model architecture excels across all clinical tasks
Abstract
While Large Language Models (LLMs) achieve superhuman performance on standardized medical licensing exams, these static benchmarks have become saturated and increasingly disconnected from the functional requirements of clinical workflows. To bridge the gap between theoretical capability and verified utility, we introduce MEDIC, a comprehensive evaluation framework establishing leading indicators across various clinical dimensions. Beyond standard question-answering, we assess operational capabilities using deterministic execution protocols and a novel Cross-Examination Framework (CEF), which quantifies information fidelity and hallucination rates without reliance on reference texts. Our evaluation across a heterogeneous task suite exposes critical performance trade-offs: we identify a significant knowledge-execution gap, where proficiency in static retrieval does not predict success in…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsScientific Computing and Data Management · Biomedical Text Mining and Ontologies · Research Data Management Practices
