Conformal Prediction for Risk-Controlled Medical Entity Extraction Across Clinical Domains
Manil Shrestha, Edward Kim

TL;DR
This paper introduces a conformal prediction framework that provides finite-sample coverage guarantees for LLM-based medical entity extraction across clinical domains, where raw model confidence scores are often miscalibrated.
Contribution
It demonstrates domain-specific conformal calibration for LLM-based medical extraction, supporting safer deployment with coverage guarantees even though calibration behavior differs across domains.
Findings
Models are underconfident on FDA labels but overconfident on radiology reports.
Conformal prediction achieves ≥90% coverage with low rejection rates.
Calibration depends on document structure, category, and model architecture.
Abstract
Large Language Models (LLMs) are increasingly used for medical entity extraction, yet their confidence scores are often miscalibrated, limiting safe deployment in clinical settings. We present a conformal prediction framework that provides finite-sample coverage guarantees for LLM-based extraction across two clinical domains. First, we extract structured entities from 1,000 FDA drug labels across eight sections using GPT-4.1, verified via FactScore-based atomic statement evaluation (97.7% accuracy over 128,906 entities). Second, we extract radiological entities from MIMIC-CXR reports using the RadGraph schema with GPT-4.1 and Llama-4-Maverick, evaluated against physician annotations (entity F1: 0.81 to 0.84). Our central finding is that miscalibration direction reverses across domains: on well-structured FDA labels, models are underconfident, requiring modest conformal thresholds…
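To illustrate how a per-domain conformal threshold of the kind the abstract describes might be calibrated, here is a minimal Python sketch of standard split conformal prediction. It is not the authors' implementation. It assumes each domain has a held-out calibration set of confidence scores for extractions already verified as correct, uses 1 - confidence as the nonconformity score, and targets 90% coverage (alpha = 0.10). The function names, domain keys, and synthetic Beta-distributed confidences are hypothetical and chosen only for illustration.

```python
import math
import numpy as np


def conformal_threshold(cal_confidences, alpha=0.10):
    """Split-conformal quantile of nonconformity scores (1 - confidence).

    cal_confidences: confidences of calibration entities verified as correct.
    Returns q_hat; a new entity is accepted when 1 - confidence <= q_hat.
    """
    scores = np.sort(1.0 - np.asarray(cal_confidences, dtype=float))
    n = scores.size
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only accepting
        # everything keeps the coverage guarantee.
        return float("inf")
    return float(scores[k - 1])


def accept(confidences, q_hat):
    """Boolean mask: True where the entity passes the conformal threshold."""
    return (1.0 - np.asarray(confidences, dtype=float)) <= q_hat


# Hypothetical usage with synthetic confidences (illustrative only, not paper data).
rng = np.random.default_rng(0)
calibration = {
    "fda_labels": rng.beta(8, 2, size=500),   # assumed confidence distribution
    "radiology": rng.beta(5, 3, size=500),    # assumed confidence distribution
}
# One threshold per domain, since calibration behavior differs across domains.
thresholds = {name: conformal_threshold(c, alpha=0.10) for name, c in calibration.items()}
for name, q_hat in thresholds.items():
    print(f"{name}: accept entities with confidence >= {1.0 - q_hat:.3f}")
```

Under the usual exchangeability assumption between calibration and test entities, a new correct entity from the same domain passes its domain's threshold with probability at least 1 - alpha. Calibrating each domain separately reflects the finding that the direction of miscalibration differs between FDA labels and radiology reports.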
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Biomedical Text Mining and Ontologies
