Evaluating ChatGPT on Medical Information Extraction Tasks: Performance, Explainability and Beyond

Liz Li; Wei Zhu

arXiv:2601.21767·cs.CL·February 12, 2026

Open Access

TL;DR

This study evaluates ChatGPT's performance, explainability, confidence, and faithfulness in medical information extraction tasks, revealing strong explainability but lower accuracy than specialized fine-tuned models and a tendency toward over-confidence.

Contribution

It provides a comprehensive analysis of ChatGPT's capabilities and limitations in medical information extraction, highlighting areas for improvement and potential challenges in clinical applications.

Findings

01

ChatGPT underperforms compared to fine-tuned models in MedIE tasks.

02

ChatGPT offers high-quality explanations but is often over-confident.

03

Uncertainty in generation affects extraction reliability.

Abstract

Large Language Models (LLMs) like ChatGPT have demonstrated amazing capabilities in comprehending user intents and generating reasonable and useful responses. Besides their ability to chat, their capabilities in various natural language processing (NLP) tasks are of interest to the research community. In this paper, we focus on assessing the overall ability of ChatGPT in 4 different medical information extraction (MedIE) tasks across 6 benchmark datasets. We present a systematic analysis by measuring ChatGPT's performance, explainability, confidence, faithfulness, and uncertainty. Our experiments reveal that: (a) ChatGPT's performance scores on MedIE tasks fall behind those of the fine-tuned baseline models. (b) ChatGPT can provide high-quality explanations for its decisions; however, it is over-confident in its predictions. (c) ChatGPT demonstrates a high level of faithfulness…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Explainable Artificial Intelligence (XAI)