MIMIC-SR-ICD11: A Dataset for Narrative-Based Diagnosis

Yuexin Wu; Shiqi Wang; Vasile Rus

arXiv:2511.05485·cs.CL·November 10, 2025

Open Access

TL;DR

This paper introduces MIMIC-SR-ICD11, a large dataset of clinical narratives aligned with ICD-11, and proposes LL-Rank, a likelihood-based re-ranking method that improves diagnostic label prediction from clinical reports.

Contribution

The paper provides a new narrative-based diagnostic dataset aligned with ICD-11 and introduces LL-Rank, a novel likelihood-based re-ranking framework that enhances label prediction accuracy.

Findings

01 LL-Rank outperforms baseline methods across seven models.

02 Ablation shows LL-Rank's PMI scoring isolates semantic compatibility.

03 Dataset enables improved diagnosis from clinical narratives.

Abstract

Disease diagnosis is a central pillar of modern healthcare, enabling early detection and timely intervention for acute conditions while guiding lifestyle adjustments and medication regimens to prevent or slow chronic disease. Self-reports preserve clinically salient signals that templated electronic health record (EHR) documentation often attenuates or omits, especially subtle but consequential details. To operationalize this shift, we introduce MIMIC-SR-ICD11, a large English diagnostic dataset built from EHR discharge notes and natively aligned to WHO ICD-11 terminology. We further present LL-Rank, a likelihood-based re-ranking framework that computes a length-normalized joint likelihood of each label given the clinical report context and subtracts the corresponding report-free prior likelihood for that label. Across seven model backbones, LL-Rank consistently outperforms a strong…
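
The scoring rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes "length-normalized" means the mean per-token log-probability, and it takes a hypothetical callable `logprob_fn(context, label)` that returns the token log-probabilities of a label under some language model. The function names, the labels, and the numbers in `toy_logprob_fn` are invented purely for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface (an assumption, not from the paper): given a context
# string and a label string, return the log-probability of each label token.
LogProbFn = Callable[[str, str], Sequence[float]]


def length_normalized_loglik(token_logprobs: Sequence[float]) -> float:
    """Mean token log-probability, one plausible form of length normalization."""
    if not token_logprobs:
        raise ValueError("label must contain at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def ll_rank_scores(report: str, labels: Sequence[str],
                   logprob_fn: LogProbFn) -> Dict[str, float]:
    """Score each label as (likelihood given report) minus (report-free prior)."""
    scores = {}
    for label in labels:
        conditional = length_normalized_loglik(logprob_fn(report, label))
        prior = length_normalized_loglik(logprob_fn("", label))
        scores[label] = conditional - prior  # PMI-style difference
    return scores


def rank_labels(report: str, labels: Sequence[str],
                logprob_fn: LogProbFn) -> List[str]:
    """Return labels sorted from highest to lowest score."""
    scores = ll_rank_scores(report, labels, logprob_fn)
    return sorted(labels, key=lambda label: scores[label], reverse=True)


# Toy stand-in for a language model. All values below are made up.
REPORT = "Sharp pain in the lower right abdomen since last night, with nausea."
LABELS = ["Essential hypertension", "Acute appendicitis"]
_TOY_TABLE = {
    ("", "Essential hypertension"): [-1.0, -1.0],      # frequent label, high prior
    ("", "Acute appendicitis"): [-3.0, -3.0],          # rarer label, low prior
    (REPORT, "Essential hypertension"): [-0.9, -0.9],  # barely helped by report
    (REPORT, "Acute appendicitis"): [-1.0, -1.0],      # strongly helped by report
}


def toy_logprob_fn(context: str, label: str) -> Sequence[float]:
    return _TOY_TABLE[(context, label)]


if __name__ == "__main__":
    print(rank_labels(REPORT, LABELS, toy_logprob_fn))
```

In this toy setup the raw conditional likelihood alone would favor the frequent label (-0.9 versus -1.0), while subtracting the report-free prior ranks "Acute appendicitis" first. That is the kind of prior correction the abstract attributes to LL-Rank.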

Taxonomy

Topics: Machine Learning in Healthcare · Medical Coding and Health Information · Topic Modeling