Towards Robust and Fair Next Visit Diagnosis Prediction under Noisy Clinical Notes with Large Language Models
Heejoon Koo

TL;DR
This paper investigates how large language models perform in clinical diagnosis prediction when faced with noisy or degraded clinical notes, proposing methods to improve robustness and fairness in such scenarios.
Contribution
It introduces a label-reduction scheme and a hierarchical reasoning strategy to enhance LLM robustness and fairness in noisy clinical text analysis.
Findings
Improved robustness of LLMs under text corruption scenarios.
Reduced subgroup instability in diagnosis predictions.
Demonstrated effectiveness of hierarchical reasoning in clinical tasks.
Abstract
A decade of rapid advances in artificial intelligence (AI) has opened new opportunities for clinical decision support systems (CDSS), with large language models (LLMs) demonstrating strong reasoning abilities on timely medical tasks. However, clinical texts are often degraded by human errors or failures in automated pipelines, raising concerns about the reliability and fairness of AI-assisted decision-making. Yet the impact of such degradations remains under-investigated, particularly regarding how noise-induced shifts can heighten predictive uncertainty and unevenly affect demographic subgroups. We present a systematic study of state-of-the-art LLMs under diverse text corruption scenarios, focusing on robustness and equity in next-visit diagnosis prediction. To address the challenge posed by the large diagnostic label space, we introduce a clinically grounded label-reduction scheme and…
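To make "text corruption scenarios" concrete, here is a minimal sketch of one kind of noise that could be applied to a clinical note before it is passed to a model: random character-level deletions, substitutions, and swaps. It is an illustrative assumption, not the corruption procedure used in the paper; the function name `corrupt_note`, the `rate` parameter, and the three operations are all hypothetical.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def corrupt_note(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Apply character-level noise to a note (hypothetical helper, not from the paper).

    Each alphabetic character is corrupted with probability `rate` by one of:
    deletion, substitution with a random lowercase letter, or a swap with the
    following character. Non-alphabetic characters are never selected for
    corruption themselves.
    """
    rng = random.Random(seed)  # seeded so a corrupted note is reproducible
    out = []
    i = 0
    while i < len(text):
        c = text[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(("delete", "substitute", "swap"))
            if op == "delete":
                i += 1
                continue
            if op == "substitute":
                out.append(rng.choice(ALPHABET))
                i += 1
                continue
            if op == "swap" and i + 1 < len(text):
                out.append(text[i + 1])
                out.append(c)
                i += 2
                continue
        # No corruption applied (or a swap at the very last character).
        out.append(c)
        i += 1
    return "".join(out)


note = "Patient reports chest pain and shortness of breath."
noisy = corrupt_note(note, rate=0.15, seed=42)
```

Sweeping `rate` over several values and comparing predictions on `note` and `noisy`, overall and per demographic subgroup, is one simple way to probe the robustness and fairness questions the abstract raises.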
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Topic Modeling
