Learning Preference-Based Objectives from Clinical Narratives for Sequential Treatment Decision-Making
Daniel J. Tan, Kay Choong See, Mengling Feng

TL;DR
This paper introduces a framework that learns reward functions from clinical narratives, improving reinforcement learning policies for healthcare by capturing treatment effectiveness and patient recovery dynamics.
Contribution
It proposes Clinical Narrative-informed Preference Rewards (CN-PR), leveraging large language models to derive trajectory quality scores from discharge summaries for reward learning.
Findings
The learned reward correlates strongly with trajectory quality (Spearman rho = 0.63).
Policies learned with narrative-based rewards improve recovery outcomes.
The approach maintains performance on mortality while enhancing other health metrics.
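For readers unfamiliar with the statistic in the first finding, the following is a minimal sketch of how a Spearman rank correlation between a reward signal and a quality score could be computed. The function names and the toy numbers are illustrative assumptions, not the paper's code or data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


# Hypothetical per-trajectory values, invented purely for illustration.
toy_reward = [1, 2, 3, 4, 5]
toy_quality = [2, 1, 4, 3, 5]
rho = spearman_rho(toy_reward, toy_quality)
```

Because the statistic depends only on rank order, it measures whether higher-reward trajectories also tend to be the higher-quality ones, regardless of the scale of either score.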
Abstract
Designing reward functions remains a central challenge in reinforcement learning (RL) for healthcare, where outcomes are sparse, delayed, and difficult to specify. While structured data capture physiological states, they often fail to reflect the overall quality of a patient's clinical trajectory, including recovery dynamics, treatment burden, and stability. Clinical narratives, in contrast, summarize longitudinal reasoning and implicitly encode evaluations of treatment effectiveness. We propose Clinical Narrative-informed Preference Rewards (CN-PR), a framework for learning reward functions directly from discharge summaries by treating them as scalable supervision for trajectory-level preferences. Using a large language model, we derive trajectory quality scores (TQS) and construct pairwise preferences over patient trajectories, enabling reward learning via a structured…
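The pipeline described in the abstract, scalar trajectory quality scores turned into pairwise preferences that then supervise a reward model, can be sketched as follows. The abstract is cut off before it names the learning objective, so the Bradley-Terry logistic loss used here is an assumption (it is a common choice for preference-based reward learning), as are the tie margin, the function names, and the toy scores.

```python
import math
from itertools import combinations


def build_preference_pairs(tqs, margin=1.0):
    """Return (preferred, other) index pairs whose TQS gap is at least `margin`.

    `margin` is a hypothetical threshold for discarding near-ties.
    """
    pairs = []
    for i, j in combinations(range(len(tqs)), 2):
        gap = tqs[i] - tqs[j]
        if abs(gap) < margin:
            continue  # scores too close to call a preference
        pairs.append((i, j) if gap > 0 else (j, i))
    return pairs


def bradley_terry_loss(returns, pairs):
    """Mean negative log-likelihood that each preferred trajectory wins.

    `returns[k]` is the summed predicted reward of trajectory k.
    Assumed objective: P(a preferred over b) = sigmoid(returns[a] - returns[b]).
    """
    if not pairs:
        raise ValueError("no preference pairs to score")
    total = 0.0
    for winner, loser in pairs:
        diff = returns[winner] - returns[loser]
        # -log(sigmoid(diff)) written in a numerically stable form.
        total += max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return total / len(pairs)


# Hypothetical TQS values for four trajectories, invented for illustration.
toy_tqs = [8.0, 3.0, 7.5, 1.0]
toy_pairs = build_preference_pairs(toy_tqs, margin=1.0)

# A reward model whose returns agree with the preferences gets a lower loss
# than one whose returns are reversed.
aligned_loss = bradley_terry_loss([2.0, 0.5, 1.8, -1.0], toy_pairs)
reversed_loss = bradley_terry_loss([-2.0, -0.5, -1.8, 1.0], toy_pairs)
```

In a full implementation the returns would come from a trainable reward model and the loss would be minimized by gradient descent; the sketch only shows how the preference pairs and the objective fit together.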
