Benchmarking Offline Multi-Objective Reinforcement Learning in Critical Care

Aryaman Bansal; Divya Sharma

arXiv:2512.08012 · cs.LG · December 10, 2025

TL;DR

This paper benchmarks offline multi-objective reinforcement learning (MORL) algorithms for critical care and reports that sequence-modeling architectures such as PEDA DT are more flexible than scalarized baselines, adapting their decisions to different clinical priorities.

Contribution

It presents a comprehensive benchmark of offline MORL algorithms in healthcare and highlights the effectiveness of sequence-modeling architectures for multi-objective decision-making.

Findings

01. PEDA DT outperforms scalarized baselines in flexibility.

02. Sequence-modeling architectures are effective in multi-objective healthcare tasks.

03. Offline MORL enables personalized decision-making without retraining.

Abstract

In critical care settings such as the Intensive Care Unit, clinicians face the complex challenge of balancing conflicting objectives, primarily maximizing patient survival while minimizing resource utilization (e.g., length of stay). Single-objective Reinforcement Learning approaches typically address this by optimizing a fixed scalarized reward function, resulting in rigid policies that fail to adapt to varying clinical priorities. Multi-objective Reinforcement Learning (MORL) offers a solution by learning a set of optimal policies along the Pareto Frontier, allowing for dynamic preference selection at test time. However, applying MORL in healthcare necessitates strict offline learning from historical data. In this paper, we benchmark three offline MORL algorithms, Conditioned Conservative Pareto Q-Learning (CPQL), Adaptive CPQL, and a modified Pareto Efficient Decision Agent (PEDA)…
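The contrast the abstract draws between a fixed scalarized reward and preference selection over a Pareto set can be made concrete with a small sketch. This is not the paper's method (it does not implement CPQL, Adaptive CPQL, or PEDA); the function names and the candidate return values are hypothetical and chosen only for illustration. Each candidate is a pair (survival score, negative length of stay), so higher is better in both objectives.

```python
from typing import List, Sequence, Tuple

Returns = Tuple[float, ...]


def scalarize(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a vector of objective returns into one number with fixed weights."""
    return sum(w * r for w, r in zip(weights, returns))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Returns]) -> List[Returns]:
    """Keep only the points that no other point dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def best_for_preference(points: List[Returns], weights: Sequence[float]) -> Returns:
    """Pick the point that maximizes the weighted sum for a given preference."""
    return max(points, key=lambda p: scalarize(p, weights))


# Hypothetical per-policy returns: (survival score, -length of stay in days).
candidate_returns: List[Returns] = [
    (0.90, -10.0),
    (0.85, -6.0),
    (0.80, -7.0),  # dominated by (0.85, -6.0)
    (0.70, -4.0),
]

front = pareto_front(candidate_returns)

# A survival-only preference and a resource-only preference select
# different points on the same front, with no retraining involved.
survival_first = best_for_preference(front, (1.0, 0.0))
resources_first = best_for_preference(front, (0.0, 1.0))
```

A single-objective agent trained with one weight vector corresponds to committing to one of these points in advance; a multi-objective approach keeps the whole front available and defers the choice of weights to test time.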

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning in Healthcare · Adversarial Robustness in Machine Learning