Benchmarking Offline Multi-Objective Reinforcement Learning in Critical Care
Aryaman Bansal, Divya Sharma

TL;DR
This paper benchmarks offline multi-objective reinforcement learning (MORL) algorithms in critical care and reports that sequence modeling architectures such as PEDA DT offer more flexible, adaptable decision-making than scalarized baselines.
Contribution
It presents a comprehensive benchmark of offline MORL algorithms in healthcare and highlights the effectiveness of sequence modeling architectures for multi-objective decision-making.
Findings
PEDA DT outperforms scalarized baselines in flexibility
Sequence modeling architectures are effective in multi-objective healthcare tasks
Offline MORL enables personalized decision-making without retraining
Abstract
In critical care settings such as the Intensive Care Unit, clinicians face the complex challenge of balancing conflicting objectives, primarily maximizing patient survival while minimizing resource utilization (e.g., length of stay). Single-objective Reinforcement Learning approaches typically address this by optimizing a fixed scalarized reward function, resulting in rigid policies that fail to adapt to varying clinical priorities. Multi-objective Reinforcement Learning (MORL) offers a solution by learning a set of optimal policies along the Pareto Frontier, allowing for dynamic preference selection at test time. However, applying MORL in healthcare necessitates strict offline learning from historical data. In this paper, we benchmark three offline MORL algorithms, Conditioned Conservative Pareto Q-Learning (CPQL), Adaptive CPQL, and a modified Pareto Efficient Decision Agent (PEDA)…
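The contrast the abstract draws between a fixed scalarized reward and preference selection along a Pareto frontier can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `pareto_front` and `select_by_preference`, the two-objective layout (survival, negative length of stay), and the numeric values, which are made up and are not results from the paper. It is not the authors' method, only a toy showing how a preference vector can pick among non-dominated policies at test time without retraining.

```python
import numpy as np


def pareto_front(returns: np.ndarray) -> np.ndarray:
    """Return indices of non-dominated rows, assuming every objective is maximized."""
    keep = []
    for i in range(returns.shape[0]):
        dominated = False
        for j in range(returns.shape[0]):
            if j == i:
                continue
            # Row j dominates row i if it is at least as good everywhere
            # and strictly better on at least one objective.
            if np.all(returns[j] >= returns[i]) and np.any(returns[j] > returns[i]):
                dominated = True
                break
        if not dominated:
            keep.append(i)
    return np.array(keep, dtype=int)


def select_by_preference(returns: np.ndarray, weights: np.ndarray) -> int:
    """Pick the Pareto-optimal policy with the highest weighted sum of objectives."""
    front = pareto_front(returns)
    scores = returns[front] @ weights
    return int(front[np.argmax(scores)])


# Hypothetical expected returns for four candidate policies (made-up values).
# Column 0: survival (higher is better).
# Column 1: negative length of stay in days (higher means a shorter stay).
returns = np.array([
    [0.90, -10.0],
    [0.85, -6.0],
    [0.80, -7.0],  # dominated by the second policy on both objectives
    [0.70, -3.0],
])

front = pareto_front(returns)
survival_first = select_by_preference(returns, np.array([1.0, 0.0]))
resources_first = select_by_preference(returns, np.array([0.0, 1.0]))
balanced = select_by_preference(returns, np.array([0.98, 0.02]))
```

A single scalarized policy corresponds to fixing one weight vector before training. In this toy, changing the weights at selection time moves the choice along the frontier instead. Note that a weighted sum can only reach policies on the convex part of the frontier, which is one known limitation of linear scalarization.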
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning in Healthcare · Adversarial Robustness in Machine Learning
