From Robotics to Sepsis Treatment: Offline RL via Geometric Pessimism

Sarthak Wanjari

arXiv:2602.08655·cs.LG·February 17, 2026

TL;DR

This paper introduces Geometric Pessimism, a compute-efficient offline RL method that enhances policy safety and performance by penalizing out-of-distribution actions using density-based measures, validated on benchmarks and clinical data.

Contribution

It proposes a modular, density-based penalty framework that improves offline RL stability and safety without high computational costs, outperforming existing methods on benchmarks and real-world datasets.

Findings

01

Geo-IQL outperforms standard IQL on MuJoCo tasks by over 18 points.

02

Reduces the standard deviation of performance by a factor of four.

03

Achieves 86.4% terminal agreement with clinicians on sepsis dataset.

Abstract

Offline Reinforcement Learning (RL) promises the recovery of optimal policies from static datasets, yet it remains susceptible to the overestimation of out-of-distribution (OOD) actions, particularly in fractured and sparse data manifolds. Current solutions necessitate a trade-off between computational efficiency and performance. Methods like CQL offer rigorous conservatism but require tremendous compute power, while efficient expectile-based methods like IQL often fail to correct OOD errors on pathological datasets, collapsing to Behavioural Cloning. In this work, we propose Geometric Pessimism, a modular, compute-efficient framework that augments standard IQL with a density-based penalty derived from k-nearest-neighbour distances in the state-action embedding space. By pre-computing the penalties applied to each state-action pair, our method injects OOD conservatism via reward shaping…
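
To make the idea of a pre-computed, kNN-based penalty concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the penalty's exact form, so the sketch assumes the penalty is the mean Euclidean distance to the k nearest other dataset points, multiplied by a hypothetical `scale` coefficient, and it works on raw concatenated state-action vectors rather than a learned embedding. The function names `knn_penalties` and `shape_rewards` are invented for this example.

```python
import math


def knn_penalties(points, k=2, scale=1.0):
    """Return one penalty per point: scale * mean distance to its k nearest
    other points. ASSUMPTION: this penalty form is illustrative only; the
    abstract does not specify the exact formula."""
    penalties = []
    for i, p in enumerate(points):
        # Brute-force distances to every other point in the dataset.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        penalties.append(scale * sum(dists[:k]) / k)
    return penalties


def shape_rewards(rewards, penalties):
    """Subtract the pre-computed penalty from each logged reward."""
    return [r - pen for r, pen in zip(rewards, penalties)]


# Toy dataset of concatenated (state, action) vectors (hypothetical values).
# Three points form a dense cluster; the fourth is isolated.
pairs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
rewards = [1.0, 1.0, 1.0, 1.0]

penalties = knn_penalties(pairs, k=2)
shaped = shape_rewards(rewards, penalties)
```

In this toy run the isolated pair at `(5.0, 5.0)` receives a much larger penalty than the clustered pairs, so its shaped reward is lower. Because the penalties are computed once before training, the downstream learner (IQL in the paper) only sees modified rewards, which is what keeps the approach modular.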

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning