From Robotics to Sepsis Treatment: Offline RL via Geometric Pessimism
Sarthak Wanjari

TL;DR
This paper introduces Geometric Pessimism, a compute-efficient offline RL method that enhances policy safety and performance by penalizing out-of-distribution actions using density-based measures, validated on benchmarks and clinical data.
Contribution
It proposes a modular, density-based penalty framework that improves offline RL stability and safety without high computational costs, outperforming existing methods on benchmarks and real-world datasets.
Findings
Geo-IQL outperforms standard IQL on MuJoCo tasks by over 18 points.
Reduces the standard deviation of performance by a factor of four.
Achieves 86.4% terminal agreement with clinicians on a sepsis dataset.
Abstract
Offline Reinforcement Learning (RL) promises the recovery of optimal policies from static datasets, yet it remains susceptible to the overestimation of out-of-distribution (OOD) actions, particularly in fractured and sparse data manifolds. Current solutions force a trade-off between computational efficiency and performance. Methods like CQL offer rigorous conservatism but require substantial compute, while efficient expectile-based methods like IQL often fail to correct OOD errors on pathological datasets, collapsing to Behavioural Cloning. In this work, we propose Geometric Pessimism, a modular, compute-efficient framework that augments standard IQL with a density-based penalty derived from k-nearest-neighbour distances in the state-action embedding space. By pre-computing the penalties applied to each state-action pair, our method injects OOD conservatism via reward shaping…
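The abstract's core idea, a pre-computed k-nearest-neighbour penalty folded into the reward, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the embedding is taken to be the raw concatenation of state and action, the penalty is assumed to be the mean distance to the k nearest neighbours scaled by a hypothetical coefficient `beta`, and the brute-force search stands in for whatever index the authors use. The function names `knn_penalties` and `shape_rewards` are invented for this sketch.

```python
import math


def knn_penalties(states, actions, k=3, beta=1.0):
    """Return one density-based penalty per (state, action) pair.

    Assumption: the embedding is the plain concatenation of state and
    action, and the penalty is beta times the mean Euclidean distance
    to the k nearest other points in the dataset.
    """
    points = [tuple(s) + tuple(a) for s, a in zip(states, actions)]
    if k < 1 or k > len(points) - 1:
        raise ValueError("k must be between 1 and len(dataset) - 1")

    penalties = []
    for i, p in enumerate(points):
        # Brute-force distances to every other point (illustration only).
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        penalties.append(beta * sum(dists[:k]) / k)
    return penalties


def shape_rewards(rewards, penalties):
    """Subtract the pre-computed penalty from each reward (reward shaping)."""
    return [r - p for r, p in zip(rewards, penalties)]
```

Because the penalties depend only on the static dataset, they can be computed once before training and the shaped rewards handed to an unmodified IQL learner. Pairs in sparse regions of the data receive larger penalties and therefore lower shaped rewards.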
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
