Online Optimization for Offline Safe Reinforcement Learning

Yassine Chemingui; Aryan Deshwal; Alan Fern; Thanh Nguyen-Tang; Janardhan Rao Doppa

arXiv:2510.22027·cs.LG·October 28, 2025

TL;DR

This paper introduces an offline safe reinforcement learning method that combines offline RL with online optimization to maximize reward while satisfying a cumulative cost constraint, and evaluates it on the DSRL benchmark.

Contribution

The paper frames offline safe RL as a minimax objective and solves it by pairing an offline RL oracle with a no-regret online optimization algorithm. A practical approximation of this scheme works with any offline RL algorithm and removes the need for offline policy evaluation.
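
A minimax objective of this kind is commonly obtained as a Lagrangian relaxation of the cost-constrained problem. The block below is a sketch in standard constrained-MDP notation, not the paper's own statement: J_r and J_c (expected cumulative reward and cost of a policy), the cost budget kappa, and the multiplier lambda are assumed symbols that may differ from the ones the authors use.

```latex
\max_{\pi} \; J_r(\pi) \quad \text{s.t.} \quad J_c(\pi) \le \kappa
\qquad \Longleftrightarrow \qquad
\max_{\pi} \; \min_{\lambda \ge 0} \; \Bigl[ J_r(\pi) - \lambda \bigl( J_c(\pi) - \kappa \bigr) \Bigr]
```

For an infeasible policy the inner minimum is unbounded below, and for a feasible policy it is attained at lambda = 0, so the two problems agree whenever a feasible policy exists. For a fixed lambda, the outer maximization is an ordinary RL problem with the penalized reward r - lambda * c, which is where an offline RL algorithm can be plugged in.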

Findings

1. Enforces safety constraints on the DSRL benchmark
2. Achieves high rewards while maintaining safety under strict cost budgets
3. Provides a practical approach compatible with any offline RL algorithm

Abstract

We study the problem of Offline Safe Reinforcement Learning (OSRL), where the goal is to learn a reward-maximizing policy from fixed data under a cumulative cost constraint. We propose a novel OSRL approach that frames the problem as a minimax objective and solves it by combining offline RL with online optimization algorithms. We prove the approximate optimality of this approach when integrated with an approximate offline RL oracle and no-regret online optimization. We also present a practical approximation that can be combined with any offline RL algorithm, eliminating the need for offline policy evaluation. Empirical results on the DSRL benchmark demonstrate that our method reliably enforces safety constraints under stringent cost budgets, while achieving high rewards. The code is available at https://github.com/yassineCh/O3SRL.
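
The interplay between an offline RL oracle and a no-regret online learner can be illustrated on a toy problem. The sketch below is not the authors' implementation (see the linked repository for that). It assumes a hypothetical finite set of policies whose expected reward and cost are simply read from a table, uses an exact best response as a stand-in for the offline RL oracle, and uses projected online gradient ascent on a Lagrange multiplier as one example of a no-regret method. All names (`POLICIES`, `best_response`, `primal_dual`, `budget`, `lam_max`) are illustrative.

```python
from typing import List, Tuple

# Hypothetical toy "policies": each entry is (expected_reward, expected_cost).
# In a real offline setting these values are unknown and are not read from a table.
POLICIES: List[Tuple[float, float]] = [(10.0, 8.0), (6.0, 3.0), (2.0, 0.5)]


def best_response(lam: float, policies: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Stand-in for an offline RL oracle: maximize the penalized return r - lam * c."""
    return max(policies, key=lambda p: p[0] - lam * p[1])


def primal_dual(
    policies: List[Tuple[float, float]],
    budget: float,
    steps: int = 2000,
    lr: float = 0.05,
    lam_max: float = 10.0,
) -> Tuple[float, float, float]:
    """Alternate oracle calls with projected gradient updates on the multiplier.

    Returns the average reward and cost of the uniform mixture over all
    iterates, plus the final multiplier.
    """
    lam = 0.0
    total_reward = 0.0
    total_cost = 0.0
    for _ in range(steps):
        reward, cost = best_response(lam, policies)
        total_reward += reward
        total_cost += cost
        # Raise lam when the chosen policy exceeds the budget, lower it otherwise,
        # then project back onto the interval [0, lam_max].
        lam = min(lam_max, max(0.0, lam + lr * (cost - budget)))
    return total_reward / steps, total_cost / steps, lam


avg_reward, avg_cost, final_lam = primal_dual(POLICIES, budget=4.0)
print(f"average reward: {avg_reward:.3f}")
print(f"average cost:   {avg_cost:.3f}")
print(f"final lambda:   {final_lam:.3f}")
```

In this toy instance no single policy is both high-reward and within a budget of 4.0, so the multiplier oscillates around the value at which the oracle switches between the first two policies, and the mixture over iterates approaches the budget. The table lookup for `cost` is the step that would require some form of cost estimation in a real offline setting. The abstract states that the paper's practical approximation avoids offline policy evaluation, and this sketch does not attempt to reproduce that mechanism.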
