Offline Safe Reinforcement Learning Using Trajectory Classification

Ze Gong; Akshat Kumar; Pradeep Varakantham

arXiv:2412.15429·cs.LG·April 22, 2025

TL;DR

This paper introduces a trajectory classification approach for offline safe reinforcement learning, enabling policies to generate desirable behaviors while avoiding unsafe ones, thus improving safety and reward outcomes.

Contribution

The paper proposes a novel offline safe RL method that classifies trajectories into desirable and undesirable sets, bypassing complex min-max optimization and enhancing safety and performance.

Findings

01. Outperforms baseline methods on the DSRL benchmark

02. Achieves higher rewards and better satisfaction of safety constraints

03. Effectively distinguishes safe and unsafe trajectories

Abstract

Offline safe reinforcement learning (RL) has emerged as a promising approach for learning safe behaviors without engaging in risky online interactions with the environment. Most existing methods in offline safe RL rely on cost constraints at each time step (derived from global cost constraints), and this can result in either overly conservative policies or violation of safety constraints. In this paper, we propose to learn a policy that generates desirable trajectories and avoids undesirable trajectories. To be specific, we first partition the pre-collected dataset of state-action trajectories into desirable and undesirable subsets. Intuitively, the desirable set contains high-reward, safe trajectories, and the undesirable set contains unsafe trajectories and low-reward safe trajectories. Second, we learn a policy that generates desirable trajectories and avoids undesirable trajectories,…
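
The partitioning step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact rule used to split the dataset, so the `cost_limit` and `reward_threshold` parameters, the `Trajectory` container, and the `partition_trajectories` helper below are all hypothetical and chosen only for illustration; they are not taken from the paper or from the TraC repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    """Hypothetical summary of one logged trajectory."""
    total_reward: float  # sum of rewards over the trajectory
    total_cost: float    # sum of safety costs over the trajectory


def partition_trajectories(
    trajectories: List[Trajectory],
    cost_limit: float,        # assumed trajectory-level cost budget
    reward_threshold: float,  # assumed cutoff for "high reward"
) -> Tuple[List[Trajectory], List[Trajectory]]:
    """Split trajectories into (desirable, undesirable).

    Desirable: safe (cost within the limit) AND high reward.
    Undesirable: unsafe, or safe but low reward.
    """
    desirable: List[Trajectory] = []
    undesirable: List[Trajectory] = []
    for traj in trajectories:
        is_safe = traj.total_cost <= cost_limit
        is_high_reward = traj.total_reward >= reward_threshold
        if is_safe and is_high_reward:
            desirable.append(traj)
        else:
            undesirable.append(traj)
    return desirable, undesirable


# Toy data: one safe high-reward, one unsafe, one safe low-reward trajectory.
example = [
    Trajectory(total_reward=10.0, total_cost=2.0),
    Trajectory(total_reward=12.0, total_cost=30.0),
    Trajectory(total_reward=1.0, total_cost=0.0),
]
desirable, undesirable = partition_trajectories(
    example, cost_limit=5.0, reward_threshold=5.0
)
```

With these toy values, only the first trajectory lands in the desirable set; the unsafe one and the safe but low-reward one are both treated as undesirable, matching the intuition given in the abstract.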

Code & Models

Repositories

zgong11/TraC (PyTorch, official)

Videos

Offline Safe Reinforcement Learning Using Trajectory Classification · underline

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Anomaly Detection Techniques and Applications

Methods: Sparse Evolutionary Training