Conditional Sequence Modeling for Safe Reinforcement Learning

Wensong Bai; Chao Zhang; Qihang Xu; Chufan Chen; Chenhao Zhou; Hui Qian

arXiv:2602.08584 · cs.LG · February 10, 2026

Open Access

TL;DR

This paper introduces RCDT, a novel conditional sequence modeling approach for offline safe reinforcement learning that enables zero-shot adaptation to multiple cost thresholds within a single policy, improving return-cost trade-offs.

Contribution

RCDT is the first CSM-based offline safe RL method that supports zero-shot deployment across multiple cost thresholds with an auto-adaptive penalty mechanism.

Findings

01 RCDT outperforms baseline methods on the DSRL benchmark.

02 RCDT achieves better return-cost trade-offs than those baselines.

03 The approach demonstrates consistent improvements across various scenarios.

Abstract

Offline safe reinforcement learning (RL) aims to learn policies from a fixed dataset while maximizing performance under cumulative cost constraints. In practice, deployment requirements often vary across scenarios, necessitating a single policy that can adapt zero-shot to different cost thresholds. However, most existing offline safe RL methods are trained under a pre-specified threshold, yielding policies with limited generalization and deployment flexibility across cost thresholds. Motivated by recent progress in conditional sequence modeling (CSM), which enables flexible goal-conditioned control by specifying target returns, we propose RCDT, a CSM-based method that supports zero-shot deployment across multiple cost thresholds within a single trained policy. RCDT is the first CSM-based offline safe RL algorithm that integrates a Lagrangian-style cost penalty with an auto-adaptive…
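As a rough illustration of the conditioning idea the abstract describes, the sketch below computes the per-step quantities a return-conditioned sequence model is usually given, plus a remaining cost budget derived from a cost threshold. This is a generic sketch in the style of return-conditioned sequence models, not RCDT's actual implementation: the helper names `suffix_sums` and `conditioning_tokens`, the undiscounted sums, and the idea of tracking a cost budget as a second token are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def suffix_sums(values: Sequence[float]) -> List[float]:
    """Return the undiscounted sum of values[t:] for every timestep t."""
    out = [0.0] * len(values)
    running = 0.0
    # Walk backwards so each entry accumulates everything that follows it.
    for t in range(len(values) - 1, -1, -1):
        running += values[t]
        out[t] = running
    return out


def conditioning_tokens(
    rewards: Sequence[float],
    costs: Sequence[float],
    target_return: float,
    cost_threshold: float,
) -> List[Tuple[float, float]]:
    """Hypothetical helper: (return-to-go, cost-budget-to-go) pair per step.

    Assumption: at deployment the user picks a target return and a cost
    threshold, and both are decremented by what was observed so far.
    """
    pairs = []
    rtg, budget = float(target_return), float(cost_threshold)
    for r, c in zip(rewards, costs):
        pairs.append((rtg, budget))  # token seen by the model at this step
        rtg -= r                     # return still to be collected
        budget -= c                  # cost budget still available
    return pairs
```

For training data, `suffix_sums(rewards)` and `suffix_sums(costs)` would give the observed return-to-go and cost-to-go of a logged trajectory; at deployment, changing `cost_threshold` in `conditioning_tokens` changes the conditioning signal without retraining, which is the sense in which a single policy could serve several thresholds.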

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning