Conditional Sequence Modeling for Safe Reinforcement Learning
Wensong Bai, Chao Zhang, Qihang Xu, Chufan Chen, Chenhao Zhou, Hui Qian

TL;DR
This paper introduces RCDT, a novel conditional sequence modeling approach for offline safe reinforcement learning that enables zero-shot adaptation to multiple cost thresholds within a single policy, improving return-cost trade-offs.
Contribution
RCDT is the first CSM-based offline safe RL method that supports zero-shot deployment across multiple cost thresholds with an auto-adaptive penalty mechanism.
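For context, the cost-threshold setting is usually formalized as a constrained optimization problem, and a Lagrangian-style penalty is its standard relaxation. The block below is a sketch of that textbook formulation only, not RCDT's actual objective, which this page does not spell out. The symbols are assumptions introduced for illustration: r_t and c_t are per-step reward and cost, T is the horizon, \kappa is the cost threshold, and \lambda is the penalty multiplier.

```latex
% Constrained objective: maximize expected return under a cost budget \kappa
\max_{\pi}\; \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{T} r_t\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{T} c_t\Big] \le \kappa

% Lagrangian relaxation: \lambda \ge 0 prices each unit of cost above \kappa
\min_{\lambda \ge 0}\; \max_{\pi}\;
\mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{T} r_t\Big]
- \lambda \Big( \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{T} c_t\Big] - \kappa \Big)
```

A method trained for a single fixed \kappa has to be retrained when the threshold changes, which is the limitation the abstract describes.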
Findings
RCDT outperforms baseline methods on the DSRL benchmark.
It achieves better return-cost trade-offs.
The approach demonstrates consistent improvements across various scenarios.
Abstract
Offline safe reinforcement learning (RL) aims to learn policies from a fixed dataset while maximizing performance under cumulative cost constraints. In practice, deployment requirements often vary across scenarios, necessitating a single policy that can adapt zero-shot to different cost thresholds. However, most existing offline safe RL methods are trained under a pre-specified threshold, yielding policies with limited generalization and deployment flexibility across cost thresholds. Motivated by recent progress in conditional sequence modeling (CSM), which enables flexible goal-conditioned control by specifying target returns, we propose RCDT, a CSM-based method that supports zero-shot deployment across multiple cost thresholds within a single trained policy. RCDT is the first CSM-based offline safe RL algorithm that integrates a Lagrangian-style cost penalty with an auto-adaptive…
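To make the conditioning idea concrete, here is a minimal sketch of how return-conditioned sequence models are commonly given their targets, extended with a cost budget. It is an illustration under assumptions, not RCDT's implementation: the helper names (`suffix_sums`, `training_tokens`, `update_targets`) are hypothetical, and treating the cost threshold as a second conditioning token is an assumption about how a threshold could be specified at deployment.

```python
from typing import List, Sequence, Tuple


def suffix_sums(values: Sequence[float]) -> List[float]:
    """Return the sum of values[t:] for every t, computed in one backward pass."""
    out = [0.0] * len(values)
    running = 0.0
    for t in range(len(values) - 1, -1, -1):
        running += values[t]
        out[t] = running
    return out


def training_tokens(
    rewards: Sequence[float], costs: Sequence[float]
) -> List[Tuple[float, float]]:
    """Relabel a logged trajectory with per-step (return-to-go, cost-to-go) pairs.

    Assumption: the model is conditioned on both quantities at every step.
    """
    return list(zip(suffix_sums(rewards), suffix_sums(costs)))


def update_targets(
    target_return: float, cost_budget: float, reward: float, cost: float
) -> Tuple[float, float]:
    """At deployment, subtract the reward and cost just observed from the targets."""
    return target_return - reward, cost_budget - cost


# Offline data: one short logged trajectory.
rewards = [1.0, 0.0, 2.0]
costs = [0.0, 1.0, 1.0]
tokens = training_tokens(rewards, costs)

# Deployment: the user picks a target return and a cost threshold up front,
# then both are decremented as the episode unfolds.
targets = update_targets(target_return=3.0, cost_budget=1.0, reward=1.0, cost=0.0)
```

Under this scheme, changing the cost threshold at deployment only changes the initial `cost_budget` fed to the model, so no retraining is involved.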
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
