SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning

Qin Zhang; Linrui Zhang; Haoran Xu; Li Shen; Bowen Wang; Yongzhe Chang; Xueqian Wang; Bo Yuan; Dacheng Tao

arXiv:2301.12203 · cs.LG · January 31, 2023 · 1 citation

Open Access

TL;DR

SaFormer introduces a novel conditional sequence modeling approach for offline safe reinforcement learning, enabling constraint satisfaction, adaptability to changing safety requirements, and generalization beyond the training data.

Contribution

The paper proposes SaFormer, a new offline safe RL method using cost tokens and safety verification, improving constraint handling and adaptability over existing approaches.

Findings

01. Achieves competitive returns with safety constraints

02. Adapts to new safety costs without retraining

03. Generalizes to unseen safety constraints

Abstract

Offline safe RL is of great practical relevance for deploying agents in real-world applications. However, acquiring constraint-satisfying policies from a fixed dataset is non-trivial for conventional approaches. Even worse, the learned constraints are stationary and may become invalid when the online safety requirement changes. In this paper, we present a novel offline safe RL approach referred to as SaFormer, which tackles the above issues via conditional sequence modeling. In contrast to existing sequence models, we propose cost-related tokens to restrict the action space and a posterior safety verification to enforce the constraint explicitly. Specifically, SaFormer performs a two-stage auto-regression conditioned on the maximum remaining cost to generate feasible candidates. It then filters out unsafe attempts and executes the optimal action with the highest expected return.…
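
The generate, filter, and execute steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the sequence model is left out entirely, and the names `Candidate`, `select_action`, and `update_budget` are hypothetical. The fallback to the cheapest candidate when nothing passes the safety check, and the budget update rule, are assumptions made for illustration; the abstract does not specify either.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """One candidate proposed by a (hypothetical) conditional sequence model."""
    action: Sequence[float]
    predicted_return: float  # model's estimate of the future return
    predicted_cost: float    # model's estimate of the future cost


def select_action(candidates: List[Candidate], remaining_cost: float) -> Candidate:
    """Drop candidates whose predicted cost exceeds the remaining budget,
    then return the survivor with the highest predicted return."""
    if not candidates:
        raise ValueError("need at least one candidate")
    safe = [c for c in candidates if c.predicted_cost <= remaining_cost]
    if not safe:
        # Assumption (not stated in the abstract): if every candidate
        # fails the check, fall back to the one with the lowest cost.
        return min(candidates, key=lambda c: c.predicted_cost)
    return max(safe, key=lambda c: c.predicted_return)


def update_budget(remaining_cost: float, step_cost: float) -> float:
    """Assumed bookkeeping: subtract the cost just incurred, floored at zero."""
    return max(remaining_cost - step_cost, 0.0)


# Toy candidates standing in for model outputs.
candidates = [
    Candidate(action=[0.1], predicted_return=10.0, predicted_cost=5.0),
    Candidate(action=[0.2], predicted_return=8.0, predicted_cost=2.0),
    Candidate(action=[0.3], predicted_return=12.0, predicted_cost=9.0),
]
chosen = select_action(candidates, remaining_cost=6.0)
```

With a remaining budget of 6.0, the third candidate is rejected despite its higher predicted return, and the first is chosen among the two that pass. Because the budget is an input at decision time rather than something baked into training, changing it changes the selection without retraining, which is the adaptability the abstract refers to.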

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Imbalanced Data Classification Techniques