AutoCost: Evolving Intrinsic Cost for Zero-violation Reinforcement Learning
Tairan He, Weiye Zhao, Changliu Liu

TL;DR
AutoCost introduces an automatic method to find intrinsic cost functions that enable constrained reinforcement learning algorithms to achieve zero constraint violations while maintaining competitive performance.
Contribution
The paper proposes AutoCost, a framework that automatically searches for intrinsic cost functions to improve safety in constrained RL, achieving zero violations.
Findings
AutoCost finds cost functions that lead to zero violations in Safety Gym.
Intrinsic costs enable policies to satisfy safety constraints while maintaining competitive performance.
The method outperforms baseline approaches on the Safety Gym benchmark.
Abstract
Safety is a critical hurdle that limits the application of deep reinforcement learning (RL) to real-world control tasks. To this end, constrained reinforcement learning leverages cost functions to improve safety in constrained Markov decision processes. However, such constrained RL methods fail to achieve zero violation even when the cost limit is zero. This paper analyzes the reason for such failure, which suggests that a proper cost function plays an important role in constrained RL. Inspired by the analysis, we propose AutoCost, a simple yet effective framework that automatically searches for cost functions that help constrained RL to achieve zero-violation performance. We validate the proposed method and the searched cost function on the safe RL benchmark Safety Gym. We compare the performance of augmented agents that use our cost function to provide additive intrinsic costs with…
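The abstract describes two ideas: an intrinsic cost that is added to the environment's cost, and an automatic search over candidate cost functions. The sketch below illustrates both in miniature. It is not the authors' implementation: the linear, clipped cost parameterization, the names `augmented_cost` and `evolve`, and the elitist mutation scheme are all assumptions made for illustration. In a realistic setting the `fitness` callable would train a constrained RL agent with the candidate cost and score its violations and reward; here it is left to the caller.

```python
import random
from typing import Callable, List, Sequence


def augmented_cost(extrinsic_cost: float,
                   state: Sequence[float],
                   weights: Sequence[float]) -> float:
    """Return extrinsic cost plus an additive, non-negative intrinsic cost.

    Assumption: the intrinsic cost is a linear function of the state,
    clipped at zero. This parameterization is hypothetical.
    """
    intrinsic = max(0.0, sum(w * s for w, s in zip(weights, state)))
    return extrinsic_cost + intrinsic


def evolve(fitness: Callable[[List[float]], float],
           dim: int,
           population_size: int = 16,
           generations: int = 20,
           sigma: float = 0.1,
           seed: int = 0) -> List[float]:
    """Generic elitist evolutionary search over cost-function weights.

    Keeps the top quarter of candidates each generation and refills the
    population with Gaussian-mutated copies of those elites.
    """
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(population_size)]
    n_elites = max(1, population_size // 4)
    for _ in range(generations):
        # Rank candidates from best to worst fitness.
        ranked = sorted(population, key=fitness, reverse=True)
        elites = ranked[:n_elites]
        # Elites survive unchanged, so the best fitness never decreases.
        population = [list(e) for e in elites]
        while len(population) < population_size:
            parent = rng.choice(elites)
            population.append([w + rng.gauss(0.0, sigma) for w in parent])
    return max(population, key=fitness)


def toy_fitness(weights: List[float]) -> float:
    """Stand-in fitness: closeness to an arbitrary target weight vector."""
    target = [0.5, -0.5]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


if __name__ == "__main__":
    best = evolve(toy_fitness, dim=2)
    print("best weights:", best)
    print("augmented cost:", augmented_cost(1.0, [1.0, 2.0], best))
```

Because the elites are carried over unchanged, the best candidate found so far is never lost between generations. That matters when each fitness evaluation is as expensive as a full RL training run.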
Taxonomy
Topics: Reinforcement Learning in Robotics
