Reinforcement Learning with Convex Constraints
Sobhan Miryoosefi, Kianté Brantley, Hal Daumé III, Miroslav Dudik, Robert Schapire

TL;DR
This paper introduces a flexible reinforcement learning framework that incorporates convex constraints on expected measurements, enabling safer, more diverse, and more expert-like behaviors with theoretical guarantees.
Contribution
It proposes a general algorithmic scheme for constrained RL that handles convex expected-value constraints, extending previous safety-focused methods to new properties like diversity.
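Here is a small, hypothetical Python sketch of why convex sets suit expected-value constraints. The measurement vectors, the two policies, and the box-shaped set are invented for illustration and are not taken from the paper. Suppose a policy is drawn at random from several candidates at the start of each episode. Its expected measurement vector is then the weighted average of theirs, so the mixture can satisfy a convex constraint that no single candidate meets.

```python
import numpy as np

# Toy expected measurement vectors z_bar(pi) for two candidate policies (hypothetical):
# [fraction of steps in unsafe states, fraction of action A, fraction of action B]
z_pi1 = np.array([0.30, 0.90, 0.10])  # uses action A a lot, but is unsafe too often
z_pi2 = np.array([0.00, 0.20, 0.80])  # safe, but rarely uses action A

# Hypothetical convex constraint set C: a box encoding
# "unsafe fraction <= 0.2" (safety) and "each action used >= 30%" (diversity).
lo = np.array([0.0, 0.3, 0.3])
hi = np.array([0.2, 1.0, 1.0])

def in_box(z, lo, hi):
    """Membership test for the box {x : lo <= x <= hi}."""
    return bool(np.all(z >= lo) and np.all(z <= hi))

def mixture_measurement(zs, weights):
    """Expected measurement of a mixed policy that samples policy i with
    probability weights[i] once per episode and then follows it.
    By linearity of expectation, this is the weighted average of the zs."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return weights @ np.vstack(zs)

z_mix = mixture_measurement([z_pi1, z_pi2], [0.5, 0.5])
print(in_box(z_pi1, lo, hi), in_box(z_pi2, lo, hi), in_box(z_mix, lo, hi))
# prints: False False True
```

Because the expected measurement is linear in the mixture weights, the measurement vectors reachable by mixing policies form a convex set. This is one intuition for pairing a convex target set with mixed policies.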
Findings
Matches the performance of existing algorithms for safety constraints
Enforces new properties such as diversity
Applicable to model-free and model-based RL
Abstract
In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we propose an algorithmic scheme that can handle a wide class of constraints in RL tasks: specifically, any constraints that require expected values of some vector measurements (such as the use of an action) to lie in a convex set. This captures previously studied constraints (such as safety and proximity to an expert), but also enables new classes of constraints (such as diversity). Our approach comes with rigorous theoretical guarantees and only relies on the ability to approximately solve…
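The following hypothetical Python sketch makes the abstract's formulation concrete. It is not the authors' algorithm. It estimates an expected measurement vector from sampled episodes, then measures its distance to two convex sets like those the abstract mentions: a box that caps unsafe-state visits, and a Euclidean ball around an expert's measurement vector. Averaging per-step measurements within each episode is a simplifying assumption; the paper may aggregate measurements differently, for example with discounting. For any closed convex set, the distance follows from the set's Euclidean projection, so each set here is represented only by a projection function.

```python
import numpy as np

def estimate_expected_measurement(episodes):
    """Monte Carlo estimate of z_bar(pi): the mean over episodes of each
    episode's per-step average measurement vector (aggregation is an assumption)."""
    return np.mean([np.mean(np.asarray(ep, dtype=float), axis=0) for ep in episodes], axis=0)

def project_box(z, lo, hi):
    """Euclidean projection onto the box {x : lo <= x <= hi}."""
    return np.clip(z, lo, hi)

def project_ball(z, center, radius):
    """Euclidean projection onto the ball {x : ||x - center|| <= radius}."""
    diff = z - center
    norm = np.linalg.norm(diff)
    if norm <= radius:
        return z.copy()
    return center + radius * diff / norm

def distance(z, projector, *args):
    """Euclidean distance from z to a convex set, computed via its projection."""
    return float(np.linalg.norm(z - projector(z, *args)))

# Hypothetical per-step measurements: [in unsafe state (0/1), took action A (0/1)]
episodes = [
    [[1, 1], [0, 1], [0, 0], [0, 1]],
    [[0, 0], [0, 1], [0, 0], [0, 0]],
]
z_hat = estimate_expected_measurement(episodes)  # approximately [0.125, 0.5]

# Safety-style set: unsafe fraction at most 0.1, action-A fraction unconstrained.
d_box = distance(z_hat, project_box, np.array([0.0, 0.0]), np.array([0.1, 1.0]))

# Expert-proximity set: within radius 0.1 of a hypothetical expert measurement vector.
d_ball = distance(z_hat, project_ball, np.array([0.0, 0.3]), 0.1)

print(z_hat, d_box, d_ball)
```

A distance of zero means the estimated measurement vector already lies in the set. A positive distance shows how far the current behavior is from satisfying the constraint.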
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
