Concave Utility Reinforcement Learning with Zero-Constraint Violations

Mridul Agarwal; Qinbo Bai; Vaneet Aggarwal

arXiv:2109.05439 · cs.LG · November 20, 2023 · 1 citation

TL;DR

This paper introduces a model-based reinforcement learning algorithm for concave utility optimization with convex constraints, ensuring zero constraint violations and providing regret guarantees in tabular infinite-horizon settings.

Contribution

It proposes a novel optimization approach that guarantees zero constraint violations and offers regret bounds, improving computational efficiency for constrained reinforcement learning.

Findings

01

Achieves zero constraint violations in tabular infinite-horizon CURL with convex constraints by solving a tighter optimization problem.

02

Provides a high-probability regret bound on the objective that grows as $\tilde{O}(1/\sqrt{T})$ over $T$ interactions, excluding other factors.

03

Applicable to both optimistic and posterior sampling algorithms.

Abstract

We consider the problem of tabular infinite-horizon concave utility reinforcement learning (CURL) with convex constraints. For this, we propose a model-based learning algorithm that also achieves zero constraint violations. Assuming that the concave objective and the convex constraints have a solution interior to the set of feasible occupation measures, we solve a tighter optimization problem to ensure that the constraints are never violated despite the imprecise model knowledge and model stochasticity. We use Bellman error-based analysis for tabular infinite-horizon setups, which allows analyzing stochastic policies. Combining the Bellman error-based analysis and the tighter optimization equation, for $T$ interactions with the environment, we obtain a high-probability regret guarantee for the objective which grows as $\tilde{O}(1/\sqrt{T})$, excluding other factors. The proposed method can be…
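
The tightening idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a hypothetical one-state problem with two actions, a square-root utility as a stand-in for a generic concave objective, a single linear cost constraint, and a plain grid search in place of a convex solver. All names and numbers (`solve_tightened`, `est_costs`, `err`, and so on) are made up for illustration. The point it shows is that if the estimated costs are within `err` of the true costs, shrinking the budget by `eps >= err` keeps the true constraint satisfied.

```python
import math


def solve_tightened(rewards, est_costs, budget, eps, grid=1001):
    """Grid search over p = P(action 0) for the toy tightened problem.

    Maximizes sqrt(expected reward) subject to
    estimated expected cost <= budget - eps.
    """
    best_p, best_val = None, -math.inf
    for i in range(grid):
        p = i / (grid - 1)
        est_cost = p * est_costs[0] + (1 - p) * est_costs[1]
        if est_cost > budget - eps:
            continue  # infeasible under the (tightened) constraint
        val = math.sqrt(p * rewards[0] + (1 - p) * rewards[1])
        if val > best_val:
            best_p, best_val = p, val
    return best_p, best_val


# Hypothetical toy numbers: the learner only sees est_costs,
# which underestimate true_costs by at most err.
rewards = (1.0, 0.2)
true_costs = (0.8, 0.1)
est_costs = (0.75, 0.08)
budget = 0.5
err = 0.05


def true_cost(p):
    """Expected cost of the policy under the true (unknown) costs."""
    return p * true_costs[0] + (1 - p) * true_costs[1]


# Without tightening, the policy is feasible only for the estimated model.
p_naive, _ = solve_tightened(rewards, est_costs, budget, eps=0.0)
# With tightening by eps = err, the policy is feasible for the true model too.
p_safe, _ = solve_tightened(rewards, est_costs, budget, eps=err)

print("naive:", p_naive, "true cost:", true_cost(p_naive))
print("safe: ", p_safe, "true cost:", true_cost(p_safe))
```

In this sketch the untightened solution exceeds the true budget, while the tightened one stays within it at the price of a slightly lower utility. That gap is the kind of cost a regret analysis has to account for.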

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Risk and Portfolio Optimization