Challenging Common Assumptions in Convex Reinforcement Learning

Mirco Mutti; Riccardo De Santi; Piersilvio De Bartolomeis; Marcello Restelli

arXiv:2202.01511 · cs.LG · January 30, 2023 · 1 cites

Open Access

TL;DR

This paper shows that the common assumption equating the finite trials and infinite trials objectives does not hold in convex RL, highlights the approximation error this can cause, and calls for revised methodologies in practical applications.

Contribution

The paper demonstrates that the equivalence between the finite trials and infinite trials objectives, which holds in classic RL, does not carry over to convex RL. This challenges a key assumption and motivates new approaches.

Findings

01

Erroneously optimizing the infinite trials objective in place of the finite trials one causes significant approximation errors.

02

In convex RL, the finite trials setting differs from the infinite trials setting, which affects practical applications.

03

Shedding light on this issue can improve convex RL methodologies.

Abstract

The classic Reinforcement Learning (RL) formulation concerns the maximization of a scalar reward function. More recently, convex RL has been introduced to extend the RL formulation to all the objectives that are convex functions of the state distribution induced by a policy. Notably, convex RL covers several relevant applications that do not fall into the scalar formulation, including imitation learning, risk-averse RL, and pure exploration. In classic RL, it is common to optimize an infinite trials objective, which accounts for the state distribution instead of the empirical state visitation frequencies, even though the actual number of trajectories is always finite in practice. This is theoretically sound since the infinite trials and finite trials objectives can be proved to coincide and thus lead to the same optimal policy. In this paper, we show that this hidden assumption does not…
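The gap between the two objectives described in the abstract can be illustrated with a toy calculation. The sketch below is not taken from the paper. It assumes a hypothetical two-state environment in which each trial lands entirely in state 0 with probability `p` and entirely in state 1 otherwise, and it uses entropy of the state distribution as a stand-in for a pure exploration objective. The infinite trials value applies the objective to the expected state distribution, while the finite trials value averages the objective over the empirical distributions obtained from `n` trials. All function names are illustrative.

```python
import math
from math import comb


def entropy(dist):
    """Shannon entropy of a discrete distribution (nonlinear in dist)."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def linear_reward(dist, reward=(1.0, 0.0)):
    """Classic RL style objective: a linear function of the state distribution."""
    return sum(r * p for r, p in zip(reward, dist))


def infinite_trials_value(objective, p):
    """Objective applied to the expected state distribution (p, 1 - p)."""
    return objective((p, 1.0 - p))


def finite_trials_value(objective, p, n):
    """Expected objective of the empirical distribution over n trials.

    Assumption: each trial visits only state 0 (with probability p) or only
    state 1, so the empirical distribution is (k/n, 1 - k/n) with
    k ~ Binomial(n, p). The expectation is computed exactly.
    """
    total = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p) ** (n - k)
        total += prob * objective((k / n, 1 - k / n))
    return total


if __name__ == "__main__":
    p = 0.5
    for name, objective in (("linear", linear_reward), ("entropy", entropy)):
        print(name, "infinite trials:", infinite_trials_value(objective, p))
        for n in (1, 10, 100):
            print(name, f"finite trials (n={n}):",
                  finite_trials_value(objective, p, n))
```

Under these assumptions the linear objective gives the same value in both settings, by linearity of expectation. The entropy objective does not: with a single trial the empirical distribution is always concentrated on one state, so its entropy is zero even though the entropy of the expected distribution is log 2. The finite trials value approaches the infinite trials value only as `n` grows.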

Taxonomy

Topics: Reinforcement Learning in Robotics