Implicitly Regularized RL with Implicit Q-Values

Nino Vieillard; Marcin Andrychowicz; Anton Raichuk; Olivier Pietquin; Matthieu Geist

arXiv:2108.07041 · cs.LG · June 1, 2022 · 1 citation

Open Access

TL;DR

This paper parametrizes the Q-function implicitly, as the sum of a log-policy and a value function. This makes Q-value-based RL usable in large action spaces while keeping the softmax relation between the policy and the Q-values, and it is supported by theoretical analysis and competitive experimental results.

Contribution

It proposes an implicit Q-function parametrization that facilitates large action space RL and derives a practical off-policy deep RL algorithm with theoretical guarantees.

Findings

01

Algorithm performs well on classic control tasks

02

Theoretical analysis shows equivalence to regularized value iteration

03

Enforces softmax relation between policy and Q-values

Abstract

The $Q$-function is a central quantity in many Reinforcement Learning (RL) algorithms for which RL agents behave following a (soft)-greedy policy w.r.t. $Q$. It is a powerful tool that allows action selection without a model of the environment and even without explicitly modeling the policy. Yet, this scheme can only be used in discrete action tasks, with small numbers of actions, as the softmax cannot be computed exactly otherwise. In particular, the use of function approximation to deal with continuous action spaces in modern actor-critic architectures intrinsically prevents the exact computation of a softmax. We propose to alleviate this issue by parametrizing the $Q$-function implicitly, as the sum of a log-policy and of a value function. We use the resulting parametrization to derive a practical off-policy deep RL algorithm, suitable for large action spaces, and that enforces…
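The parametrization described in the abstract can be illustrated with a small numerical sketch. It assumes a single state with three discrete actions, a temperature `tau` scaling the log-policy, and hypothetical helper names (`implicit_q`, `softmax`, `logsumexp`); none of these details come from the abstract itself, which only states that $Q$ is the sum of a log-policy and a value function. Under these assumptions, the sketch checks that the softmax of $Q / \tau$ gives back the policy without a separate normalization step.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def logsumexp(xs):
    # Numerically stable log(sum(exp(x))) over a list of floats.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def implicit_q(log_policy, value, tau=1.0):
    # Assumed form for one state: Q(s, a) = tau * log pi(a|s) + V(s).
    # `tau` is an illustrative temperature, not taken from the abstract.
    return [tau * lp + value for lp in log_policy]

# Hypothetical single-state example with three actions.
policy = [0.2, 0.5, 0.3]
log_policy = [math.log(p) for p in policy]
value = 1.7
tau = 0.5

q = implicit_q(log_policy, value, tau)

# Because the policy sums to one, softmax(Q / tau) recovers the policy
# and tau * logsumexp(Q / tau) recovers the value.
recovered_policy = softmax([x / tau for x in q])
recovered_value = tau * logsumexp([x / tau for x in q])
```

In this toy setting the softmax relation holds by construction, so no sum or integral over actions is needed to obtain the policy from $Q$; how the actual algorithm trains the two components is not covered by this sketch.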

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Smart Grid Energy Management

Methods: Softmax