Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures

Adrien Bolland; Gaspard Lambrechts; Damien Ernst

arXiv:2412.06655 · cs.LG · September 30, 2025

TL;DR

This paper introduces an off-policy maximum entropy reinforcement learning method whose intrinsic reward is the relative entropy of the discounted distribution of states and actions visited during future time steps. The method is reported to improve exploration and control performance.

Contribution

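The paper proposes an intrinsic reward defined as the relative entropy of the discounted distribution of future states and actions (or of features derived from them). Because this distribution is the fixed point of a contraction operator, existing algorithms can be adapted to learn it off-policy and to compute the intrinsic rewards.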

Findings

01. Policies achieve high state-action coverage
02. Method improves exploration efficiency
03. Results show strong control performance

Abstract

Maximum entropy reinforcement learning integrates exploration into policy learning by providing additional intrinsic rewards proportional to the entropy of some distribution. In this paper, we propose a novel approach in which the intrinsic reward function is the relative entropy of the discounted distribution of states and actions (or features derived from these states and actions) visited during future time steps. This approach is motivated by two results. First, a policy maximizing the expected discounted sum of intrinsic rewards also maximizes a lower bound on the state-action value function of the decision process. Second, the distribution used in the intrinsic reward definition is the fixed point of a contraction operator. Existing algorithms can therefore be adapted to learn this fixed point off-policy and to compute the intrinsic rewards. We finally introduce an algorithm…
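The fixed-point property mentioned in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' algorithm: it assumes a small finite MDP with a fixed policy, takes the discounted future visitation measure to be d = (1 - γ) Σ_t γ^t P^(t+1), and uses a uniform reference distribution for the relative entropy. The function names `future_visitation` and `intrinsic_reward`, the random MDP, and the choice of reference are all hypothetical and chosen only for illustration.

```python
import numpy as np


def future_visitation(P_sa, gamma, tol=1e-10, max_iter=10_000):
    """Iterate d <- (1 - gamma) * P_sa + gamma * P_sa @ d until convergence.

    P_sa[i, j] is the probability of moving from state-action pair i to
    state-action pair j in one step under the fixed policy. The update is a
    gamma-contraction in the sup norm, so the iteration converges.
    """
    n = P_sa.shape[0]
    d = np.full((n, n), 1.0 / n)  # any row-stochastic initial guess works
    for _ in range(max_iter):
        d_next = (1.0 - gamma) * P_sa + gamma * P_sa @ d
        if np.max(np.abs(d_next - d)) < tol:
            return d_next
        d = d_next
    return d


def intrinsic_reward(d, reference):
    """Negative KL divergence of each row d(.|s, a) from `reference`.

    Assumption: the relative entropy is taken against a fixed reference
    distribution over state-action pairs (uniform in the demo below).
    """
    ratio = np.where(d > 0, d / reference, 1.0)  # 0 * log(0) treated as 0
    return -np.sum(d * np.log(ratio), axis=1)


# Illustrative random MDP: 4 states, 2 actions (hypothetical sizes).
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)    # transition kernel p(s' | s, a)
pi = rng.random((S, A))
pi /= pi.sum(axis=1, keepdims=True)  # policy pi(a | s)

# One-step kernel over pairs: p(s', a' | s, a) = p(s' | s, a) * pi(a' | s').
P_sa = np.einsum("ijk,kl->ijkl", P, pi).reshape(S * A, S * A)

d = future_visitation(P_sa, gamma)
r = intrinsic_reward(d, np.full(S * A, 1.0 / (S * A)))
```

Each row of `d` is a probability distribution over future state-action pairs, and `r` holds one intrinsic reward per state-action pair. In this sketch the whole matrix is updated at once from a known model; learning the same fixed point from sampled off-policy transitions, as the abstract describes, would replace the exact update with a sample-based one.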

Code & Models

Repositories

adrienBolland/future-visitation-exploration (PyTorch, official)

Taxonomy

Topics: Advancements in Semiconductor Devices and Circuit Design