Mutual Information Regularized Offline Reinforcement Learning

Xiao Ma; Bingyi Kang; Zhongwen Xu; Min Lin; Shuicheng Yan

arXiv:2210.07484 · cs.LG · February 29, 2024

TL;DR

This paper introduces MISA, a mutual information regularization framework for offline reinforcement learning that constrains policy improvement to data-supported actions, leading to improved performance over existing methods.

Contribution

MISA unifies and extends existing offline RL methods by directly constraining policy updates via mutual information bounds, enhancing stability and performance.

Findings

01. MISA outperforms baselines on the D4RL benchmark.

02. Tighter mutual information lower bounds yield better offline RL performance.

03. MISA achieves 742.9 total points on gym-locomotion tasks.

Abstract

The major challenge of offline RL is the distribution shift that appears when out-of-distribution actions are queried, which makes the policy improvement direction biased by extrapolation errors. Most existing methods address this problem by penalizing the policy or value for deviating from the behavior policy during policy improvement or evaluation. In this work, we propose a novel MISA framework to approach offline RL from the perspective of Mutual Information between States and Actions in the dataset by directly constraining the policy improvement direction. MISA constructs lower bounds of mutual information parameterized by the policy and Q-values. We show that optimizing this lower bound is equivalent to maximizing the likelihood of a one-step improved policy on the offline dataset. Hence, we constrain the policy improvement direction to lie in the data manifold. The resulting…
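The link between a mutual information lower bound and dataset likelihood can be illustrated on a toy discrete problem. The sketch below is not the MISA algorithm: it uses the standard Barber-Agakov bound, I(S;A) >= H(A) + E_{p(s,a)}[log q(a|s)], which holds for any conditional distribution q. Because H(A) depends only on the data, raising the bound amounts to raising the average log-likelihood of q on the dataset. The joint distribution, the Q-values, and the reweighting pi(a|s) * exp(Q(s,a)) used as a stand-in for a "one-step improved policy" are all assumptions made for illustration.

```python
import math

# Toy joint distribution p(s, a) over 2 states and 2 actions (made-up numbers).
joint = {
    ("s0", "a0"): 0.4, ("s0", "a1"): 0.1,
    ("s1", "a0"): 0.1, ("s1", "a1"): 0.4,
}

def marginals(joint):
    """Return the marginals p(s) and p(a) of a joint distribution."""
    p_s, p_a = {}, {}
    for (s, a), p in joint.items():
        p_s[s] = p_s.get(s, 0.0) + p
        p_a[a] = p_a.get(a, 0.0) + p
    return p_s, p_a

def mutual_information(joint):
    """Exact I(S;A) in nats."""
    p_s, p_a = marginals(joint)
    return sum(p * math.log(p / (p_s[s] * p_a[a]))
               for (s, a), p in joint.items() if p > 0)

def barber_agakov_bound(joint, q):
    """H(A) + E_{p(s,a)}[log q(a|s)]: a lower bound on I(S;A) for any q."""
    _, p_a = marginals(joint)
    entropy_a = -sum(p * math.log(p) for p in p_a.values() if p > 0)
    avg_log_lik = sum(p * math.log(q[s][a])
                      for (s, a), p in joint.items() if p > 0)
    return entropy_a + avg_log_lik

def improved_policy(pi, q_values):
    """Hypothetical helper: reweight pi(a|s) by exp(Q(s,a)), renormalize per state."""
    out = {}
    for s, probs in pi.items():
        weights = {a: p * math.exp(q_values[s][a]) for a, p in probs.items()}
        z = sum(weights.values())
        out[s] = {a: w / z for a, w in weights.items()}
    return out

# A uniform policy and made-up Q-values that favor the actions common in the data.
uniform_pi = {"s0": {"a0": 0.5, "a1": 0.5}, "s1": {"a0": 0.5, "a1": 0.5}}
q_values = {"s0": {"a0": 1.0, "a1": 0.0}, "s1": {"a0": 0.0, "a1": 1.0}}

# The data's own conditional p(a|s), for which the bound is tight.
p_s, _ = marginals(joint)
data_conditional = {"s0": {}, "s1": {}}
for (s, a), p in joint.items():
    data_conditional[s][a] = p / p_s[s]

true_mi = mutual_information(joint)
bound_uniform = barber_agakov_bound(joint, uniform_pi)
bound_improved = barber_agakov_bound(joint, improved_policy(uniform_pi, q_values))
bound_tight = barber_agakov_bound(joint, data_conditional)

print(f"I(S;A) = {true_mi:.4f}")
print(f"bound (uniform q)    = {bound_uniform:.4f}")
print(f"bound (reweighted q) = {bound_improved:.4f}")
print(f"bound (q = p(a|s))   = {bound_tight:.4f}")
```

In this toy setting the bound rises as q places more mass on the state-action pairs that actually occur in the data, and it matches the exact mutual information when q equals the data's own conditional p(a|s).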

Code & Models

Repositories

sail-sg/misa
JAX · Official

Videos

Mutual Information Regularized Offline Reinforcement Learning · SlidesLive

Taxonomy

Topics: Robotic Locomotion and Control · Reinforcement Learning in Robotics