Stabilizing Extreme Q-learning by Maclaurin Expansion

Motoki Omura; Takayuki Osa; Yusuke Mukuta; Tatsuya Harada

arXiv:2406.04896·cs.LG·February 12, 2025

TL;DR

This paper introduces Maclaurin Expanded Extreme Q-learning, which stabilizes Extreme Q-learning (XQL) by applying a Maclaurin expansion to its loss function, improving stability and performance on offline and online reinforcement learning tasks.

Contribution

The paper applies a Maclaurin expansion to the XQL loss function, which improves stability against large errors and lets the assumed error distribution be adjusted between a normal and a Gumbel distribution.

Findings

01. Significantly stabilizes learning in online RL tasks from DM Control.
02. Improves performance in offline RL tasks from D4RL.
03. Enables adjustment of error distribution assumptions from normal to Gumbel.

Abstract

In offline reinforcement learning, in-sample learning methods have been widely used to prevent performance degradation caused by evaluating out-of-distribution actions from the dataset. Extreme Q-learning (XQL) employs a loss function based on the assumption that Bellman error follows a Gumbel distribution, enabling it to model the soft optimal value function in an in-sample manner. It has demonstrated strong performance in both offline and online reinforcement learning settings. However, issues remain, such as the instability caused by the exponential term in the loss function and the risk of the error distribution deviating from the Gumbel distribution. Therefore, we propose Maclaurin Expanded Extreme Q-learning to enhance stability. In this method, applying Maclaurin expansion to the loss function in XQL enhances stability against large errors. This approach involves adjusting the…
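
The effect of truncating the exponential can be illustrated with a minimal sketch. It assumes the XQL Gumbel regression loss takes the form exp(z) - z - 1 for a scaled error z, and that the expansion replaces exp(z) with its Maclaurin polynomial of a chosen order. The function names `gumbel_loss` and `maclaurin_loss` are hypothetical, temperature scaling is omitted, and the official repository may differ in detail.

```python
import math


def gumbel_loss(z: float) -> float:
    """Assumed XQL-style Gumbel regression loss: exp(z) - z - 1."""
    return math.exp(z) - z - 1.0


def maclaurin_loss(z: float, order: int) -> float:
    """Gumbel loss with exp(z) replaced by its Maclaurin polynomial.

    The constant and linear terms cancel against (-z - 1), leaving
    sum_{k=2}^{order} z**k / k!.
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    total = 0.0
    term = z  # holds z**k / k!, starting at k = 1
    for k in range(2, order + 1):
        term *= z / k
        total += term
    return total


if __name__ == "__main__":
    for z in (0.5, 3.0, 10.0):
        print(z, maclaurin_loss(z, 2), maclaurin_loss(z, 4), gumbel_loss(z))
```

Under these assumptions, order 2 reduces to the squared error z**2 / 2 (the loss associated with a normal error assumption), while higher orders approach the full Gumbel loss. For a large error such as z = 10, the truncated loss grows polynomially rather than exponentially, which is the stability benefit the abstract describes.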

Code & Models

Repositories

motokiomura/MXQL (JAX, official)

Taxonomy

Methods: Q-Learning