LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning
Outongyi Lv, Bingxin Zhou

TL;DR
This paper studies the distribution of Bellman errors in reinforcement learning and proposes a loss function based on the Logistic distribution that improves performance over the standard mean squared error loss across various environments.
Contribution
It introduces the Logistic maximum likelihood loss (LLoss) for Bellman error approximation, validated through extensive experiments and distribution tests, offering a new perspective on Bellman error modeling.
Findings
Bellman error approximately follows a Logistic distribution
LLoss consistently outperforms MSELoss in experiments
Kolmogorov-Smirnov tests confirm Logistic distribution fit
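The kind of goodness-of-fit check named in the last finding can be sketched in a few lines. This is a minimal illustration of the one-sample Kolmogorov-Smirnov statistic against a Logistic CDF, not the authors' test code; the function names and the `loc`/`scale` parameters are assumptions introduced here, and no real Bellman-error data is included.

```python
import math


def logistic_cdf(x, loc=0.0, scale=1.0):
    """CDF of the Logistic distribution, computed in a numerically stable way."""
    z = (x - loc) / scale
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_quantile(p, loc=0.0, scale=1.0):
    """Inverse CDF of the Logistic distribution, for 0 < p < 1."""
    return loc + scale * math.log(p / (1.0 - p))


def ks_statistic(samples, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|.

    `samples` stands in for a batch of observed Bellman errors (hypothetical
    here); `cdf` is the candidate distribution's CDF.
    """
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

A smaller D means a closer fit. Computing D for the same errors against a Logistic CDF and against a Normal CDF gives a direct comparison of the two hypotheses. `logistic_quantile` is only there to generate synthetic samples when trying the function out.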
Abstract
Modern reinforcement learning (RL) can be categorized into online and offline variants. The Bellman equation is pivotal to both, yet current research on it revolves primarily around optimization techniques and performance enhancement rather than the inherent structural properties of the Bellman error, such as its distribution. This study investigates the distribution of the Bellman approximation error through iterative exploration of the Bellman equation and observes that the Bellman error approximately follows the Logistic distribution. Based on this observation, we propose the Logistic maximum likelihood function (LLoss) as an alternative to the commonly used mean squared error (MSELoss), which assumes a Normal distribution for Bellman errors. We validated the hypotheses through extensive numerical experiments across…
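To make the contrast between the two losses concrete, here is a minimal sketch of the negative log-likelihood of a Logistic distribution next to the mean squared error. The exact form of LLoss used in the paper is not given in this summary, so the fixed `scale` and `loc` parameters and the function names are assumptions made for illustration.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logistic_nll(errors, scale=1.0, loc=0.0):
    """Mean negative log-likelihood of `errors` under Logistic(loc, scale).

    With z = (e - loc) / scale, the Logistic density gives
    -log f(e) = z + 2 * log(1 + exp(-z)) + log(scale).
    `errors` stands in for a batch of Bellman errors (hypothetical here).
    """
    total = 0.0
    for e in errors:
        z = (e - loc) / scale
        total += z + 2.0 * softplus(-z) + math.log(scale)
    return total / len(errors)


def mse(errors):
    """Mean squared error, the loss implied by a Normal error model."""
    return sum(e * e for e in errors) / len(errors)
```

The per-sample Logistic term is symmetric in the error and grows roughly linearly for large errors, whereas the squared error grows quadratically. Under this sketch, a few large Bellman errors therefore contribute much less to the Logistic loss than to MSE.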
Taxonomy
Topics: Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics
