Stabilizing Q Learning Via Soft Mellowmax Operator

Yaozhong Gan; Zhe Zhang; Xiaoyang Tan

arXiv:2012.09456 · cs.LG · December 21, 2020 · 1 citation

TL;DR

This paper introduces SM2, an improved Soft Mellowmax operator for reinforcement learning that enhances stability, reliability, and performance guarantees, especially in high-dimensional and multi-agent scenarios.

Contribution

The paper proposes SM2, an enhanced Mellowmax operator with proven performance guarantees, addressing oversmoothing and parameter sensitivity issues in existing methods.

Findings

01. SM2 provides stable value function approximation in high-dimensional spaces.

02. Application of SM2 achieves state-of-the-art results in multi-agent reinforcement learning.

03. SM2 is reliable, easy to implement, and preserves the advantages of Mellowmax.

Abstract

Learning complicated value functions in high-dimensional state spaces by function approximation is a challenging task, partly because the max operator used in temporal-difference updates can theoretically cause instability for most linear or non-linear approximation schemes. Mellowmax is a recently proposed differentiable, non-expansion softmax operator that allows convergent behavior in learning and planning. Unfortunately, the performance bound for the fixed point it converges to remains unclear, and in practice its parameter is sensitive to the domain and has to be tuned case by case. Moreover, the Mellowmax operator may suffer from oversmoothing, as it ignores the probability of each action being taken when aggregating action values. In this paper, we address all of the above issues with an enhanced Mellowmax operator, named SM2 (Soft Mellowmax). Particularly, the proposed…
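
As a rough illustration of the operator the abstract builds on, the sketch below implements the standard Mellowmax definition, mm_ω(x) = (1/ω) · log((1/n) · Σ exp(ω · x_i)). That formula is not spelled out on this page and is assumed here; the names `mellowmax` and `omega` and the sample Q-values are illustrative only. The sketch does not implement SM2, whose definition is not given in the truncated abstract.

```python
import math


def mellowmax(values, omega):
    """Mellowmax of a list of action values (assumed standard definition).

    Computes log(mean(exp(omega * v))) / omega, shifted by the maximum
    value so that the exponentials cannot overflow.
    """
    if omega <= 0:
        raise ValueError("omega must be positive")
    n = len(values)
    m = max(values)
    # Every exponent is <= 0 after the shift, so exp() stays in (0, 1].
    total = sum(math.exp(omega * (v - m)) for v in values)
    return m + math.log(total / n) / omega


# Hypothetical action values: one good action and three poor ones.
q_values = [1.0, 0.0, 0.0, 0.0]

# Small omega behaves like the mean, large omega approaches the max.
for omega in (0.1, 1.0, 10.0, 100.0):
    print(f"omega={omega:>5}: mellowmax={mellowmax(q_values, omega):.4f}")

# The uniform 1/n weighting means extra low-valued actions pull the
# result down, even though the best action value is unchanged.
few_actions = mellowmax([1.0, 0.0], 1.0)
many_actions = mellowmax([1.0] + [0.0] * 9, 1.0)
print(f"2 actions: {few_actions:.4f}, 10 actions: {many_actions:.4f}")
```

The last comparison is one way to see the oversmoothing concern raised in the abstract: because every action is weighted equally, the aggregate drifts toward the mean as low-valued actions are added. The sensitivity to `omega` in the loop above likewise hints at why the parameter may need per-domain tuning.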

Videos

Stabilizing Q Learning via Soft Mellowmax Operator · Underline

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning

Methods: Softmax