Smoothing Advantage Learning

Yaozhong Gan; Zhe Zhang; Xiaoyang Tan

arXiv:2203.10445·cs.LG·March 22, 2022

TL;DR

This paper introduces Smoothing Advantage Learning (SAL), a variant of advantage learning that replaces the Bellman optimality operator with a smooth one to stabilize training and enlarge the action gap in value-based reinforcement learning with function approximation.

Contribution

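The paper proposes a simple value smoothing technique for advantage learning that improves training stability and enlarges the action gap, and it supports these benefits with a theoretical analysis of the action gap and a performance bound for approximate SAL.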

Findings

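01 SAL stabilizes the training of advantage learning under function approximation.

02 The method increases the action gap between the optimal and sub-optimal actions.

03 The theoretical analysis shows that value smoothing controls the trade-off between the convergence rate and the upper bound of the approximation errors.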

Abstract

Advantage learning (AL) aims to improve the robustness of value-based reinforcement learning against estimation errors with action-gap-based regularization. Unfortunately, the method tends to be unstable in the case of function approximation. In this paper, we propose a simple variant of AL, named smoothing advantage learning (SAL), to alleviate this problem. The key to our method is to replace the original Bellman optimality operator in AL with a smooth one so as to obtain a more reliable estimate of the temporal difference target. We give a detailed account of the resulting action gap and the performance bound for approximate SAL. Further theoretical analysis reveals that the proposed value smoothing technique not only helps to stabilize the training procedure of AL by controlling the trade-off between convergence rate and the upper bound of the approximation errors, but is beneficial to…
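To make the operators named in the abstract concrete, the following is a minimal tabular sketch in Python with NumPy. The advantage learning backup follows the standard form, the Bellman optimality backup minus alpha times the gap V(s) - Q(s, a). The abstract does not spell out the smooth operator, so the convex mixture (1 - omega) * Q + omega * TQ used here is an assumption made for illustration and may differ from the paper's definition. All function and parameter names (bellman_optimality, advantage_learning, smoothed_advantage_learning, action_gap, alpha, omega) are hypothetical.

```python
import numpy as np


def bellman_optimality(Q, R, P, gamma):
    """(TQ)(s, a) = R(s, a) + gamma * sum_s' P(s, a, s') * max_b Q(s', b).

    Q, R have shape (S, A); P has shape (S, A, S) with rows summing to 1.
    """
    V = Q.max(axis=1)            # greedy state values, shape (S,)
    return R + gamma * (P @ V)   # P @ V has shape (S, A)


def advantage_learning(Q, R, P, gamma, alpha):
    """Standard AL backup: TQ minus alpha times the gap V(s) - Q(s, a)."""
    V = Q.max(axis=1, keepdims=True)
    return bellman_optimality(Q, R, P, gamma) - alpha * (V - Q)


def smoothed_advantage_learning(Q, R, P, gamma, alpha, omega):
    """AL backup with a smoothed target.

    ASSUMPTION: the smooth operator is modeled as the convex mixture
    (1 - omega) * Q + omega * TQ; this is illustrative only and is not
    taken from the paper. With omega = 1 it reduces to plain AL.
    """
    V = Q.max(axis=1, keepdims=True)
    smooth_target = (1.0 - omega) * Q + omega * bellman_optimality(Q, R, P, gamma)
    return smooth_target - alpha * (V - Q)


def action_gap(Q):
    """Per-state gap between the best and second-best action values."""
    sorted_q = np.sort(Q, axis=1)
    return sorted_q[:, -1] - sorted_q[:, -2]


if __name__ == "__main__":
    # Small random MDP, purely for demonstration.
    rng = np.random.default_rng(0)
    S, A = 4, 3
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)   # normalize transition rows
    R = rng.random((S, A))
    Q = np.zeros((S, A))
    for _ in range(50):
        Q = smoothed_advantage_learning(Q, R, P, gamma=0.9, alpha=0.3, omega=0.8)
    print("action gaps:", action_gap(Q))
```

In this sketch the gap penalty vanishes at the greedy action, so only sub-optimal actions are pushed down, which is the mechanism behind a larger action gap. The values of alpha and omega above are arbitrary and not taken from the paper.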

Taxonomy

Topics: Reinforcement Learning in Robotics · Muscle activation and electromyography studies