Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning

Wenjie Shi; Shiji Song; Hui Wu; Ya-Chu Hsu; Cheng Wu; Gao Huang

arXiv:1909.03245·cs.LG·December 7, 2021·5 cites

TL;DR

This paper introduces a regularized Anderson acceleration method to enhance the convergence speed and performance of off-policy deep reinforcement learning algorithms, effectively addressing sample inefficiency and slow learning in complex environments.

Contribution

The paper extends regularized Anderson acceleration to deep RL and proposes two strategies, progressive update and adaptive restart, to improve learning efficiency.

Findings

01 Significantly faster learning on benchmark tasks

02 Improved final performance of deep RL algorithms

03 Effective in high-dimensional, continuous control environments

Abstract

Model-free deep reinforcement learning (RL) algorithms have been widely used for a range of complex control tasks. However, slow convergence and sample inefficiency remain challenging problems in RL, especially when handling continuous and high-dimensional state spaces. To tackle this problem, we propose a general acceleration method for model-free, off-policy deep RL algorithms by drawing on the idea underlying regularized Anderson acceleration (RAA), which is an effective approach to accelerating the solution of fixed-point problems with perturbations. Specifically, we first explain how policy iteration can be applied directly with Anderson acceleration. Then we extend RAA to the case of deep RL by introducing a regularization term to control the impact of perturbation induced by function approximation errors. We further propose two strategies, i.e., progressive update and adaptive…
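
The fixed-point idea behind the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation and involves no deep RL: it applies a generic Tikhonov-regularized Anderson mixing step to a small linear contraction g(x) = A x + b. The function names `raa_weights` and `anderson_fixed_point`, the memory size, the regularization strength `lam`, and the test problem are all assumptions made for illustration.

```python
import numpy as np


def raa_weights(residuals, lam=1e-8):
    """Mixing weights that minimize ||F a||^2 + lam * ||a||^2 subject to sum(a) = 1."""
    F = np.stack(residuals, axis=1)                 # (n, m): one residual per column
    gram = F.T @ F + lam * np.eye(F.shape[1])       # lam keeps the system well conditioned
    z = np.linalg.solve(gram, np.ones(F.shape[1]))
    return z / z.sum()                              # normalize so the weights sum to 1


def anderson_fixed_point(g, x0, memory=5, lam=1e-8, tol=1e-8, max_iter=2000):
    """Iterate x <- sum_i a_i * g(x_i) over the last `memory` iterates."""
    xs, gs = [], []
    x = x0
    for k in range(max_iter):
        gx = g(x)
        if np.linalg.norm(gx - x) < tol:            # residual small enough: stop
            return x, k
        xs = (xs + [x])[-memory:]                   # keep only the most recent iterates
        gs = (gs + [gx])[-memory:]
        alpha = raa_weights([gi - xi for gi, xi in zip(gs, xs)], lam)
        x = sum(a * gi for a, gi in zip(alpha, gs))
    return x, max_iter


# Toy contraction g(x) = A x + b with spectral norm 0.9 (illustrative only).
rng = np.random.default_rng(0)
M = rng.standard_normal((10, 10))
A = 0.9 * M / np.linalg.norm(M, 2)
b = rng.standard_normal(10)
x_star = np.linalg.solve(np.eye(10) - A, b)         # exact fixed point for comparison

x_hat, n_iter = anderson_fixed_point(lambda x: A @ x + b, np.zeros(10))
print(n_iter, np.linalg.norm(x_hat - x_star))
```

In this sketch the regularization term pulls the weights toward a uniform average when the residuals are small or nearly collinear, which is one simple way to limit the effect of perturbed residuals. How the paper sets the regularization and combines it with its two strategies is not shown here.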

Code & Models

Repositories

shiwj16/raa-drl (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Model Reduction and Neural Networks · Adaptive Dynamic Programming Control

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings