Multi-step Greedy Reinforcement Learning Algorithms

Manan Tomar; Yonathan Efroni; Mohammad Ghavamzadeh

arXiv:1910.02919 · cs.LG · July 14, 2020 · 1 citation

TL;DR

This paper introduces multi-step greedy algorithms for model-free reinforcement learning, demonstrating their ability to improve performance over standard methods like DQN and TRPO across Atari and MuJoCo benchmarks.

Contribution

It develops a general framework for multi-step greedy RL algorithms using surrogate problems, applicable with various existing RL methods, and provides insights on hyper-parameter tuning.

Findings

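01 The multi-step greedy algorithms improve on their DQN and TRPO baselines across Atari and MuJoCo benchmarks.

02 Solving a multi-step greedy surrogate problem at each iteration improves the quality of the resulting policy.

03 Performance depends strongly on the hyper-parameter that controls the extent to which the surrogate problem is solved.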

Abstract

Multi-step greedy policies have been extensively used in model-based reinforcement learning (RL), both when a model of the environment is available (e.g., in the game of Go) and when it is learned. In this paper, we explore their benefits in model-free RL, when employed using multi-step dynamic programming algorithms: $\kappa$-Policy Iteration ($\kappa$-PI) and $\kappa$-Value Iteration ($\kappa$-VI). These methods iteratively compute the next policy ($\kappa$-PI) and value function ($\kappa$-VI) by solving a surrogate decision problem with a shaped reward and a smaller discount factor. We derive model-free RL algorithms based on $\kappa$-PI and $\kappa$-VI in which the surrogate problem can be solved by any discrete or continuous action RL method, such as DQN and TRPO. We identify the importance of a hyper-parameter that controls the extent to which the surrogate problem is solved and…
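
To make the surrogate problem in the abstract concrete, here is a minimal tabular sketch of $\kappa$-PI. It is an illustration under stated assumptions, not the paper's model-free algorithm: the abstract only says the surrogate uses "a shaped reward and a smaller discount factor", and the specific form used here (discount $\gamma\kappa$, shaped reward $r(s,a) + \gamma(1-\kappa)\,\mathbb{E}[v(s')]$) is the usual $\kappa$-PI formulation as best recalled from the authors' earlier work. The known transition model `P`, the reward table `R`, and the function names `solve_surrogate`, `evaluate`, and `kappa_pi` are all hypothetical choices made for this example.

```python
import numpy as np

def solve_surrogate(P, R, v, gamma, kappa, iters=500):
    """Greedy policy of the surrogate problem, found by value iteration.

    Assumed surrogate: discount gamma*kappa and shaped reward
    r(s,a) + gamma*(1-kappa)*E[v(s')]. P has shape (S, A, S), R has shape (S, A).
    """
    shaped = R + gamma * (1.0 - kappa) * (P @ v)          # shape (S, A)
    q = np.zeros_like(R)
    for _ in range(iters):
        q = shaped + gamma * kappa * (P @ q.max(axis=1))  # surrogate Bellman backup
    return q.argmax(axis=1)

def evaluate(P, R, pi, gamma):
    """Exact value of policy pi in the ORIGINAL problem (discount gamma)."""
    S = R.shape[0]
    P_pi = P[np.arange(S), pi]                            # shape (S, S)
    r_pi = R[np.arange(S), pi]                            # shape (S,)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def kappa_pi(P, R, gamma, kappa, outer_iters=50):
    """Alternate between solving the surrogate and evaluating the new policy."""
    v = np.zeros(R.shape[0])
    for _ in range(outer_iters):
        pi = solve_surrogate(P, R, v, gamma, kappa)
        v = evaluate(P, R, pi, gamma)
    return pi, v

# Small random MDP, made up purely for illustration.
rng = np.random.default_rng(0)
P = rng.random((5, 3, 5))
P /= P.sum(axis=2, keepdims=True)                         # normalize transition rows
R = rng.random((5, 3))

for kappa in (0.0, 0.5, 1.0):
    pi, v = kappa_pi(P, R, gamma=0.9, kappa=kappa)
    print(kappa, pi, np.round(v, 3))
```

Under the assumed form, $\kappa = 0$ gives a surrogate with zero discount, which is the ordinary one-step greedy step of policy iteration, and $\kappa = 1$ makes the surrogate identical to the original problem. In the paper's setting the inner value iteration would be replaced by a model-free solver such as DQN or TRPO, run for a limited budget.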

Videos

Multi-step Greedy Reinforcement Learning Algorithms · SlidesLive

Taxonomy

Topics: Machine Learning and ELM · Elevator Systems and Control · Smart Parking Systems Research

Methods: Q-Learning · Dense Connections · Convolution · Trust Region Policy Optimization · Deep Q-Network