Multi-step Greedy Reinforcement Learning Algorithms
Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh

TL;DR
This paper introduces multi-step greedy algorithms for model-free reinforcement learning, demonstrating their ability to improve performance over standard methods like DQN and TRPO across Atari and MuJoCo benchmarks.
Contribution
It develops a general framework for multi-step greedy RL algorithms using surrogate problems, applicable with various existing RL methods, and provides insights on hyper-parameter tuning.
Findings
The multi-step greedy algorithms outperform DQN and TRPO on Atari and MuJoCo benchmarks
The multi-step greedy approach improves policy quality
The hyper-parameter controlling how far the surrogate problem is solved is crucial for performance
Abstract
Multi-step greedy policies have been extensively used in model-based reinforcement learning (RL), both when a model of the environment is available (e.g., in the game of Go) and when it is learned. In this paper, we explore their benefits in model-free RL, when employed using multi-step dynamic programming algorithms: κ-Policy Iteration (κ-PI) and κ-Value Iteration (κ-VI). These methods iteratively compute the next policy (κ-PI) and value function (κ-VI) by solving a surrogate decision problem with a shaped reward and a smaller discount factor. We derive model-free RL algorithms based on κ-PI and κ-VI in which the surrogate problem can be solved by any discrete or continuous action RL method, such as DQN and TRPO. We identify the importance of a hyper-parameter that controls the extent to which the surrogate problem is solved and…
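The surrogate problem described in the abstract can be illustrated in a small tabular setting. The sketch below assumes the usual κ-VI formulation, in which the shaped reward is r(s, a) + γ(1 − κ) E[V(s')] and the surrogate discount is γκ; these formulas, the toy two-state MDP, and every function name are assumptions introduced for illustration and are not stated on this page. Plain value iteration stands in for the DQN or TRPO solver the paper uses.

```python
import numpy as np

def surrogate_mdp(P, R, V, gamma, kappa):
    """Build the surrogate problem for the current value estimate V.

    P: transitions, shape (S, A, S). R: rewards, shape (S, A).
    Assumed formulation: shaped reward r + gamma*(1-kappa)*E[V(s')],
    smaller discount gamma*kappa.
    """
    shaped_reward = R + gamma * (1.0 - kappa) * (P @ V)
    return shaped_reward, gamma * kappa

def solve_mdp(P, R, discount, iters=2000):
    """Solve an MDP with plain value iteration (stand-in for DQN/TRPO)."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = (R + discount * (P @ V)).max(axis=1)
    return V

def kappa_vi(P, R, gamma, kappa, outer_iters=200):
    """Hypothetical tabular kappa-VI: repeatedly solve the surrogate MDP."""
    V = np.zeros(P.shape[0])
    for _ in range(outer_iters):
        shaped_reward, discount = surrogate_mdp(P, R, V, gamma, kappa)
        V = solve_mdp(P, shaped_reward, discount, iters=500)
    return V

# Toy MDP (made up for this example): 2 states, 2 actions.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

V_star = solve_mdp(P, R, gamma)                # ordinary value iteration
V_kappa = kappa_vi(P, R, gamma, kappa=0.5)     # multi-step greedy variant
print(np.max(np.abs(V_kappa - V_star)))
```

Under these assumptions, κ = 0 reduces each outer step to a one-step greedy backup, while κ = 1 solves the original problem in a single outer step; intermediate values trade off the two.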
Taxonomy
Topics: Machine Learning and ELM · Elevator Systems and Control · Smart Parking Systems Research
Methods: Q-Learning · Dense Connections · Convolution · Trust Region Policy Optimization · Deep Q-Network
