Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies

Tom Zahavy; Avinatan Hasidim; Haim Kaplan; Yishay Mansour

arXiv:1803.04674·cs.LG·March 14, 2018

TL;DR

This paper gives theoretical guarantees for reward decomposition in deterministic MDPs by mapping the problem to a Reward Discounted TSP, and proposes three stochastic local policies that are guaranteed to outperform deterministic ones in hierarchical reinforcement learning.

Contribution

It provides the first theoretical guarantees for reward decomposition in deterministic MDPs and introduces stochastic local policies that improve upon deterministic heuristics.

Findings

01

Stochastic policies outperform deterministic policies in local reward decomposition.

02

The approach maps hierarchical RL to a Reward Discounted TSP and derives approximate solutions for it.

03

The proposed policies are computationally efficient and do not require planning.

Abstract

In this work, we provide theoretical guarantees for reward decomposition in deterministic MDPs. Reward decomposition is a special case of Hierarchical Reinforcement Learning that allows one to learn many policies in parallel and combine them into a composite solution. Our approach builds on mapping this problem into a Reward Discounted Traveling Salesman Problem, and then deriving approximate solutions for it. In particular, we focus on approximate solutions that are local, i.e., solutions that only observe information about the current state. Local policies are easy to implement and do not require substantial computational resources as they do not perform planning. While local deterministic policies, like Nearest Neighbor, are being used in practice for hierarchical reinforcement learning, we propose three stochastic policies that guarantee better performance than any deterministic…
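To make the objective and the Nearest Neighbor baseline concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each reward node is worth 1 and is discounted by `gamma ** t`, where `t` is the cumulative travel time at which the node is first reached; the abstract does not spell out this objective, so treat it as an illustrative assumption. The names `discounted_value`, `nearest_neighbor_order`, and the toy line-graph distances are hypothetical.

```python
def discounted_value(order, dist, start, gamma):
    """Discounted sum of unit rewards collected in the given visiting order.

    Assumption: a reward reached after total travel time t contributes gamma ** t.
    """
    elapsed = 0.0
    value = 0.0
    current = start
    for node in order:
        elapsed += dist[current][node]  # travel time to the next reward
        value += gamma ** elapsed       # unit reward, discounted by arrival time
        current = node
    return value


def nearest_neighbor_order(dist, start, rewards):
    """Local deterministic policy: always move to the closest uncollected reward.

    It only looks at distances from the current node, so no planning is involved.
    Ties are broken by node index to keep the result deterministic.
    """
    remaining = set(rewards)
    order = []
    current = start
    while remaining:
        nxt = min(remaining, key=lambda n: (dist[current][n], n))
        order.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return order


# Toy instance: nodes placed on a line; node 0 is the start, nodes 1..3 hold rewards.
positions = [0, 1, -3, 3]
dist = [[abs(a - b) for b in positions] for a in positions]

nn_order = nearest_neighbor_order(dist, start=0, rewards=[1, 2, 3])
nn_value = discounted_value(nn_order, dist, start=0, gamma=0.5)
```

On this toy instance Nearest Neighbor visits the rewards in the order `[1, 3, 2]`. The sketch covers only the deterministic baseline named in the abstract; the three stochastic policies are not described in this listing and are therefore not reproduced here.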

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Adaptive Dynamic Programming Control