Temporal-adaptive Hierarchical Reinforcement Learning

Wen-Ji Zhou; Yang Yu

arXiv:2002.02080 · cs.AI · February 7, 2020 · 1 citation

Open Access

TL;DR

This paper introduces TEMPLE, a hierarchical reinforcement learning method whose adaptive temporal gate dynamically adjusts how often the high-level policy makes decisions, improving performance in 2-D rooms, MuJoCo, and Atari environments.

Contribution

The paper proposes TEMPLE, a novel structure that uses a temporal gate to adaptively control how often the high-level policy makes decisions, improving the efficiency of HRL.
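
The gating idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how the gate is parameterized, so `gate(state, subgoal)` here is a hypothetical function returning the probability that the high-level policy re-decides at the current step, and the toy corridor, `CHECKPOINTS`, `next_checkpoint`, `move_toward`, and `reached_gate` are illustrative assumptions. In TEMPLE the gate is learned; here it is hand-written so that the output is deterministic.

```python
import random
from typing import Callable, List, Tuple

State = int
Subgoal = int  # assumption: the high level emits a subgoal for the low level
Action = int


def gated_rollout(
    step: Callable[[State, Action], State],
    high_policy: Callable[[State], Subgoal],
    low_policy: Callable[[State, Subgoal], Action],
    gate: Callable[[State, Subgoal], float],
    init_state: State,
    horizon: int,
    seed: int = 0,
) -> Tuple[State, List[int]]:
    """Run a two-level policy where a (hypothetical) temporal gate decides,
    at every step, whether the high level re-decides or keeps its subgoal."""
    rng = random.Random(seed)
    state = init_state
    subgoal = high_policy(state)  # the high level always decides at t = 0
    decision_steps = [0]
    for t in range(horizon):
        # Re-decide with the probability the gate assigns to this context.
        if t > 0 and rng.random() < gate(state, subgoal):
            subgoal = high_policy(state)
            decision_steps.append(t)
        state = step(state, low_policy(state, subgoal))
    return state, decision_steps


# Toy 1-D corridor with hypothetical, unevenly spaced checkpoints.
CHECKPOINTS = [2, 3, 7, 10]


def next_checkpoint(s: State) -> Subgoal:
    # High level: pick the next checkpoint ahead of the agent.
    return next((c for c in CHECKPOINTS if c > s), s)


def move_toward(s: State, g: Subgoal) -> Action:
    # Low level: step right until the subgoal is reached.
    return 1 if g > s else 0


def reached_gate(s: State, g: Subgoal) -> float:
    # Hand-written stand-in for a learned gate: re-decide once the subgoal is reached.
    return 1.0 if s == g else 0.0


final_state, decisions = gated_rollout(
    step=lambda s, a: s + a,
    high_policy=next_checkpoint,
    low_policy=move_toward,
    gate=reached_gate,
    init_state=0,
    horizon=10,
)
print(final_state, decisions)  # 10 [0, 2, 3, 7]
```

Because the gate sees the current state and subgoal, the high level re-decides at steps 0, 2, 3, and 7, at irregular intervals that follow the task rather than a fixed period.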

Findings

01. Improved performance in 2-D rooms, MuJoCo, and Atari environments.

02. Adaptive temporal gating outperforms fixed decision-frequency strategies.

03. Finer decision granularity and greater efficiency in HRL.

Abstract

Hierarchical reinforcement learning (HRL) helps address large-scale and sparse-reward problems in reinforcement learning. In HRL, the policy model has an internal representation structured in levels. With this structure, the reinforcement learning task is expected to decompose into corresponding levels of sub-tasks, making learning more efficient. Although it is intuitive that a high-level policy only needs to make macro decisions at a low frequency, the exact frequency is hard to determine. Previous HRL approaches often employed a fixed-time skip strategy or learned a termination condition without taking the context into account, which not only requires manual adjustment but also sacrifices some decision granularity. In this paper, we propose the temporal-adaptive hierarchical policy learning (TEMPLE) structure, which uses a temporal gate to…
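
The trade-off the abstract attributes to fixed-time skipping can be made concrete with a small Python sketch. The decision steps in `needed` are hypothetical (the steps at which a toy task would actually require a new high-level decision), and `fixed_skip_schedule` and `mistimed` are illustrative helpers, not code from the paper.

```python
from typing import Dict, List


def fixed_skip_schedule(horizon: int, k: int) -> List[int]:
    """Steps at which a fixed-time-skip high-level policy decides: every k steps."""
    return list(range(0, horizon, k))


def mistimed(schedule: List[int], needed: List[int]) -> Dict[str, List[int]]:
    """Compare a decision schedule with the steps where a new decision is needed."""
    s, n = set(schedule), set(needed)
    return {
        "missed": sorted(n - s),     # a decision was needed, but the schedule skipped it
        "redundant": sorted(s - n),  # re-decided although nothing required it
    }


# Hypothetical steps at which a toy task needs a new high-level decision.
needed = [0, 2, 3, 7]
for k in (1, 3):
    print(k, mistimed(fixed_skip_schedule(10, k), needed))
# 1 {'missed': [], 'redundant': [1, 4, 5, 6, 8, 9]}
# 3 {'missed': [2, 7], 'redundant': [6, 9]}
```

A short period (k = 1) never misses a needed decision but re-decides six times for nothing, while k = 3 cuts the wasted decisions to two yet misses steps 2 and 7; no single k fits the irregular schedule, which is the loss of decision granularity that a context-aware gate is meant to avoid.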

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications

Methods: Test · Entropy Regularization · Proximal Policy Optimization