Temporal-adaptive Hierarchical Reinforcement Learning
Wen-Ji Zhou, Yang Yu

TL;DR
This paper introduces TEMPLE, a hierarchical reinforcement learning method whose adaptive temporal gate dynamically adjusts how often the high-level policy makes decisions, improving performance in 2-D rooms, MuJoCo, and Atari environments.
Contribution
The paper proposes a novel TEMPLE structure that adaptively controls high-level policy decision frequency using a temporal gate, enhancing HRL efficiency.
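The difference between a fixed time-skip schedule and a context-dependent gate can be illustrated with a toy sketch. This is not the TEMPLE architecture or its training procedure: the function names (`fixed_skip_gate`, `adaptive_gate`, `rollout`), the logistic gate with hand-set weights, and the 0.5 threshold are all assumptions made for illustration. In the paper the gate is learned; here the parameters are fixed by hand.

```python
import math


def fixed_skip_gate(t, k):
    """Baseline (hypothetical helper): open the gate every k steps, ignoring the state."""
    return t % k == 0


def adaptive_gate(state, weights, bias):
    """Hypothetical context-dependent gate.

    Computes p = sigmoid(w . s + b) and opens when p > 0.5. A learned gate
    would fit `weights` and `bias`; here they are hand-set assumptions.
    """
    logit = sum(w * s for w, s in zip(weights, state)) + bias
    p = 1.0 / (1.0 + math.exp(-logit))
    return p > 0.5


def rollout(states, gate_fn, high_level_fn):
    """Walk through `states`, re-querying the high-level policy only when the gate opens.

    The gate is forced open at t = 0 so that a goal always exists.
    Returns the goal active at each step and the number of high-level decisions.
    """
    goal = None
    goals = []
    decisions = 0
    for t, s in enumerate(states):
        if t == 0 or gate_fn(t, s):
            goal = high_level_fn(s)
            decisions += 1
        goals.append(goal)
    return goals, decisions


if __name__ == "__main__":
    # Toy 1-D trajectory with an abrupt change at step 3.
    states = [[0.0], [0.1], [0.2], [2.0], [2.1], [0.0]]
    plan = lambda s: round(s[0])  # stand-in for a high-level policy

    print(rollout(states, lambda t, s: fixed_skip_gate(t, 2), plan))
    print(rollout(states, lambda t, s: adaptive_gate(s, [4.0], -4.0), plan))
```

With these toy values both schedules make three high-level decisions, but the fixed schedule (steps 0, 2, 4) only picks up the change at step 4, while the state-dependent gate (steps 0, 3, 4) reacts at step 3, when the change actually occurs.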
Findings
Improved performance in 2-D rooms, MuJoCo, and Atari environments.
Adaptive temporal gating outperforms fixed strategies.
Enhanced decision granularity and efficiency in HRL.
Abstract
Hierarchical reinforcement learning (HRL) helps address large-scale and sparse-reward problems in reinforcement learning. In HRL, the policy model has an internal representation structured in levels. With this structure, the reinforcement learning task is expected to be decomposed into sub-tasks at the corresponding levels, so that learning can be more efficient. Although it is intuitive that a high-level policy only needs to make macro decisions at a low frequency, the exact frequency is hard to determine. Previous HRL approaches often employ a fixed time-skip strategy or learn a terminal condition without taking the context into account, which not only requires manual adjustment but also sacrifices some decision granularity. In this paper, we propose the temporal-adaptive hierarchical policy learning (TEMPLE) structure, which uses a temporal gate to…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Evolutionary Algorithms and Applications
Methods: Test · Entropy Regularization · Proximal Policy Optimization
