Learning a Skill-sequence-dependent Policy for Long-horizon Manipulation Tasks
Zhihao Li, Zhenglong Sun, Jionglong SU, Jiaming Zhang

TL;DR
This paper introduces a hierarchical reinforcement learning approach that incorporates skill sequences into the policy, significantly improving sample efficiency and performance in long-horizon robotic manipulation tasks.
Contribution
The paper proposes a novel skill-sequence-dependent hierarchical policy that combines high-level skill planning with low-level observation-based control for long-horizon tasks.
Findings
Successfully solves long-horizon manipulation tasks in simulation.
Achieves faster learning compared to PPO and task schema methods.
Enhances sample efficiency by integrating skill sequences into policy learning.
Abstract
In recent years, the robotics community has made substantial progress in robotic manipulation using deep reinforcement learning (RL). Effective learning of long-horizon tasks nevertheless remains a challenging topic. Typical RL-based methods approximate long-horizon tasks as Markov decision processes and consider only the current observation (images or other sensor information) as the input state. However, such an approximation ignores the fact that the skill sequence also plays a crucial role in long-horizon tasks. In this paper, we take both the observation and skill sequences into account and propose a skill-sequence-dependent hierarchical policy for solving a typical long-horizon task. The proposed policy consists of a high-level skill policy (utilizing skill sequences) and a low-level parameter policy (responding to observation) with corresponding training methods, which makes the learning much more…
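The two-level structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the skill names, the tabular softmax over skill histories, the per-skill linear maps, and every class and function name below are assumptions chosen for illustration. The sketch shows only the division of inputs, with the high-level policy conditioned on the sequence of skills chosen so far and the low-level policy conditioned on the current observation.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical skill set; the paper's actual skills are not given in this summary.
SKILLS = ["reach", "grasp", "move", "release"]


class SkillPolicy:
    """High-level policy (illustrative): next skill depends on the skill history only."""

    def __init__(self, n_skills: int) -> None:
        self.n_skills = n_skills
        # Tabular logits keyed by the skill history; a learned model would replace this.
        self.logits: Dict[Tuple[int, ...], List[float]] = {}

    def probs(self, history: Sequence[int]) -> List[float]:
        row = self.logits.setdefault(tuple(history), [0.0] * self.n_skills)
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]  # numerically stable softmax
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, history: Sequence[int], rng: random.Random) -> int:
        return rng.choices(range(self.n_skills), weights=self.probs(history))[0]


class ParameterPolicy:
    """Low-level policy (illustrative): skill parameters depend on the observation."""

    def __init__(self, n_skills: int, obs_dim: int, param_dim: int,
                 rng: random.Random) -> None:
        # One small random linear map per skill, standing in for a trained network.
        self.weights = [
            [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
             for _ in range(param_dim)]
            for _ in range(n_skills)
        ]

    def act(self, skill: int, obs: Sequence[float]) -> List[float]:
        return [sum(w * o for w, o in zip(row, obs)) for row in self.weights[skill]]


def rollout(skill_policy: SkillPolicy, param_policy: ParameterPolicy,
            observe: Callable[[], Sequence[float]], horizon: int,
            rng: random.Random) -> List[Tuple[int, List[float]]]:
    """Alternate the two levels: choose a skill from history, then its parameters."""
    history: List[int] = []
    plan: List[Tuple[int, List[float]]] = []
    for _ in range(horizon):
        skill = skill_policy.sample(history, rng)
        params = param_policy.act(skill, observe())
        plan.append((skill, params))
        history.append(skill)  # the growing skill sequence feeds the next decision
    return plan


rng = random.Random(0)
skill_policy = SkillPolicy(n_skills=len(SKILLS))
param_policy = ParameterPolicy(n_skills=len(SKILLS), obs_dim=3, param_dim=2, rng=rng)
# A fixed dummy observation stands in for images or other sensor readings.
plan = rollout(skill_policy, param_policy, lambda: [0.5, -0.2, 1.0], horizon=4, rng=rng)
```

In this sketch the untrained skill policy is uniform over skills; training would adjust the logits for each history so that useful skill orderings become more likely, while the parameter policy would be trained separately to map observations to skill parameters.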
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning
