Training Transition Policies via Distribution Matching for Complex Tasks
Ju-Seung Byun, Andrew Perrault

TL;DR
This paper introduces transition policies trained via adversarial inverse reinforcement learning to smoothly connect lower-level policies in hierarchical reinforcement learning, improving success rates on complex tasks by matching state-action distributions.
Contribution
The paper proposes a method for training transition policies using distribution matching and deep Q-learning, addressing the sparse-reward problem in hierarchical RL.
Findings
Achieves higher success rates in continuous locomotion and manipulation tasks.
Effectively matches state-action distributions between policies.
Outperforms previous trajectory search methods.
Abstract
Humans decompose novel complex tasks into simpler ones to exploit previously learned skills. Analogously, hierarchical reinforcement learning seeks to leverage lower-level policies for simple tasks to solve complex ones. However, because each lower-level policy induces a different distribution of states, transitioning from one lower-level policy to another may fail due to an unexpected starting state. We introduce transition policies that smoothly connect lower-level policies by producing a distribution of states and actions that matches what is expected by the next policy. Training transition policies is challenging because the natural reward signal -- whether the next policy can execute its subtask successfully -- is sparse. By training transition policies via adversarial inverse reinforcement learning to match the distribution of expected states and actions, we avoid relying on…
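To make the distribution-matching idea concrete, here is a minimal sketch of the discriminator and reward used in the standard adversarial inverse reinforcement learning (AIRL) formulation. It is an illustration under assumptions, not the authors' implementation: the function names, the scalar inputs `f_value` (a learned score for a state-action pair) and `log_pi` (the transition policy's log-probability of that action), and the toy batches are all hypothetical.

```python
import math


def airl_discriminator(f_value: float, log_pi: float) -> float:
    """Standard AIRL discriminator D = exp(f) / (exp(f) + pi(a|s)).

    Algebraically this equals sigmoid(f - log_pi), which is computed
    here in a numerically stable way.
    """
    z = f_value - log_pi
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def airl_reward(f_value: float, log_pi: float) -> float:
    """Dense reward log D - log(1 - D), which simplifies to f - log_pi."""
    return f_value - log_pi


def discriminator_loss(target_batch, transition_batch) -> float:
    """Binary cross-entropy over (f_value, log_pi) pairs.

    Assumption: samples drawn from the next policy's expected
    state-action distribution are labeled 1, and samples produced by
    the transition policy are labeled 0.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for f_value, log_pi in target_batch:
        total -= math.log(airl_discriminator(f_value, log_pi) + eps)
    for f_value, log_pi in transition_batch:
        total -= math.log(1.0 - airl_discriminator(f_value, log_pi) + eps)
    return total / (len(target_batch) + len(transition_batch))


# Toy usage with made-up numbers.
target_samples = [(1.2, -0.7), (0.8, -1.1)]
transition_samples = [(-0.5, -0.3), (0.1, -0.2)]
loss = discriminator_loss(target_samples, transition_samples)
dense_reward = airl_reward(*transition_samples[0])
```

In this sketch the reward is defined for every state-action pair the transition policy visits, which is what lets training proceed without waiting for the sparse signal of whether the next policy eventually succeeds.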
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Robot Manipulation and Learning
Methods: Q-Learning
