Training Transition Policies via Distribution Matching for Complex Tasks

Ju-Seung Byun; Andrew Perrault

arXiv:2110.04357 · cs.LG · March 15, 2022

TL;DR

This paper introduces transition policies trained via adversarial inverse reinforcement learning to smoothly connect lower-level policies in hierarchical reinforcement learning, improving success rates on complex tasks by matching state-action distributions.

Contribution

The paper proposes a method for training transition policies using distribution matching and deep Q-learning, which addresses the sparse-reward problem in hierarchical RL.

Findings

01. Achieves higher success rates in continuous locomotion and manipulation tasks.
02. Effectively matches state-action distributions between policies.
03. Outperforms previous trajectory search methods.

Abstract

Humans decompose novel complex tasks into simpler ones to exploit previously learned skills. Analogously, hierarchical reinforcement learning seeks to leverage lower-level policies for simple tasks to solve complex ones. However, because each lower-level policy induces a different distribution of states, transitioning from one lower-level policy to another may fail due to an unexpected starting state. We introduce transition policies that smoothly connect lower-level policies by producing a distribution of states and actions that matches what is expected by the next policy. Training transition policies is challenging because the natural reward signal -- whether the next policy can execute its subtask successfully -- is sparse. By training transition policies via adversarial inverse reinforcement learning to match the distribution of expected states and actions, we avoid relying on…
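The distribution-matching idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear logistic discriminator over hypothetical 2-D (state, action) features and uses a simplified GAIL/AIRL-style reward, log D(x) - log(1 - D(x)), omitting the policy-density term of full adversarial inverse reinforcement learning. All names (`train_discriminator`, `matching_reward`) and the synthetic data are assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_discriminator(target, generated, lr=0.1, epochs=200):
    """Fit a logistic discriminator D(x) = sigmoid(w.x + b).

    target:    feature vectors drawn from the next policy's data (label 1)
    generated: feature vectors produced by the transition policy (label 0)
    """
    dim = len(target[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1.0) for x in target] + [(x, 0.0) for x in generated]
    n = len(data)
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of binary cross-entropy w.r.t. the logit
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        # Full-batch gradient descent step.
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def matching_reward(x, w, b):
    # log D(x) - log(1 - D(x)) simplifies to the discriminator logit.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Synthetic stand-ins for the two state-action distributions (assumed data).
random.seed(0)
target = [[random.gauss(1.0, 0.3), random.gauss(0.5, 0.3)] for _ in range(200)]
generated = [[random.gauss(-1.0, 0.3), random.gauss(-0.5, 0.3)] for _ in range(200)]

w, b = train_discriminator(target, generated)
r_near = matching_reward([1.0, 0.5], w, b)    # sample resembling the next policy's data
r_far = matching_reward([-1.0, -0.5], w, b)   # sample far from it
```

In this toy setup the reward is dense: every sample is scored by how closely it resembles the next policy's distribution, so the transition policy gets a learning signal without waiting for the sparse success outcome of the following subtask.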

Code & Models

Repositories

shashacks/irl_transition (PyTorch, official)

Videos

Training Transition Policies via Distribution Matching for Complex Tasks · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Robot Manipulation and Learning

Methods: Q-Learning