# Composing Diverse Policies for Temporally Extended Tasks

**Authors:** Daniel Angelov, Yordan Hristov, Michael Burke, Subramanian Ramamoorthy

arXiv: 1907.08199 · 2020-02-18

## TL;DR

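This paper presents a method for composing diverse control policies (motion planning trajectories, dynamic motion primitives, and neural network controllers) to solve temporally extended tasks. It sequences the policies using local dynamics models and a global goal scoring estimator, and uses expert demonstrations to turn what is usually a gradient-based learning problem into a planning problem.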

## Contribution

The paper introduces a global goal scoring estimator that uses local, per-motion-primitive dynamics models and their activation state-space sets to sequence policies in a locally optimal fashion, without explicitly specified pre- and post-conditions.

## Key findings

- Shown on an MDP benchmark to be robust to action and model dynamics mismatch
- Solved a complex physical gear assembly task on a PR2 robot
- Discovered the optimal sequence of controllers in both tasks

## Abstract

Robot control policies for temporally extended and sequenced tasks are often characterized by discontinuous switches between different local dynamics. These change-points are often exploited in hierarchical motion planning to build approximate models and to facilitate the design of local, region-specific controllers. However, it becomes combinatorially challenging to implement such a pipeline for complex temporally extended tasks, especially when the sub-controllers work on different information streams, time scales and action spaces. In this paper, we introduce a method that can compose diverse policies comprising motion planning trajectories, dynamic motion primitives and neural network controllers. We introduce a global goal scoring estimator that uses local, per-motion primitive dynamics models and corresponding activation state-space sets to sequence diverse policies in a locally optimal fashion. We use expert demonstrations to convert what is typically viewed as a gradient-based learning process into a planning process without explicitly specifying pre- and post-conditions. We first illustrate the proposed framework using an MDP benchmark to showcase robustness to action and model dynamics mismatch, and then with a particularly complex physical gear assembly task, solved on a PR2 robot. We show that the proposed approach successfully discovers the optimal sequence of controllers and solves both tasks efficiently.
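The sequencing idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 1-D state, the three primitives, the `Primitive` class, and the functions `goal_score`, `select_primitive`, and `plan` are all hypothetical names chosen for illustration. In particular, the goal score here is a hand-written negative distance, and the local dynamics models double as the simulated world, whereas the paper works with expert demonstrations and real controllers. The sketch only shows the greedy step: among primitives whose activation set contains the current state, pick the one whose predicted next state scores highest.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

State = float  # assumption: a 1-D toy state, not the paper's state space


@dataclass
class Primitive:
    name: str
    is_active: Callable[[State], bool]  # membership test for the activation set
    predict: Callable[[State], State]   # local dynamics model: predicted next state


def goal_score(state: State, goal: State = 10.0) -> float:
    """Hand-written stand-in for a goal scoring estimator (higher is better)."""
    return -abs(goal - state)


def select_primitive(state: State, primitives: List[Primitive],
                     score: Callable[[State], float] = goal_score) -> Optional[Primitive]:
    """Pick the active primitive whose predicted next state scores highest."""
    candidates = [p for p in primitives if p.is_active(state)]
    if not candidates:
        return None
    return max(candidates, key=lambda p: score(p.predict(state)))


def plan(start: State, primitives: List[Primitive],
         score: Callable[[State], float] = goal_score,
         max_steps: int = 20, tol: float = 1e-6) -> Tuple[List[str], State]:
    """Greedily chain primitives until the goal is reached or no step improves the score."""
    state, sequence = start, []
    for _ in range(max_steps):
        if score(state) >= -tol:  # goal reached
            break
        best = select_primitive(state, primitives, score)
        if best is None or score(best.predict(state)) <= score(state):
            break  # no active primitive improves the score
        sequence.append(best.name)
        state = best.predict(state)  # the model stands in for real execution here
    return sequence, state


# Hypothetical primitives with overlapping activation sets.
PRIMITIVES = [
    Primitive("coarse_move", lambda s: s <= 7.0, lambda s: s + 3.0),
    Primitive("fine_move", lambda s: s >= 5.0, lambda s: s + 1.0),
    Primitive("retreat", lambda s: True, lambda s: s - 1.0),
]

if __name__ == "__main__":
    print(plan(0.0, PRIMITIVES))
```

Because selection is driven only by activation sets and predicted scores, no pre- or post-conditions are written down for any primitive. A one-step greedy choice like this is locally optimal at best; how the paper handles longer horizons or model error is not covered by this sketch.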

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.08199/full.md

## Figures

22 figures with captions in the complete paper: https://tomesphere.com/paper/1907.08199/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/1907.08199/full.md

---
Source: https://tomesphere.com/paper/1907.08199