The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously

Serkan Cabi; Sergio Gómez Colmenarejo; Matthew W. Hoffman; Misha Denil; Ziyu Wang; Nando de Freitas

arXiv:1707.03300·cs.AI·July 12, 2017·19 cites

TL;DR

This paper presents the IU agent, which extends DDPG to learn multiple continuous control tasks simultaneously, achieving faster learning and success in tasks where single-task methods fail, demonstrated in a MuJoCo environment.

Contribution

The paper introduces the IU agent, which lets a DDPG-based continuous control agent learn multiple tasks at once. It learns faster than agents trained on one task at a time and solves some tasks on which single-task DDPG fails.

Findings

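01. The IU agent learns faster than single-task DDPG agents.

02. In some cases, the IU agent solves tasks on which single-task DDPG fails completely.

03. Results are demonstrated in a playroom environment built on the MuJoCo physics engine, with tasks generated automatically from a grounded formal language.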

Abstract

This paper introduces the Intentional Unintentional (IU) agent. This agent endows the deep deterministic policy gradients (DDPG) agent for continuous control with the ability to solve several tasks simultaneously. Learning to solve many tasks simultaneously has been a long-standing, core goal of artificial intelligence, inspired by infant development and motivated by the desire to build flexible robot manipulators capable of many diverse behaviours. We show that the IU agent not only learns to solve many tasks simultaneously but also learns faster than agents that target a single task at a time. In some cases, where the single-task DDPG method completely fails, the IU agent successfully solves the task. To demonstrate this, we build a playroom environment using the MuJoCo physics engine, and introduce a grounded formal language to automatically generate tasks.
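
As a rough illustration of what learning several tasks simultaneously can mean for a DDPG-style critic, the sketch below computes one temporal-difference target per task from a single shared batch of transitions. It assumes one critic output per task and per-task rewards evaluated on the same transitions. The abstract does not specify these details, and the function name `multitask_td_targets`, the array shapes, and the discount value are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def multitask_td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute per-task TD targets y_i = r_i + gamma * (1 - done) * Q_i(s', a'_i).

    Hypothetical helper for illustration only.

    rewards:       (batch, n_tasks) reward of each task for the same transition
    next_q_values: (batch, n_tasks) target-critic value of each task at the next
                   state, evaluated with that task's own target action
    dones:         (batch,) 1.0 where the episode ended on this transition
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_values = np.asarray(next_q_values, dtype=np.float64)
    # Broadcast the termination mask across the task dimension.
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)[:, None]
    return rewards + gamma * not_done * next_q_values


if __name__ == "__main__":
    # Two transitions, two tasks; the second transition is terminal.
    targets = multitask_td_targets(
        rewards=[[1.0, 0.0], [0.0, 1.0]],
        next_q_values=[[2.0, 4.0], [6.0, 8.0]],
        dones=[0.0, 1.0],
        gamma=0.5,
    )
    print(targets)
```

Under these assumptions, every transition collected while the agent pursues one task also yields a learning signal for every other task, which is one plausible reading of how solving tasks simultaneously could speed up learning.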

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling · Multimodal Machine Learning Applications

Methods: Experience Replay · Dense Connections · Weight Decay · Adam · Convolution · Batch Normalization · Deep Deterministic Policy Gradient