Human-Timescale Adaptation in an Open-Ended Task Space

Adaptive Agent Team; Jakob Bauer; Kate Baumli; Satinder Baveja; Feryal Behbahani; Avishkar Bhoopchand; Nathalie Bradley-Schmieg; Michael Chang; Natalie Clay; Adrian Collister; Vibhavari Dasagi; Lucy Gonzalez; Karol Gregor; Edward Hughes; Sheleem Kashem; Maria Loks-Thompson; Hannah Openshaw; Jack Parker-Holder; Shreya Pathak; Nicolas Perez-Nieves; Nemanja Rakicevic; Tim Rocktäschel; Yannick Schroecker; Jakub Sygnowski; Karl Tuyls; Sarah York; Alexander Zacherl; Lei Zhang

arXiv:2301.07608·cs.LG·January 19, 2023·22 cites

Open Access

TL;DR

This paper shows that large-scale reinforcement learning, combined with meta-learning and attention-based memory, yields agents that adapt in context to new, open-ended 3D environments about as quickly as humans do.

Contribution

The paper introduces an adaptive RL agent (AdA) trained with meta-reinforcement learning, an attention-based memory architecture, and an automated curriculum, which adapts rapidly to open-ended 3D tasks.

Findings

1. Scaling laws relate network size, memory, and task diversity to performance.

2. The adaptive agent demonstrates hypothesis-driven exploration and efficient knowledge use.

3. Prompting with demonstrations enhances adaptation success.

Abstract

Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL). In this work, we demonstrate that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans. In a vast space of held-out environment dynamics, our adaptive agent (AdA) displays on-the-fly hypothesis-driven exploration, efficient exploitation of acquired knowledge, and can successfully be prompted with first-person demonstrations. Adaptation emerges from three ingredients: (1) meta-reinforcement learning across a vast, smooth and diverse task distribution, (2) a policy parameterised as a large-scale attention-based memory architecture, and (3) an effective automated curriculum that…
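
The idea of in-context adaptation described in the abstract can be illustrated with a toy sketch. This is not AdA or any code from the paper: the task (`HiddenGoalTask`), the policy (`InContextPolicy`), and `run_trial` are hypothetical names invented here, and the policy is hand-written rather than learned. The sketch only shows the structural point that the policy's rules stay fixed while a memory carried across the episodes of one trial drives first exploration and then exploitation.

```python
from dataclasses import dataclass, field


@dataclass
class HiddenGoalTask:
    """Toy task (assumption): exactly one action pays reward 1, the rest pay 0."""
    num_actions: int
    goal: int

    def step(self, action: int) -> float:
        return 1.0 if action == self.goal else 0.0


@dataclass
class InContextPolicy:
    """Hand-written stand-in for a learned memory-based policy.

    The decision rule never changes; all adaptation lives in `memory`,
    which persists across the episodes of a single trial.
    """
    num_actions: int
    memory: list = field(default_factory=list)  # (action, reward) pairs

    def act(self) -> int:
        # Exploit: repeat any action that has already paid off.
        for action, reward in self.memory:
            if reward > 0:
                return action
        # Explore: rule out hypotheses by trying an untried action.
        tried = {action for action, _ in self.memory}
        for action in range(self.num_actions):
            if action not in tried:
                return action
        return 0  # every action tried without reward; arbitrary fallback

    def observe(self, action: int, reward: float) -> None:
        self.memory.append((action, reward))


def run_trial(task: HiddenGoalTask, num_episodes: int, steps_per_episode: int) -> list:
    """Run several episodes on one task; memory is NOT reset between episodes."""
    policy = InContextPolicy(task.num_actions)  # fresh memory per trial only
    returns = []
    for _ in range(num_episodes):
        episode_return = 0.0
        for _ in range(steps_per_episode):
            action = policy.act()
            reward = task.step(action)
            policy.observe(action, reward)
            episode_return += reward
        returns.append(episode_return)
    return returns


if __name__ == "__main__":
    task = HiddenGoalTask(num_actions=8, goal=6)
    print(run_trial(task, num_episodes=3, steps_per_episode=4))  # prints [0.0, 2.0, 4.0]
```

In this toy run the per-episode return rises from 0 to 2 to 4 with no change to the policy itself, because the first episodes are spent eliminating actions and the later ones reuse what was found. In the paper's setting the corresponding behaviour is learned through meta-reinforcement learning rather than hard-coded.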

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques