A Generalist Agent
Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo,, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay,, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali Razavi, Ashley Edwards,, Nicolas Heess, Yutian Chen, Raia Hadsell

TL;DR
Gato is a versatile generalist agent capable of performing multiple tasks across different modalities and embodiments using a single neural network trained on diverse data, demonstrating broad applicability beyond text.
Contribution
This work introduces Gato, a unified multi-modal, multi-task, multi-embodiment agent that can handle various tasks with a single model, advancing generalist AI development.
Findings
Gato can play Atari, caption images, chat, and control robots.
The same network adapts to different modalities and outputs.
Gato demonstrates broad task versatility with a unified architecture.
Abstract
Inspired by progress in large-scale language modeling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens. In this report we describe the model and the data, and document the current capabilities of Gato.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Natural Language Processing Techniques
MethodsLinear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam · Dense Connections · Softmax
