TANGO: Training-free Embodied AI Agents for Open-world Tasks
Filippo Ziliotto, Tommaso Campari, Luciano Serafini, Lamberto Ballan

TL;DR
TANGO enables embodied AI agents to perform diverse open-world tasks by leveraging large language models for program composition, without additional training, achieving state-of-the-art results in zero-shot scenarios.
Contribution
The paper introduces TANGO, a training-free approach that integrates LLMs with simple primitives for versatile embodied AI task execution.
Findings
Achieves state-of-the-art zero-shot performance on multiple embodied AI tasks.
Demonstrates effective task generalization without additional training.
Uses an LLM, prompted with a few in-context examples, to compose the provided primitives into a program for each task.
Abstract
Large Language Models (LLMs) have demonstrated excellent capabilities in composing various modules together to create programs that can perform complex reasoning tasks on images. In this paper, we propose TANGO, an approach that extends the program composition via LLMs already observed for images, aiming to integrate those capabilities into embodied agents capable of observing and acting in the world. Specifically, by employing a simple PointGoal Navigation model combined with a memory-based exploration policy as a foundational primitive for guiding an agent through the world, we show how a single model can address diverse tasks without additional training. We task an LLM with composing the provided primitives to solve a specific task, using only a few in-context examples in the prompt. We evaluate our approach on three key Embodied AI tasks: Open-Set ObjectGoal Navigation, Multi-Modal…
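The composition idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `ToyAgent` class, the `explore` and `navigate_to` primitives, the step format, and `run_program` are all hypothetical names introduced here, and the "program" is hard-coded where the real system would take it from an LLM prompted with in-context examples. The sketch only shows how a small interpreter might chain primitives by passing each step's output to later steps.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

Point = Tuple[int, int]


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an embodied agent on a 2D grid."""
    position: Point = (0, 0)
    objects: Dict[str, Point] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)


def explore(agent: ToyAgent, target: str) -> Optional[Point]:
    """Placeholder for an exploration policy: looks up where the target is."""
    agent.trace.append(f"explore:{target}")
    return agent.objects.get(target)


def navigate_to(agent: ToyAgent, goal: Point) -> Point:
    """Placeholder for a PointGoal navigation model: moves straight to the goal."""
    agent.trace.append(f"navigate_to:{goal}")
    agent.position = goal
    return agent.position


# Registry of primitives a generated program is allowed to call (assumed names).
PRIMITIVES: Dict[str, Callable[..., Any]] = {
    "explore": explore,
    "navigate_to": navigate_to,
}


def run_program(agent: ToyAgent, program: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Run steps in order; arguments starting with '$' refer to earlier outputs."""
    env: Dict[str, Any] = {}
    for step in program:
        fn = PRIMITIVES[step["op"]]
        args = [
            env[a[1:]] if isinstance(a, str) and a.startswith("$") else a
            for a in step["args"]
        ]
        env[step["out"]] = fn(agent, *args)
    return env


# Hard-coded here; in an LLM-driven setup this list would be generated text
# parsed into steps. The step format itself is an assumption of this sketch.
program = [
    {"out": "loc", "op": "explore", "args": ["chair"]},
    {"out": "final", "op": "navigate_to", "args": ["$loc"]},
]

agent = ToyAgent(objects={"chair": (3, 4)})
result = run_program(agent, program)
```

After running, `agent.position` holds the chair's location and `agent.trace` records the order in which the primitives were called. Keeping the primitives in a registry means a new task only needs a new program, not new primitive code, which is the property the abstract attributes to composing fixed primitives without additional training.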
Taxonomy
Topics: Reinforcement Learning in Robotics
