PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks

Matthew Chang; Gunjan Chhablani; Alexander Clegg; Mikael Dallaire Cote; Ruta Desai; Michal Hlavac; Vladimir Karashchuk; Jacob Krantz; Roozbeh Mottaghi; Priyam Parashar; Siddharth Patki; Ishita Prasad; Xavier Puig; Akshara Rai; Ram Ramrakhya; Daniel Tran; Joanne Truong; John M. Turner; Eric Undersander; Tsung-Yen Yang

arXiv:2411.00081·cs.RO·November 4, 2024

TL;DR

PARTNR is a large-scale benchmark designed to evaluate planning and reasoning in human-robot collaboration tasks, revealing current model limitations and guiding future improvements in embodied multi-agent systems.

Contribution

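The paper introduces PARTNR, the largest benchmark of its kind for embodied multi-agent tasks. Its tasks are generated by a semi-automated pipeline that combines LLMs with simulation for grounding and verification, and the paper provides a comprehensive analysis of model performance and open challenges.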

Findings

1. State-of-the-art LLMs struggle with coordination and error recovery.
2. Fine-tuned smaller LLMs can match the performance of larger models.
3. Human-LLM collaboration requires more steps than human-human collaboration.

Abstract

We present a benchmark for Planning And Reasoning Tasks in humaN-Robot collaboration (PARTNR) designed to study human-robot coordination in household activities. PARTNR tasks exhibit characteristics of everyday tasks, such as spatial, temporal, and heterogeneous agent capability constraints. We employ a semi-automated task generation pipeline using Large Language Models (LLMs), incorporating simulation in the loop for grounding and verification. PARTNR stands as the largest benchmark of its kind, comprising 100,000 natural language tasks, spanning 60 houses and 5,819 unique objects. We analyze state-of-the-art LLMs on PARTNR tasks, across the axes of planning, perception and skill execution. The analysis reveals significant limitations in SoTA models, such as poor coordination and failures in task tracking and recovery from errors. When LLMs are paired with real humans, they require…
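The abstract describes the task generation pipeline only at a high level. Below is a minimal, hypothetical Python sketch of the generate-then-verify idea. It assumes a toy scene dictionary in place of a simulator and hand-written candidate tasks in place of LLM proposals. All names (`Task`, `is_grounded`, `simulate`, `satisfies`, `filter_tasks`) are illustrative and are not taken from facebookresearch/partnr-planner. The sketch shows how a task carrying spatial, temporal, and agent-capability constraints might first be grounded against a scene and then verified by replaying a plan.

```python
from dataclasses import dataclass, field

# Hypothetical toy scene: object name -> receptacle it currently rests on.
Scene = dict[str, str]
# Hypothetical action log entry: (agent, object, destination receptacle).
Action = tuple[str, str, str]


@dataclass
class Task:
    instruction: str
    # Spatial constraints: (object, receptacle) pairs that must hold at the end.
    placements: list[tuple[str, str]]
    # Temporal constraints: (a, b) means object a must be moved before object b.
    order: list[tuple[str, str]] = field(default_factory=list)
    # Agent-capability constraints: object -> the only agent allowed to move it.
    restricted: dict[str, str] = field(default_factory=dict)


def is_grounded(task: Task, scene: Scene, receptacles: set[str]) -> bool:
    """Grounding: every object and receptacle the task mentions exists in the scene."""
    return all(obj in scene and dest in receptacles for obj, dest in task.placements)


def simulate(scene: Scene, actions: list[Action]) -> Scene:
    """Toy stand-in for a simulator: replay moves on a copy of the scene."""
    state = dict(scene)
    for _agent, obj, dest in actions:
        state[obj] = dest
    return state


def satisfies(task: Task, actions: list[Action], final: Scene) -> bool:
    """Verification: check spatial, temporal, and capability constraints."""
    if any(final.get(obj) != dest for obj, dest in task.placements):
        return False
    # Record the step at which each object is first moved.
    first_move: dict[str, int] = {}
    for step, (_agent, obj, _dest) in enumerate(actions):
        first_move.setdefault(obj, step)
    for a, b in task.order:
        if a not in first_move or b not in first_move or first_move[a] >= first_move[b]:
            return False
    # Unrestricted objects may be moved by any agent.
    return all(task.restricted.get(obj, agent) == agent for agent, obj, _ in actions)


def filter_tasks(candidates, scene, receptacles, plans):
    """Keep only candidate tasks that are grounded and verified by a replayed plan."""
    kept = []
    for task in candidates:
        plan = plans.get(task.instruction)
        if plan is None or not is_grounded(task, scene, receptacles):
            continue
        if satisfies(task, plan, simulate(scene, plan)):
            kept.append(task)
    return kept


# Hand-written candidates stand in for LLM-generated task proposals.
scene = {"cup": "counter", "plate": "counter", "book": "shelf"}
receptacles = {"counter", "table", "shelf", "sink"}
t1 = Task("Put the plate and then the cup on the table.",
          placements=[("plate", "table"), ("cup", "table")], order=[("plate", "cup")])
t2 = Task("Put the vase on the table.", placements=[("vase", "table")])
t3 = Task("The human puts the cup in the sink.",
          placements=[("cup", "sink")], restricted={"cup": "human"})
plans = {
    t1.instruction: [("robot", "plate", "table"), ("human", "cup", "table")],
    t2.instruction: [("robot", "vase", "table")],
    t3.instruction: [("robot", "cup", "sink")],  # violates the capability constraint
}
print([t.instruction for t in filter_tasks([t1, t2, t3], scene, receptacles, plans)])
```

Running the sketch keeps only the first task. The second mentions an object that is absent from the scene, and the third plan has the robot move an object reserved for the human. In a fuller pipeline, `simulate` would presumably be replaced by rollouts in an actual simulator, and the hand-written plans by planner or LLM outputs.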

Code & Models

Repositories

facebookresearch/partnr-planner
PyTorch · Official

Datasets

ai-habitat/partnr_episodes
dataset · 118 downloads

Taxonomy

Topics: Multi-Agent Systems and Negotiation