Evaluation of Habitat Robotics using Large Language Models

William Li; Lei Hamilton; Kaise Al-natour; Sanjeev Mohindra

arXiv:2507.06157·cs.RO·July 9, 2025

TL;DR

This study evaluates several Large Language Models on cooperative two-robot tasks in simulated kitchen environments and finds that reasoning models such as OpenAI o3-mini outperform non-reasoning models such as GPT-4o and Llama 3.

Contribution

It uses the Meta PARTNR benchmark to assess LLMs on embodied robotic tasks and shows that reasoning models outperform non-reasoning models in these environments.

Findings

01 o3-mini outperforms GPT-4o and Llama 3 in robotic tasks

02 o3-mini leads under both full and partial observability, in centralized and decentralized configurations

03 Results suggest reasoning models are a promising direction for embodied robotic development

Abstract

This paper evaluates the effectiveness of Large Language Models at solving embodied robotic tasks using the Meta PARTNR benchmark. Meta PARTNR provides simplified environments and robotic interactions within randomized indoor kitchen scenes. Each randomized kitchen scene is assigned a task that two robotic agents solve cooperatively. We evaluated multiple frontier models on Meta PARTNR environments. Our results indicate that reasoning models like OpenAI o3-mini outperform non-reasoning models like OpenAI GPT-4o and Llama 3 when operating in PARTNR's embodied robotic environments. o3-mini outperformed the other models across centralized, decentralized, full-observability, and partial-observability configurations. This provides a promising avenue of research for embodied robotic development.
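The configurations named in the abstract can be read as a grid of model, planning mode, and observability setting. The following is a minimal sketch of that grid, not the authors' evaluation harness: the model list contains only the three models the abstract names (the paper says "multiple frontier models", so the full list may be longer), and the function name `evaluation_grid` and the dictionary keys are hypothetical.

```python
from itertools import product

# Names taken from the abstract; treating them as a full cross product
# is an assumption made for illustration.
MODELS = ["o3-mini", "GPT-4o", "Llama 3"]
PLANNING = ["centralized", "decentralized"]
OBSERVABILITY = ["full", "partial"]


def evaluation_grid():
    """Return every (model, planning, observability) combination as a dict."""
    return [
        {"model": m, "planning": p, "observability": o}
        for m, p, o in product(MODELS, PLANNING, OBSERVABILITY)
    ]


if __name__ == "__main__":
    # Print one line per configuration; no scores are produced here.
    for run in evaluation_grid():
        print(run)
```

Each entry describes one run to schedule. The sketch deliberately carries no scores, since the abstract reports only the qualitative ranking.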

Taxonomy

Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robot Manipulation and Learning

Methods: LLaMA