Deploying and Evaluating LLMs to Program Service Mobile Robots

Zichao Hu; Francesca Lucchetti; Claire Schlesinger; Yash Saxena; Anders Freeman; Sadanand Modak; Arjun Guha; Joydeep Biswas

arXiv:2311.11183·cs.RO·February 23, 2024·2 cites

TL;DR

This paper introduces CodeBotler, an open-source tool for programming service mobile robots using LLMs, and RoboEval, a benchmark for evaluating LLMs' ability to generate correct robot programs, highlighting common failure modes.

Contribution

It presents a novel domain-specific language for robot programming, a new benchmark for evaluation, and an analysis of LLMs' failure modes in robot program generation.

Findings

01. LLMs can generate functional robot programs with few-shot prompting.

02. RoboEval effectively assesses program correctness through execution traces and temporal logic.

03. Common pitfalls in LLM-generated robot programs are identified and categorized.
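
The trace-based checking mentioned in the second finding can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `TraceRobot` class, the `go_to` and `say` skill names, and the `happens_before` helper are hypothetical and are not RoboEval's API. The helper only checks a simple ordering constraint, which is a much weaker stand-in for the temporal logic properties the listing refers to.

```python
from dataclasses import dataclass, field


@dataclass
class TraceRobot:
    """Hypothetical simulated robot that records every skill call it receives."""
    location: str = "start"
    trace: list = field(default_factory=list)

    def go_to(self, place: str) -> None:
        # Update the simulated state and log the call.
        self.location = place
        self.trace.append(("go_to", place))

    def say(self, message: str) -> None:
        self.trace.append(("say", message))


def happens_before(trace, first, second) -> bool:
    """Return True if `first` occurs in the trace and `second` occurs later."""
    if first not in trace:
        return False
    return second in trace[trace.index(first) + 1:]


def program(robot) -> None:
    """Stand-in for an LLM-generated program (hand-written for this sketch)."""
    robot.go_to("kitchen")
    robot.say("Lunch is ready")
    robot.go_to("start")


robot = TraceRobot()
program(robot)

# Ordering property: the robot must reach the kitchen before announcing lunch.
ok = happens_before(
    robot.trace, ("go_to", "kitchen"), ("say", "Lunch is ready")
)
```

Running the program against a recording robot rather than real hardware means the same property can be checked from several different initial states without changing the program under test.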

Abstract

Recent advancements in large language models (LLMs) have spurred interest in using them for generating robot programs from natural language, with promising initial results. We investigate the use of LLMs to generate programs for service mobile robots leveraging mobility, perception, and human interaction skills, and where accurate sequencing and ordering of actions is crucial for success. We contribute CodeBotler, an open-source robot-agnostic tool to program service mobile robots from natural language, and RoboEval, a benchmark for evaluating LLMs' capabilities of generating programs to complete service robot tasks. CodeBotler performs program generation via few-shot prompting of LLMs with an embedded domain-specific language (eDSL) in Python, and leverages skill abstractions to deploy generated programs on any general-purpose mobile robot. RoboEval evaluates the correctness of…
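
As a rough illustration of the few-shot prompting approach the abstract describes, the sketch below assembles a prompt from skill signatures, one worked example, and a new task. It is a sketch under stated assumptions, not CodeBotler's implementation: the skill stubs (`go_to`, `say`, `ask`), the example task, and the `build_prompt` helper are all hypothetical, and no LLM is called.

```python
# Hypothetical skill signatures shown to the model; the real eDSL may differ.
SKILL_STUBS = '''\
def go_to(location: str) -> None: ...
def say(message: str) -> None: ...
def ask(person: str, question: str, options: list[str]) -> str: ...
'''

# Hypothetical few-shot examples: (natural-language task, program) pairs.
FEW_SHOT = [
    (
        "Go to the kitchen and announce lunch.",
        'go_to("kitchen")\nsay("Lunch is ready")',
    ),
]


def build_prompt(task: str, stubs: str = SKILL_STUBS, examples=FEW_SHOT) -> str:
    """Concatenate skill stubs, worked examples, and the new task comment."""
    parts = [stubs]
    for instruction, code in examples:
        parts.append(f"# Task: {instruction}\n{code}\n")
    # The prompt ends with the new task so the model continues with a program.
    parts.append(f"# Task: {task}\n")
    return "\n".join(parts)


prompt = build_prompt("Ask Alice whether she wants coffee or tea.")
```

Because the prompt is ordinary Python text, the completion returned by a model could in principle be executed against whatever object implements the skill functions, which is one way to read the abstract's point about skill abstractions keeping generated programs robot-agnostic.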

Code & Models

Repositories

ut-amrl/codebotler
PyTorch · Official

Taxonomy

Topics: Software Engineering Research · Natural Language Processing Techniques · Topic Modeling