Deploying and Evaluating LLMs to Program Service Mobile Robots
Zichao Hu, Francesca Lucchetti, Claire Schlesinger, Yash Saxena, Anders Freeman, Sadanand Modak, Arjun Guha, Joydeep Biswas

TL;DR
This paper introduces CodeBotler, an open-source tool for programming service mobile robots using LLMs, and RoboEval, a benchmark for evaluating LLMs' ability to generate correct robot programs, highlighting common failure modes.
Contribution
It presents an embedded domain-specific language (eDSL) in Python for robot programming, a new benchmark (RoboEval) for evaluation, and an analysis of LLMs' failure modes in robot program generation.
Findings
LLMs can generate functional robot programs with few-shot prompting.
RoboEval effectively assesses program correctness through execution traces and temporal logic.
Common pitfalls in LLM-generated robot programs are identified and categorized.
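To make the idea of checking an execution trace against temporal properties concrete, here is a minimal sketch. It is not RoboEval's checker: the trace format (a list of skill-name and argument pairs) and the helper names eventually and precedes are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical trace format: each event is (skill name, argument).
Event = Tuple[str, str]


def eventually(trace: List[Event], event: Event) -> bool:
    """True if `event` occurs anywhere in the trace."""
    return event in trace


def precedes(trace: List[Event], first: Event, second: Event) -> bool:
    """True if `second` occurs and `first` occurs before its first occurrence."""
    if second not in trace:
        return False
    return first in trace[: trace.index(second)]


# Example trace, written by hand as if recorded from one program run.
trace: List[Event] = [
    ("go_to", "kitchen"),
    ("ask", "Alice"),
    ("go_to", "office"),
    ("say", "done"),
]

# The robot must reach the kitchen before it asks Alice anything.
ordering_ok = precedes(trace, ("go_to", "kitchen"), ("ask", "Alice"))
# The robot must report back at some point.
reported = eventually(trace, ("say", "done"))
```

Properties like these capture the sequencing requirement directly: a program that performs the right actions in the wrong order fails precedes even though every individual action appears in the trace.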
Abstract
Recent advancements in large language models (LLMs) have spurred interest in using them for generating robot programs from natural language, with promising initial results. We investigate the use of LLMs to generate programs for service mobile robots leveraging mobility, perception, and human interaction skills, and where accurate sequencing and ordering of actions is crucial for success. We contribute CodeBotler, an open-source robot-agnostic tool to program service mobile robots from natural language, and RoboEval, a benchmark for evaluating LLMs' capabilities of generating programs to complete service robot tasks. CodeBotler performs program generation via few-shot prompting of LLMs with an embedded domain-specific language (eDSL) in Python, and leverages skill abstractions to deploy generated programs on any general-purpose mobile robot. RoboEval evaluates the correctness of…
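The skill-abstraction idea described in the abstract can be sketched as follows. This is an illustrative sketch, not CodeBotler's actual interface: the class names RobotSkills and SimulatedRobot, the three skill signatures, and the task program are all hypothetical, and the program body is written by hand as a stand-in for LLM output.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class RobotSkills(ABC):
    """Hypothetical skill interface; each robot platform supplies its own backend."""

    @abstractmethod
    def go_to(self, location: str) -> None: ...

    @abstractmethod
    def say(self, message: str) -> None: ...

    @abstractmethod
    def ask(self, person: str, question: str, options: List[str]) -> str: ...


class SimulatedRobot(RobotSkills):
    """Simulated backend that records each skill call instead of moving hardware."""

    def __init__(self, answers: Dict[str, str]) -> None:
        self.answers = answers  # scripted replies, keyed by person
        self.trace: List[Tuple[str, str]] = []

    def go_to(self, location: str) -> None:
        self.trace.append(("go_to", location))

    def say(self, message: str) -> None:
        self.trace.append(("say", message))

    def ask(self, person: str, question: str, options: List[str]) -> str:
        self.trace.append(("ask", person))
        # Fall back to the first option when no reply is scripted.
        return self.answers.get(person, options[0])


def generated_program(robot: RobotSkills) -> None:
    """Hand-written stand-in for a generated program for the task:
    'Go to the kitchen, ask Alice if she wants coffee, then come back and tell me.'"""
    robot.go_to("kitchen")
    reply = robot.ask("Alice", "Do you want coffee?", ["yes", "no"])
    robot.go_to("start")
    robot.say(f"Alice said {reply}")


robot = SimulatedRobot({"Alice": "yes"})
generated_program(robot)
```

Because the program only calls the abstract skills, the same code could in principle run against a simulated backend for evaluation and a hardware backend for deployment; the recorded trace is what an evaluator would inspect.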
Taxonomy
Topics: Software Engineering Research · Natural Language Processing Techniques · Topic Modeling
