EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots

Zixing Lei; Genjia Liu; Yuanshuo Zhang; Qipeng Liu; Chuan Wen; Shanghang Zhang; Wenzhao Lian; Siheng Chen

arXiv:2601.21570 · cs.AI · January 30, 2026

Open Access

TL;DR

This paper introduces EmboCoach-Bench, a benchmark that evaluates how well autonomous LLM agents develop embodied robot policies through iterative, feedback-driven workflows, as a step toward scalable, self-evolving embodied intelligence.

Contribution

It presents a benchmark that assesses LLM-based agents' ability to autonomously engineer embodied policies through a dynamic closed-loop workflow, spanning 32 expert-curated RL and IL tasks.

Findings

01. Autonomous agents outperform human baselines by 26.5% in success rate.

02. Environment feedback enhances policy development and reduces performance gaps.

03. Agents can self-correct and recover from near-failure cases through iterative debugging.

Abstract

The field of Embodied AI is witnessing a rapid evolution toward general-purpose robotic systems, fueled by high-fidelity simulation and large-scale data collection. However, this scaling capability remains severely bottlenecked by a reliance on labor-intensive manual oversight, from intricate reward shaping to hyperparameter tuning across heterogeneous backends. Inspired by LLMs' success in software automation and scientific discovery, we introduce EmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies. Spanning 32 expert-curated RL and IL tasks, our framework posits executable code as the universal interface. We move beyond static generation to assess a dynamic closed-loop workflow, where agents leverage environment feedback to iteratively draft, debug, and optimize solutions, spanning improvements from physics-informed reward…
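The draft, debug, and optimize cycle described in the abstract can be pictured with a minimal sketch. This is not the benchmark's actual harness or API: `Feedback`, `closed_loop`, `toy_propose`, and `toy_evaluate` are hypothetical names, and the toy "environment" simply scores a string so that the loop runs without a simulator or an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Feedback:
    """Result of running one candidate (hypothetical shape, not from the paper)."""
    success_rate: float          # fraction of successful episodes, in [0, 1]
    error: Optional[str] = None  # error text if the candidate failed to run


def closed_loop(
    propose: Callable[[Optional[str], Optional[Feedback]], str],
    evaluate: Callable[[str], Feedback],
    max_iters: int = 5,
) -> Tuple[Optional[str], float, List[Feedback]]:
    """Draft a candidate, run it, and feed the result into the next draft."""
    best_code, best_score = None, -1.0
    history: List[Feedback] = []
    code, feedback = None, None
    for _ in range(max_iters):
        code = propose(code, feedback)  # draft, or revise using prior feedback
        feedback = evaluate(code)       # environment feedback on the candidate
        history.append(feedback)
        # Keep the best candidate that ran without error.
        if feedback.error is None and feedback.success_rate > best_score:
            best_code, best_score = code, feedback.success_rate
    return best_code, best_score, history


# Toy stand-ins for the agent and the environment (assumptions for illustration).
def toy_propose(prev: Optional[str], fb: Optional[Feedback]) -> str:
    if prev is None:
        return "reward_scale=bad"        # first draft contains a bug
    if fb is not None and fb.error is not None:
        return "reward_scale=1"          # debug step: repair the failing draft
    scale = int(prev.split("=")[1])
    return f"reward_scale={scale + 1}"   # optimize step: adjust the setting


def toy_evaluate(code: str) -> Feedback:
    value = code.split("=")[1]
    if not value.isdigit():
        return Feedback(0.0, error=f"invalid value: {value}")
    return Feedback(min(1.0, int(value) / 5))


best_code, best_score, history = closed_loop(toy_propose, toy_evaluate, max_iters=5)
print(best_code, best_score)
```

In this toy run the first draft fails, the second repairs it, and later iterations improve the score. In a real setting `propose` would wrap an LLM call and `evaluate` would launch training and rollouts in a simulator.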

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Software Engineering Methodologies · Advanced Malware Detection Techniques