EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots
Zixing Lei, Genjia Liu, Yuanshuo Zhang, Qipeng Liu, Chuan Wen, Shanghang Zhang, Wenzhao Lian, Siheng Chen

TL;DR
This paper introduces EmboCoach-Bench, a benchmark for evaluating how well autonomous AI agents develop embodied robotic policies through iterative, feedback-driven workflows, a step toward scalable, self-evolving embodied intelligence.
Contribution
It presents a benchmark that assesses whether LLM-based agents can autonomously engineer embodied policies through a dynamic, closed-loop feedback workflow across 32 expert-curated RL and IL tasks.
Findings
Autonomous agents outperform human baselines by 26.5% in success rate.
Environment feedback enhances policy development and reduces performance gaps.
Agents can self-correct and recover from near-failure cases through iterative debugging.
Abstract
The field of Embodied AI is evolving rapidly toward general-purpose robotic systems, fueled by high-fidelity simulation and large-scale data collection. However, this scaling capability remains severely bottlenecked by a reliance on labor-intensive manual oversight, from intricate reward shaping to hyperparameter tuning across heterogeneous backends. Inspired by LLMs' success in software automation and scientific discovery, we introduce EmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies. Spanning 32 expert-curated RL and IL tasks, our framework treats executable code as the universal interface. We move beyond static generation to assess a dynamic closed-loop workflow, where agents leverage environment feedback to iteratively draft, debug, and optimize solutions, spanning improvements from physics-informed reward…
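The draft, debug, and optimize cycle described in the abstract can be illustrated with a minimal Python sketch. This is not the benchmark's harness: its actual agent and environment interfaces are not described here, so every name below (`closed_loop`, `Feedback`, `draft`, `revise`, `evaluate`, and the `toy_*` stubs) is hypothetical. The LLM agent is replaced by hand-written stub functions, and the "policy code" is reduced to a single reward coefficient scored by a toy success metric. The sketch only shows how crashes (debug) and low scores (optimize) are both fed back into the next revision while the best working candidate is retained.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Feedback:
    """Hypothetical environment feedback for one submitted candidate."""
    success_rate: float          # toy fraction of episodes solved, in [0, 1]
    error: Optional[str] = None  # error text if the candidate crashed


@dataclass
class LoopResult:
    best_candidate: Optional[str]
    best_success: float
    history: list = field(default_factory=list)  # (candidate, Feedback) pairs


def closed_loop(
    draft: Callable[[], str],
    revise: Callable[[str, Feedback], str],
    evaluate: Callable[[str], Feedback],
    budget: int,
) -> LoopResult:
    """Draft once, then alternate evaluate -> revise for `budget` rounds."""
    candidate = draft()
    result = LoopResult(best_candidate=None, best_success=0.0)
    for _ in range(budget):
        fb = evaluate(candidate)
        result.history.append((candidate, fb))
        # Keep a candidate only if it ran cleanly and strictly improved.
        if fb.error is None and fb.success_rate > result.best_success:
            result.best_candidate = candidate
            result.best_success = fb.success_rate
        candidate = revise(candidate, fb)
    return result


def toy_evaluate(candidate: str) -> Feedback:
    """Stand-in for running the candidate in simulation (assumption)."""
    try:
        w = float(candidate.split("=", 1)[1])
    except (IndexError, ValueError) as exc:
        return Feedback(success_rate=0.0, error=repr(exc))
    # Toy metric: peaks at an arbitrary "ideal" coefficient of 0.7.
    return Feedback(success_rate=max(0.0, 1.0 - abs(w - 0.7)))


def toy_revise(candidate: str, fb: Feedback) -> str:
    """Stand-in for the LLM agent's revision step (assumption)."""
    if fb.error is not None:
        return "w=0.0"  # debug: replace the crashing draft with a safe default
    w = float(candidate.split("=", 1)[1])
    return f"w={w + 0.25}"  # optimize: naive fixed-step increase of the coefficient


result = closed_loop(lambda: "w=oops", toy_revise, toy_evaluate, budget=5)
print(result.best_candidate, round(result.best_success, 2))  # prints: w=0.75 0.95
```

Here the first draft crashes, the debug branch recovers with a safe default, and later rounds climb toward a higher toy success rate, loosely mirroring the near-failure recovery noted in the findings. A real agent would replace `toy_revise` with model-generated code edits conditioned on logs and metrics.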
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Software Engineering Methodologies · Advanced Malware Detection Techniques
