CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation
Max Fu, Justin Yu, Karim El-Refai, Ethan Kou, Haoru Xue, Huang Huang, Wenli Xiao, Guanzhi Wang, Fei-Fei Li, Guanya Shi, Jiajun Wu, Shankar Sastry, Yuke Zhu, Ken Goldberg, Linxi "Jim" Fan

TL;DR
CaP-X is a framework for benchmarking and improving coding agents in robot manipulation. It shows that human-crafted abstractions, scaled agentic computation, and reinforcement learning with verifiable rewards improve robustness and success rates in embodied tasks.
Contribution
The paper presents CaP-X, an open-access platform for studying and improving Code-as-Policy agents, including new environments, benchmarks, and methods for robustness and sim2real transfer.
Findings
Performance improves with human-crafted abstractions
Scaling agentic computation enhances robustness
Reinforcement learning with verifiable rewards boosts success rates
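To make the last finding concrete, here is a minimal sketch of what a "verifiable reward" could look like: a binary signal computed from simulator ground truth rather than from a learned judge. The `SimState` type, the `verifiable_reward` function, and the 2 cm tolerance are all hypothetical names and values chosen for illustration; the source does not specify how CaP-X defines its rewards.

```python
from dataclasses import dataclass


@dataclass
class SimState:
    """Hypothetical snapshot of simulator ground truth after a rollout."""
    cube_xyz: tuple[float, float, float]
    target_xyz: tuple[float, float, float]


def verifiable_reward(state: SimState, tol: float = 0.02) -> float:
    """Return 1.0 if the cube ended within `tol` metres of the target, else 0.0.

    The check is deterministic and depends only on the final state,
    which is what makes the reward programmatically verifiable.
    """
    # Euclidean distance between the cube and the target position.
    dist = sum((c - t) ** 2 for c, t in zip(state.cube_xyz, state.target_xyz)) ** 0.5
    return 1.0 if dist <= tol else 0.0
```

A reward of this shape can be computed for every rollout without human labeling, which is the property that reinforcement learning from verifiable rewards relies on.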
Abstract
"Code-as-Policy" considers how executable code can complement data-intensive Vision-Language-Action (VLA) methods, yet their effectiveness as autonomous controllers for embodied manipulation remains underexplored. We present CaP-X, an open-access framework for systematically studying Code-as-Policy agents in robot manipulation. At its core is CaP-Gym, an interactive environment in which agents control robots by synthesizing and executing programs that compose perception and control primitives. Building on this foundation, CaP-Bench evaluates frontier language and vision-language models across varying levels of abstraction, interaction, and perceptual grounding. Across 12 models, CaP-Bench reveals a consistent trend: performance improves with human-crafted abstractions but degrades as these priors are removed, exposing a dependence on designer scaffolding. At the same time, we observe…
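The idea of an agent that controls a robot by synthesizing and executing programs composed of perception and control primitives can be sketched as follows. This is a toy illustration only: `ToyTabletop`, `detect`, `pick`, `place`, and `run_policy` are hypothetical names, not the CaP-Gym API, and the "generated" program is hard-coded where a language model's output would normally go.

```python
class ToyTabletop:
    """Hypothetical stand-in for a manipulation environment (not the CaP-Gym API)."""

    def __init__(self):
        self.positions = {"red_cube": (0.1, 0.2), "bowl": (0.4, 0.5)}
        self.held = None

    def detect(self, name):
        """Perception primitive: return the (x, y) position of a named object."""
        return self.positions[name]

    def pick(self, xy):
        """Control primitive: grasp the object located exactly at xy, if any."""
        for name, pos in self.positions.items():
            if pos == xy:
                self.held = name
                return True
        return False

    def place(self, xy):
        """Control primitive: release the held object at xy."""
        if self.held is None:
            return False
        self.positions[self.held] = xy
        self.held = None
        return True


# A program of the kind a language model might emit, kept as plain source text.
GENERATED_PROGRAM = """
cube_xy = detect("red_cube")
bowl_xy = detect("bowl")
if pick(cube_xy):
    place(bowl_xy)
"""


def run_policy(env, source):
    """Execute generated source with only the environment's primitives in scope.

    Emptying __builtins__ merely limits the visible names; it is NOT a
    security sandbox and should not be treated as one.
    """
    api = {"detect": env.detect, "pick": env.pick, "place": env.place}
    exec(source, {"__builtins__": {}}, api)
    return env


env = run_policy(ToyTabletop(), GENERATED_PROGRAM)
```

The level of abstraction offered to the agent is set by which primitives appear in `api`: removing a high-level call such as `pick` would force the generated program to compose lower-level operations itself, which is the kind of variation the benchmark is described as probing.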
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Reinforcement Learning in Robotics
