EmbodiedGovBench: A Benchmark for Governance, Recovery, and Upgrade Safety in Embodied Agent Systems
Xue Qin, Simin Luan, John See, Cong Yang, and Zhijun Li

TL;DR
EmbodiedGovBench is a benchmark that evaluates governance, safety, and control in embodied AI systems beyond task success metrics, with an emphasis on controllability, recoverability, and policy adherence.
Contribution
The paper presents EmbodiedGovBench, a benchmark framework for assessing whether embodied AI systems remain governable and safe across seven governance dimensions.
Findings
Covers seven governance dimensions, including unauthorized capability invocation and runtime drift.
Provides scenario templates, perturbation operators, and evaluation protocols.
Supports instantiation over embodied capability runtimes with modular interfaces.
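To make the three ingredients above concrete, here is a minimal Python sketch of how a scenario template, a perturbation operator, and an evaluation step might fit together for one dimension, unauthorized capability invocation. It is an illustration under stated assumptions, not the benchmark's actual interface: the names `Scenario`, `inject_unauthorized_call`, `GovernedRuntime`, and `evaluate`, and the reported fields, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Scenario:
    """Hypothetical scenario template: a task plus its capability boundary."""
    name: str
    allowed_capabilities: Set[str]
    requested_calls: List[str]


def inject_unauthorized_call(scenario: Scenario, capability: str) -> Scenario:
    """Hypothetical perturbation operator: append one extra capability request."""
    return Scenario(
        name=scenario.name + "+unauthorized",
        allowed_capabilities=set(scenario.allowed_capabilities),
        requested_calls=scenario.requested_calls + [capability],
    )


class GovernedRuntime:
    """Toy runtime that enforces an allow-list and keeps an audit trail."""

    def __init__(self, allowed: Set[str]) -> None:
        self.allowed = set(allowed)
        self.audit_log: List[Tuple[str, str]] = []

    def invoke(self, capability: str) -> bool:
        permitted = capability in self.allowed
        # Every request is logged, whether it was allowed or blocked.
        self.audit_log.append((capability, "allowed" if permitted else "blocked"))
        return permitted


def evaluate(scenario: Scenario) -> Dict[str, object]:
    """Hypothetical evaluation step: count unauthorized requests and blocks."""
    runtime = GovernedRuntime(scenario.allowed_capabilities)
    attempts = 0
    blocked = 0
    for call in scenario.requested_calls:
        executed = runtime.invoke(call)
        if call not in scenario.allowed_capabilities:
            attempts += 1
            if not executed:
                blocked += 1
    return {
        "unauthorized_attempts": attempts,
        "blocked": blocked,
        "fully_audited": len(runtime.audit_log) == len(scenario.requested_calls),
    }


base = Scenario("pick_and_place", {"move_arm", "grasp"}, ["move_arm", "grasp"])
perturbed = inject_unauthorized_call(base, "unlock_door")
result = evaluate(perturbed)
print(result)
```

In this toy setup the perturbed scenario contains exactly one out-of-bounds request, so a governable runtime should block that one request and log all three. A real instantiation would replace `GovernedRuntime` with the system under test.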
Abstract
Recent progress in embodied AI has produced a growing ecosystem of robot policies, foundation models, and modular runtimes. However, current evaluation remains dominated by task success metrics such as completion rate or manipulation accuracy. These metrics leave a critical gap: they do not measure whether embodied systems are governable -- whether they respect capability boundaries, enforce policies, recover safely, maintain audit trails, and respond to human oversight. We present EmbodiedGovBench, a benchmark for governance-oriented evaluation of embodied agent systems. Rather than asking only whether a robot can complete a task, EmbodiedGovBench evaluates whether the system remains controllable, policy-bounded, recoverable, auditable, and evolution-safe under realistic perturbations. The benchmark covers seven governance dimensions: unauthorized capability invocation, runtime drift…
