EmbodiedGovBench: A Benchmark for Governance, Recovery, and Upgrade Safety in Embodied Agent Systems

Xue Qin, Simin Luan, John See, Cong Yang, and Zhijun Li

arXiv:2604.11174 · cs.RO · April 14, 2026

TL;DR

EmbodiedGovBench is a benchmark that evaluates the governance, safety, and control properties of embodied AI systems beyond task success metrics, with an emphasis on controllability, recoverability, and policy adherence.

Contribution

The paper presents EmbodiedGovBench, a benchmark framework for assessing governance and safety in embodied AI systems across seven governance dimensions.

Findings

01 The benchmark covers seven governance dimensions, including unauthorized capability invocation and runtime drift.

02 It provides scenario templates, perturbation operators, and evaluation protocols.

03 It supports instantiation over embodied capability runtimes with modular interfaces.
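
To make the three ingredients named in the findings concrete, here is a minimal Python sketch of how a scenario template, a perturbation operator, and an evaluation protocol might fit together for one dimension, unauthorized capability invocation. It is an illustration under stated assumptions, not the paper's implementation: every name in it (`Scenario`, `inject_unauthorized_call`, `governed_runtime`, `evaluate`, and the capability strings) is hypothetical and does not come from EmbodiedGovBench.

```python
from dataclasses import dataclass, replace
from typing import Callable

# NOTE: all names here are hypothetical illustrations, not the benchmark's API.


@dataclass(frozen=True)
class Scenario:
    """Scenario template: a task plus the capabilities the agent may use."""
    name: str
    allowed_capabilities: frozenset
    requested_actions: tuple  # capability names the agent tries to invoke, in order


def inject_unauthorized_call(scenario: Scenario, capability: str) -> Scenario:
    """Perturbation operator: append a request for a capability outside the allowlist."""
    return replace(
        scenario,
        name=f"{scenario.name}+unauthorized:{capability}",
        requested_actions=scenario.requested_actions + (capability,),
    )


def governed_runtime(scenario: Scenario) -> dict:
    """Toy runtime that blocks calls outside the allowlist and logs every decision."""
    audit_log = []
    for action in scenario.requested_actions:
        allowed = action in scenario.allowed_capabilities
        audit_log.append({"action": action, "allowed": allowed})
    executed = [entry["action"] for entry in audit_log if entry["allowed"]]
    return {"executed": executed, "audit_log": audit_log}


def evaluate(scenario: Scenario, run: Callable[[Scenario], dict]) -> dict:
    """Evaluation protocol: count unauthorized executions and check the audit trail."""
    result = run(scenario)
    unauthorized = [
        action for action in result["executed"]
        if action not in scenario.allowed_capabilities
    ]
    logged = [entry["action"] for entry in result["audit_log"]]
    return {
        "scenario": scenario.name,
        "unauthorized_executions": len(unauthorized),
        "all_requests_logged": logged == list(scenario.requested_actions),
    }


base = Scenario(
    name="pick_and_place",
    allowed_capabilities=frozenset({"move_arm", "grasp"}),
    requested_actions=("move_arm", "grasp"),
)
perturbed = inject_unauthorized_call(base, "disable_safety_stop")
report = evaluate(perturbed, governed_runtime)
print(report)
```

The point of the sketch is that the score depends on what the runtime does under perturbation rather than on task completion: a runtime that executed every requested action would finish the same task but report a nonzero `unauthorized_executions` count.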

Abstract

Recent progress in embodied AI has produced a growing ecosystem of robot policies, foundation models, and modular runtimes. However, current evaluation remains dominated by task success metrics such as completion rate or manipulation accuracy. These metrics leave a critical gap: they do not measure whether embodied systems are governable -- whether they respect capability boundaries, enforce policies, recover safely, maintain audit trails, and respond to human oversight. We present EmbodiedGovBench, a benchmark for governance-oriented evaluation of embodied agent systems. Rather than asking only whether a robot can complete a task, EmbodiedGovBench evaluates whether the system remains controllable, policy-bounded, recoverable, auditable, and evolution-safe under realistic perturbations. The benchmark covers seven governance dimensions: unauthorized capability invocation, runtime drift…
