REBAR: Reference Ethical Benchmark for Autonomy Readiness

Jonathan Diller; David Barnes; Rebekah Bogdanoff; Rhett Collier; Roddy Collins; Keith Fieldhouse; Yonatan Gefen; Cameron Johnson; Anuriha Kodali; Brad Kriel; Varun Murali; James Niehaus; Mish Sukharev; Joseph VanPelt; Anthony Hoogs; Vijay Kumar; Arslan Basharat

arXiv:2605.18423·cs.RO·May 19, 2026

TL;DR

REBAR is a quantitative framework that evaluates the ethical readiness of autonomous systems using a computable metric, LLM-based techniques, and a photorealistic simulation environment.

Contribution

The paper introduces REBAR, a benchmark that quantifies the ethical performance of autonomous systems, with LLM-driven scenario generation and explainability.

Findings

1. REBAR provides an objective, repeatable ethical benchmark score.

2. It uses LLMs to assess and explain the ethical difficulty of scenarios.

3. The framework bridges the gap between ethical principles and verifiable autonomy.

Abstract

As autonomous systems grow more advanced, objective metrics to evaluate their ethical and legal compliance are critical for informing end users of their limitations and ensuring accountability of those who misuse them. Current ethical embodied AI frameworks remain mostly qualitative, focusing on system design (through safety guardrails or targeted red teaming), and the realized guardrails often directly disallow unsafe behavior without providing the user with an override or interpretable reason. Instead, there is a need for computable metrics through rigorous testing that allow a user to determine the applicability of the system to the task. To address this gap, we introduce the Reference Ethical Benchmark for Autonomy Readiness (REBAR), a quantitative test and evaluation framework for autonomous systems. REBAR maps operating metrics into a computable Autonomy Readiness Level (ARL)…
