Beyond single-channel agentic benchmarking
Nelu D. Radpour

TL;DR
This paper challenges traditional single-channel AI safety benchmarks by proposing a human-AI dyad approach that emphasizes redundancy and error diversity, so that evaluations better reflect real-world safety in critical environments.
Contribution
It introduces a new safety evaluation framework that considers AI systems as part of a redundant, diverse safety layer alongside humans, aligning AI safety assessment with established engineering principles.
Findings
AI systems can serve as effective redundant safety layers.
Error diversity reduces overall risk in human-AI systems.
Reframing safety evaluation improves ecological validity.
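To make the redundancy and error-diversity findings concrete, here is a minimal sketch of how a joint miss rate for a human-AI dyad might be computed. It is not the paper's model: the function name, the `overlap` parameter, and the example probabilities are all illustrative assumptions. The sketch linearly interpolates between fully independent errors and fully coupled errors.

```python
def joint_miss_probability(p_human: float, p_ai: float, overlap: float = 0.0) -> float:
    """Probability that both the human and the AI miss the same hazard.

    Hypothetical model (an assumption, not taken from the paper):
      overlap = 0.0 -> errors are independent, joint miss = p_human * p_ai
      overlap = 1.0 -> errors are fully coupled, joint miss = min(p_human, p_ai)
    Values in between mix the two cases linearly.
    """
    for name, value in (("p_human", p_human), ("p_ai", p_ai), ("overlap", overlap)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    independent = p_human * p_ai      # diverse, uncorrelated error modes
    coupled = min(p_human, p_ai)      # both channels fail on the same cases
    return (1.0 - overlap) * independent + overlap * coupled


if __name__ == "__main__":
    # Illustrative numbers only: a human who misses 10% of hazards
    # paired with an AI auditor that misses 30% of them.
    for overlap in (0.0, 0.5, 1.0):
        p = joint_miss_probability(0.10, 0.30, overlap)
        print(f"overlap={overlap:.1f} -> joint miss probability {p:.3f}")
```

Under these assumed numbers, an AI that misses 30% of hazards on its own still cuts the dyad's miss rate from 10% to roughly 3% when errors are independent. With fully coupled errors the dyad does no better than the human alone, which is why error diversity matters as much as raw accuracy.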
Abstract
Contemporary benchmarks for agentic artificial intelligence (AI) frequently evaluate safety through isolated task-level accuracy thresholds, implicitly treating autonomous systems as single points of failure. This single-channel paradigm diverges from established principles in safety-critical engineering, where risk mitigation is achieved through redundancy, diversity of error modes, and joint system reliability. This paper argues that evaluating AI agents in isolation systematically mischaracterizes their operational safety when deployed within human-in-the-loop environments. Using a recent laboratory safety benchmark as a case study, it demonstrates that even imperfect AI systems can provide substantial safety utility by functioning as redundant audit layers against well-documented sources of human failure, including vigilance decrement, inattentional blindness, and…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Human-Automation Interaction and Safety · Explainable Artificial Intelligence (XAI)
