TL;DR
This paper introduces a framework for evaluating the trustworthiness of agentic AI across socio-technical scenarios. It addresses the fragmentation of current evaluation practice and gives trustworthiness an operational, measurable definition.
Contribution
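It defines agentic trustworthiness as a five-property profile (Reliability, Robustness, Safety, Social-Ethical Alignment, Operational Integrity) and proposes the Holographic Agent Assessment Framework (HAAF) for scenario-based assessment and intervention, enabling improvements that generalize across diverse AI systems.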
Findings
All 13 tested systems improved on the trustworthiness profile.
Two systems achieved a perfect risk-weighted profile.
The framework generalizes interventions without per-model tuning.
Abstract
Agentic AI systems increasingly act through tool-augmented, multi-step workflows whose failures (unsafe tool use, unauthorised actions, social harm) carry deployment-level consequences. Evaluation practice remains fragmented across isolated benchmark slices, and "trustworthiness" is frequently invoked but rarely defined operationally. We argue the central limitation is twofold: (i) the absence of a measurable specification of what agent trustworthiness means, and (ii) the lack of a principled notion of representativeness allowing assessment over a socio-technical scenario distribution rather than disconnected benchmark instances. We address (i) by defining agentic trustworthiness as a five-property profile (Reliability, Robustness, Safety, Social-Ethical Alignment, Operational Integrity) grounded in current AI risk frameworks, and (ii) with the Holographic Agent Assessment Framework…
Taxonomy
Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education
