Measuring Error Alignment for Decision-Making Systems
Binxia Xu, Antonis Bikakis, Daniel Onah, Andreas Vlachidis, Luke Dickens

TL;DR
This paper introduces two new behavioral metrics to assess how well AI systems align with human decision-making, aiming to improve trustworthiness without costly internal state analysis.
Contribution
It proposes novel, cost-effective behavioral alignment metrics that correlate with internal representational similarity, advancing methods for AI value alignment.
Findings
Metrics correlate with representational alignment measures
Metrics are effective across multiple domains
Metrics provide a new approach to value alignment
Abstract
Given that AI systems are set to play a pivotal role in future decision-making processes, their trustworthiness and reliability are of critical concern. Due to their scale and complexity, modern AI systems resist direct interpretation, and alternative ways are needed to establish trust in those systems and to determine how well they align with human values. We argue that good measures of the information-processing similarities between AI and humans may be able to achieve these same ends. While representational alignment (RA) approaches measure similarity between the internal states of two systems, the associated data can be expensive and difficult to collect for human systems. In contrast, behavioural alignment (BA) comparisons are cheaper and easier, but questions remain as to their sensitivity and reliability. We propose two new behavioural alignment metrics misclassification agreement…
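The abstract is cut off before it defines misclassification agreement, so the sketch below does not reproduce the paper's metrics. It only illustrates what a behavioural comparison computes from decisions alone, with no access to internal states. It implements error consistency (Geirhos et al., 2020), an established behavioural measure swapped in here for illustration: Cohen's kappa over trial-by-trial correctness. It also implements a hypothetical `same_wrong_label_rate`, meaning the fraction of jointly misclassified items on which both systems pick the same wrong label. That function's name and definition are assumptions, not the authors' formulation.

```python
import math
from typing import Sequence


def _correct(truth: Sequence[int], pred: Sequence[int]) -> list[bool]:
    # Trial-by-trial correctness of one system's decisions.
    return [t == p for t, p in zip(truth, pred)]


def error_consistency(truth, pred_a, pred_b) -> float:
    """Cohen's kappa over trial-by-trial correctness (error consistency)."""
    if not truth or not (len(truth) == len(pred_a) == len(pred_b)):
        raise ValueError("inputs must be non-empty and of equal length")
    ca, cb = _correct(truth, pred_a), _correct(truth, pred_b)
    n = len(truth)
    # Observed agreement: both systems right, or both wrong, on the same item.
    c_obs = sum(a == b for a, b in zip(ca, cb)) / n
    # Agreement expected by chance from each system's accuracy alone.
    p_a, p_b = sum(ca) / n, sum(cb) / n
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)
    if math.isclose(c_exp, 1.0):
        return math.nan  # kappa is undefined when chance agreement is 1
    return (c_obs - c_exp) / (1 - c_exp)


def same_wrong_label_rate(truth, pred_a, pred_b) -> float:
    """HYPOTHETICAL illustration, not the paper's definition: among items
    both systems misclassify, the fraction where they give the same wrong label."""
    both_wrong = [i for i, t in enumerate(truth)
                  if pred_a[i] != t and pred_b[i] != t]
    if not both_wrong:
        return math.nan  # no shared errors to compare
    return sum(pred_a[i] == pred_b[i] for i in both_wrong) / len(both_wrong)


# Toy decisions from two systems on six items (made-up data).
truth = [0, 1, 2, 0, 1, 2]
pred_a = [0, 1, 1, 2, 1, 0]
pred_b = [0, 2, 1, 1, 1, 2]
print(error_consistency(truth, pred_a, pred_b))      # roughly 1/3
print(same_wrong_label_rate(truth, pred_a, pred_b))  # 0.5
```

Error consistency only asks whether two systems err on the same items. The second function also asks whether they err in the same way, which is one plausible reading of "misclassification agreement" but remains an assumption here.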
Taxonomy
Topics: Business Process Modeling and Analysis · Complex Systems and Decision Making
Methods: Sparse Evolutionary Training · ALIGN
