Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment

Aidan Kierans; Avijit Ghosh; Hananel Hazan; Shiri Dori-Hacohen

arXiv:2406.04231·cs.MA·June 3, 2025·1 cites

TL;DR

This paper introduces a computational social science model to quantify and analyze complex misalignment among diverse human and AI agents, addressing a gap in understanding sociotechnical alignment issues.

Contribution

It adapts a social science model to measure misalignment in multi-agent settings, providing a practical tool for analyzing complex sociotechnical environments.

Findings

01. Model captures intuitive misalignment dynamics across scenarios

02. Misalignment scores depend on agent population and conflicting preferences

03. Application to autonomous vehicle case study demonstrates utility

Abstract

Existing work on the alignment problem has focused mainly on (1) qualitative descriptions of the alignment problem; (2) attempting to align AI actions with human interests by focusing on value specification and learning; and/or (3) focusing on a single agent or on humanity as a monolith. Recent sociotechnical approaches highlight the need to understand complex misalignment among multiple human and AI agents. We address this gap by adapting a computational social science model of human contention to the alignment problem. Our model quantifies misalignment in large, diverse agent groups with potentially conflicting goals across various problem areas. Misalignment scores in our framework depend on the observed agent population, the domain in question, and conflict between agents' weighted preferences. Through simulations, we demonstrate how our model captures intuitive aspects of…
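
To make the idea of a population-dependent misalignment score concrete, here is a minimal toy sketch. It is not the paper's formula: the function `misalignment`, the goal names, the weights, and the rule "pair conflict = largest product of weights over incompatible goal pairs" are all assumptions chosen for illustration. It only shows how a score can vary with the observed agents, the problem area (expressed here as a set of incompatible goals), and the agents' weighted preferences.

```python
from itertools import combinations


def misalignment(agents, conflicts):
    """Toy score in [0, 1]: average conflict strength over all agent pairs.

    agents:    dict mapping agent name -> {goal: weight in [0, 1]}
    conflicts: set of frozensets, each holding two goals assumed incompatible
               within one problem area (hypothetical encoding)
    """
    pairs = list(combinations(agents, 2))
    if not pairs:
        return 0.0  # fewer than two agents: nothing to be misaligned with

    total = 0.0
    for a, b in pairs:
        # Assumed rule: a pair's conflict strength is the largest product of
        # weights over goal pairs that are marked as incompatible.
        total += max(
            (wa * wb
             for ga, wa in agents[a].items()
             for gb, wb in agents[b].items()
             if frozenset((ga, gb)) in conflicts),
            default=0.0,
        )
    return total / len(pairs)


# Hypothetical autonomous-vehicle population; names and weights are made up.
conflicts = {frozenset(("protect_passenger", "protect_pedestrian"))}
agents = {
    "passenger": {"protect_passenger": 1.0},
    "pedestrian": {"protect_pedestrian": 1.0},
    "manufacturer": {"protect_passenger": 0.5},
}

score = misalignment(agents, conflicts)

# Same problem area, different observed population: drop the pedestrian.
without_pedestrian = {k: v for k, v in agents.items() if k != "pedestrian"}
score_without = misalignment(without_pedestrian, conflicts)
```

In this toy setup the score changes when the pedestrian is removed from the population, even though the problem area and the remaining agents' preferences are unchanged.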

Code & Models

Repositories

riet-lab/quantifying-misalignment (Official)

Videos

Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment · underline

Taxonomy

Topics: Multi-Agent Systems and Negotiation

Methods: ALIGN