A Decision-Theoretic Approach for Managing Misalignment
Daniel A. Herrmann, Abinav Chari, Isabelle Qian, Sree Sharvesh, B. A. Levinstein

TL;DR
This paper presents a formal decision-theoretic framework to determine when delegating decisions to AI systems is justified under uncertainty, balancing alignment, accuracy, and reach.
Contribution
It introduces a novel formal model for evaluating AI delegation decisions, emphasizing context-specific delegation over universal trust, and provides a scoring method for practical assessment.
Findings
Universal delegation requires near-perfect value alignment and total epistemic trust.
Context-specific delegation can be optimal despite misalignment.
A new scoring framework quantifies decision tradeoffs.
Abstract
When should we delegate decisions to AI systems? While the value alignment literature has developed techniques for shaping AI values, less attention has been paid to how to determine, under uncertainty, when imperfect alignment is good enough to justify delegation. We argue that rational delegation requires balancing an agent's value (mis)alignment with its epistemic accuracy and its reach (the acts it has available). This paper introduces a formal, decision-theoretic framework to analyze this tradeoff, precisely accounting for a principal's uncertainty about these factors. Our analysis reveals a sharp distinction between two delegation scenarios. First, universal delegation (trusting an agent with any problem) demands near-perfect value alignment and total epistemic trust, conditions rarely met in practice. Second, we show that context-specific delegation can be optimal even with…
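The tradeoff between alignment, accuracy, and reach can be made concrete with a toy calculation. The sketch below is not the authors' formal model. It assumes a finite set of states, a principal prior, a hypothetical signal model in which the agent observes the true state with probability `accuracy` (otherwise a uniformly random state), and utilities invented purely for illustration. The function names `own_value` and `delegation_value` and the umbrella example are likewise hypothetical. In this toy, the principal delegates when the expected utility, judged by the principal's own utility function, of the act the agent would choose exceeds that of the principal's best unaided act.

```python
from typing import Dict, List

# Hypothetical representation: utility[act][state] is the payoff of `act` in `state`.
Utility = Dict[str, Dict[str, float]]


def own_value(prior: Dict[str, float], u_p: Utility, own_acts: List[str]) -> float:
    """Principal's expected utility from picking its best act using only its prior."""
    return max(sum(prior[s] * u_p[a][s] for s in prior) for a in own_acts)


def delegation_value(prior: Dict[str, float], u_p: Utility, u_a: Utility,
                     agent_acts: List[str], accuracy: float) -> float:
    """Principal's expected utility (by its own u_p) when the agent decides.

    Assumed signal model: the agent sees the true state with probability
    `accuracy`; otherwise its signal is uniform over all states. The agent
    then picks, from its reach `agent_acts`, the act that its own utility
    u_a ranks highest in the signalled state.
    """
    states = list(prior)
    n = len(states)
    total = 0.0
    for s in states:
        for signal in states:
            p_signal = (1 - accuracy) / n + (accuracy if signal == s else 0.0)
            chosen = max(agent_acts, key=lambda a: u_a[a][signal])
            total += prior[s] * p_signal * u_p[chosen][s]
    return total


# Illustrative numbers only.
prior = {"rain": 0.5, "sun": 0.5}
acts = ["umbrella", "none"]
u_p = {"umbrella": {"rain": 1.0, "sun": 0.4}, "none": {"rain": 0.0, "sun": 1.0}}
# Misaligned utilities that still rank the acts like the principal in every state.
u_mild = {"umbrella": {"rain": 0.7, "sun": 0.0}, "none": {"rain": 0.2, "sun": 1.0}}
# Badly misaligned: the agent never wants to carry an umbrella.
u_bad = {"umbrella": {"rain": 0.0, "sun": 0.0}, "none": {"rain": 1.0, "sun": 1.0}}

if __name__ == "__main__":
    baseline = own_value(prior, u_p, acts)
    for name, u_a in [("aligned", u_p), ("mild", u_mild), ("bad", u_bad)]:
        v = delegation_value(prior, u_p, u_a, acts, accuracy=0.9)
        print(f"{name}: delegate={v > baseline} value={v:.3f} own={baseline:.3f}")
```

With these made-up numbers, acting alone is worth 0.7 to the principal. An accurate aligned agent and the mildly misaligned agent both score 0.96, because their differing utilities never change which act they pick here. The badly misaligned agent scores 0.5. At `accuracy=0.0` even the perfectly aligned agent scores only 0.6. Reach can be varied by passing a different `agent_acts` list.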
Taxonomy
Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI) · Scientific Computing and Data Management
