The Accountability Horizon: An Impossibility Theorem for Governing Human-Agent Collectives
Haileleol Tibebu

TL;DR
This paper proves that for highly autonomous human-AI collectives with feedback cycles, no accountability framework can fully assign responsibility, revealing a fundamental limit in AI governance.
Contribution
It introduces a formal model of human-AI collectives and establishes an impossibility theorem showing the limits of accountability frameworks at high autonomy levels.
Findings
Accountability frameworks fail once collective autonomy exceeds the Accountability Horizon, a computable threshold.
A phase transition exists where accountability becomes impossible.
Experiments with synthetic collectives confirm the theoretical predictions.
Abstract
Existing accountability frameworks for AI systems (legal, ethical, and regulatory) rest on a shared assumption: for any consequential outcome, at least one identifiable person had enough involvement and foresight to bear meaningful responsibility. This paper proves that agentic AI systems violate this assumption, not as an engineering limitation but as a mathematical necessity, once autonomy exceeds a computable threshold. We introduce Human-Agent Collectives, a formalisation of joint human-AI systems in which agents are modelled as state-policy tuples within a shared structural causal model. Autonomy is characterised through a four-dimensional information-theoretic profile (epistemic, executive, evaluative, social), and collective behaviour through interaction graphs and joint action spaces. We axiomatise legitimate accountability through four minimal properties: Attributability (responsibility…
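The ingredients named in the abstract (members with a four-dimensional autonomy profile, an interaction graph, and the feedback cycles mentioned in the TL;DR) can be sketched as a small data structure. This is only an illustrative sketch, not the paper's formalism: the class and field names are hypothetical, the autonomy scores are plain floats standing in for the paper's information-theoretic quantities, and the structural causal model, the accountability axioms, and the threshold itself are not reproduced here. The cycle check is ordinary depth-first search on a directed graph.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AutonomyProfile:
    # Four dimensions named in the abstract; float scores are an assumption.
    epistemic: float
    executive: float
    evaluative: float
    social: float


@dataclass
class Member:
    name: str
    is_human: bool
    autonomy: AutonomyProfile


@dataclass
class Collective:
    # Hypothetical container: members plus a directed interaction graph.
    members: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)

    def add_member(self, member: Member) -> None:
        self.members[member.name] = member
        self.edges.setdefault(member.name, set())

    def add_influence(self, src: str, dst: str) -> None:
        # Directed edge: src's actions feed into dst's decisions.
        if src not in self.members or dst not in self.members:
            raise KeyError("both endpoints must be members of the collective")
        self.edges[src].add(dst)

    def has_feedback_cycle(self) -> bool:
        # Three-colour depth-first search for a directed cycle.
        WHITE, GREY, BLACK = 0, 1, 2
        colour = {name: WHITE for name in self.members}

        def visit(node: str) -> bool:
            colour[node] = GREY
            for nxt in self.edges[node]:
                if colour[nxt] == GREY:
                    return True  # back edge closes a cycle
                if colour[nxt] == WHITE and visit(nxt):
                    return True
            colour[node] = BLACK
            return False

        return any(colour[n] == WHITE and visit(n) for n in self.members)


def build_example() -> Collective:
    # Toy collective: one human reviewer and two agents (made-up scores).
    hac = Collective()
    hac.add_member(Member("reviewer", True, AutonomyProfile(0.9, 0.9, 0.9, 0.9)))
    hac.add_member(Member("planner", False, AutonomyProfile(0.7, 0.6, 0.5, 0.4)))
    hac.add_member(Member("executor", False, AutonomyProfile(0.5, 0.8, 0.3, 0.2)))
    hac.add_influence("reviewer", "planner")
    hac.add_influence("planner", "executor")
    return hac
```

In this toy setup, `build_example()` yields an acyclic chain from the reviewer to the executor; adding an edge from `executor` back to `planner` creates the kind of agent-to-agent feedback loop the TL;DR refers to, which `has_feedback_cycle()` then reports.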
