Multiparty Dynamics and Failure Modes for Machine Learning and Artificial Intelligence
David Manheim

TL;DR
This paper explores complex failure modes in multi-agent AI systems, covering their causes, examples, and the gaps in current research, and argues for better understanding and mitigation strategies.
Contribution
The paper categorizes and defines new multi-agent failure modes, illustrates their prevalence, and discusses shortcomings in the existing literature on their mitigation.
Findings
Multi-agent failure modes are more complex and problematic than single-agent failures.
Examples from poker AI demonstrate real-world occurrences of these failure modes.
Current research inadequately addresses these multi-agent failure modes.
Abstract
An important challenge for safety in machine learning and artificial intelligence systems is a set of related failures involving specification gaming, reward hacking, fragility to distributional shifts, and Goodhart's or Campbell's law. This paper presents additional failure modes for interactions within multi-agent systems that are closely related. These multi-agent failure modes are more complex, more problematic, and less well understood than the single-agent case, and are also already occurring, largely unnoticed. After motivating the discussion with examples from poker-playing artificial intelligence (AI), the paper explains why these failure modes are in some senses unavoidable. Following this, the paper categorizes failure modes, provides definitions, and cites examples for each of the modes: accidental steering, coordination failures, adversarial misalignment, input spoofing and…
