Escalation Risks from Language Models in Military and Diplomatic Decision-Making
Juan-Pablo Rivera, Gabriel Mukobi, Anka Reuel, Max Lamparth, Chandler Smith, Jacquelyn Schneider

TL;DR
This study investigates the escalation risks of large language models in military and diplomatic scenarios through simulated wargames, revealing their tendency to escalate conflicts and justify aggressive actions, raising concerns for strategic deployment.
Contribution
The paper introduces a novel simulation framework to assess escalation risks of LLMs in high-stakes decision-making, providing both qualitative and quantitative insights into their behavior.
Findings
All studied LLMs show escalation tendencies.
Models develop arms-race dynamics leading to increased conflict.
In rare cases, models escalate as far as recommending the deployment of nuclear weapons.
Abstract
Governments are increasingly considering integrating autonomous AI agents in high-stakes military and foreign-policy decision-making, especially with the emergence of advanced generative AI models like GPT-4. Our work aims to scrutinize the behavior of multiple AI agents in simulated wargames, specifically focusing on their predilection to take escalatory actions that may exacerbate multilateral conflicts. Drawing on political science and international relations literature about escalation dynamics, we design a novel wargame simulation and scoring framework to assess the escalation risks of actions taken by these agents in different scenarios. Contrary to prior studies, our research provides both qualitative and quantitative insights and focuses on large language models (LLMs). We find that all five studied off-the-shelf LLMs show forms of escalation and difficult-to-predict escalation…
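To make the idea of a scoring framework for escalation more concrete, the following is a minimal sketch of how actions chosen by an agent could be mapped to a per-turn escalation score. The category names, the weights in `SEVERITY`, and the function names are illustrative assumptions and are not taken from the paper; the paper's own framework may categorize and weight actions differently.

```python
# Hypothetical severity weights per action category.
# These categories and numbers are assumptions for illustration only.
SEVERITY = {
    "de-escalation": -1,
    "status-quo": 0,
    "posturing": 1,
    "non-violent escalation": 2,
    "violent escalation": 3,
    "nuclear": 4,
}


def escalation_score(actions):
    """Sum the severity weights of the actions one agent took in a single turn."""
    return sum(SEVERITY[action] for action in actions)


def score_trajectory(turns):
    """Return the list of per-turn escalation scores for a simulated run."""
    return [escalation_score(turn) for turn in turns]


# Example run: three turns of actions chosen by one simulated agent.
turns = [
    ["status-quo", "posturing"],
    ["posturing", "non-violent escalation"],
    ["violent escalation"],
]
scores = score_trajectory(turns)
```

Tracking a score like this over turns is one simple way to compare how quickly different models escalate in the same scenario.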
Taxonomy
Topics: Topic Modeling · Infrastructure Resilience and Vulnerability Analysis · Explainable Artificial Intelligence (XAI)
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Layer Normalization · Residual Connection · Absolute Position Encodings · Dropout · Dense Connections
