Escalation Risks from Language Models in Military and Diplomatic Decision-Making

Juan-Pablo Rivera; Gabriel Mukobi; Anka Reuel; Max Lamparth; Chandler Smith; Jacquelyn Schneider

arXiv:2401.03408·cs.AI·June 13, 2024·5 cites

TL;DR

This study uses simulated wargames to investigate the escalation risks of large language models in military and diplomatic scenarios. It finds that the models tend to escalate conflicts and justify aggressive actions, which raises concerns about deploying them in strategic decision-making.

Contribution

The paper introduces a novel simulation framework to assess escalation risks of LLMs in high-stakes decision-making, providing both qualitative and quantitative insights into their behavior.
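As a rough illustration of what a quantitative escalation score over a simulated wargame could look like, the sketch below sums a severity weight for each action an agent takes in a turn. The category names, the weights, and the `escalation_scores` helper are all hypothetical and chosen only for this example; they are not the paper's actual categories, values, or code.

```python
from collections import defaultdict

# Hypothetical severity weights, for illustration only.
# These are NOT the categories or values used in the paper.
WEIGHTS = {
    "de_escalate": -1,
    "hold": 0,
    "posture": 1,
    "escalate": 3,
}


def escalation_scores(action_log):
    """Sum severity weights per (turn, agent).

    action_log is an iterable of (turn, agent, category) tuples,
    where category must be a key of WEIGHTS.
    """
    scores = defaultdict(int)
    for turn, agent, category in action_log:
        scores[(turn, agent)] += WEIGHTS[category]
    return dict(scores)


# A made-up two-turn log for two agents.
log = [
    (1, "A", "posture"),
    (1, "A", "escalate"),
    (1, "B", "de_escalate"),
    (2, "A", "hold"),
]
scores = escalation_scores(log)
```

Tracking such a score per agent across turns is one simple way to turn a log of simulated actions into a time series that can be compared between models.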

Findings

01. All studied LLMs show escalation tendencies.

02. Models develop arms-race dynamics leading to increased conflict.

03. In rare cases, models escalate to deploying nuclear weapons.

Abstract

Governments are increasingly considering integrating autonomous AI agents in high-stakes military and foreign-policy decision-making, especially with the emergence of advanced generative AI models like GPT-4. Our work aims to scrutinize the behavior of multiple AI agents in simulated wargames, specifically focusing on their predilection to take escalatory actions that may exacerbate multilateral conflicts. Drawing on political science and international relations literature about escalation dynamics, we design a novel wargame simulation and scoring framework to assess the escalation risks of actions taken by these agents in different scenarios. Contrary to prior studies, our research provides both qualitative and quantitative insights and focuses on large language models (LLMs). We find that all five studied off-the-shelf LLMs show forms of escalation and difficult-to-predict escalation…

Code & Models

Repositories

jprivera44/EscalAItion
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Infrastructure Resilience and Vulnerability Analysis · Explainable Artificial Intelligence (XAI)

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Layer Normalization · Residual Connection · Absolute Position Encodings · Dropout · Dense Connections