Managing Escalation in Off-the-Shelf Large Language Models
Sebastian Elbaum, Jonathan Panter

TL;DR
This paper demonstrates simple interventions that control escalation tendencies in off-the-shelf large language models, showing that these tendencies can be managed in national security applications rather than treated as a reason to avoid the models.
Contribution
It introduces two simple, non-technical interventions that substantially reduce escalation by large language models in strategic scenarios, supporting the safe use of these models in national security.
Findings
Interventions reduce escalation in LLMs during wargame simulations
LLMs can be aligned with security goals through simple measures
Calls to restrict the use of LLMs in national security applications are premature
Abstract
U.S. national security customers have begun to utilize large language models, including enterprise versions of "off-the-shelf" models (e.g., ChatGPT) familiar to the public. This uptake will likely accelerate. However, recent studies suggest that off-the-shelf large language models frequently recommend escalatory actions when prompted with geopolitical or strategic scenarios. We demonstrate two simple, non-technical interventions to control these tendencies. Introducing these interventions into the experimental wargame design of a recent study, we substantially reduce escalation throughout the game. Calls to restrict the use of large language models in national security applications are thus premature. The U.S. government is already employing, and will continue to employ, large language models for scenario planning and suggesting courses of action. Rather than warning against such applications,…
