Adversarial Negotiation Dynamics in Generative Language Models
Arinbjörn Kolbeinsson, Benedikt Kolbeinsson

TL;DR
This paper investigates the adversarial interactions between generative language models in contract negotiation scenarios, revealing vulnerabilities and informing safer, more reliable model development for legal applications.
Contribution
It provides the first systematic evaluation of open-source language models' robustness in adversarial legal negotiations, highlighting vulnerabilities and safety concerns.
Findings
Models exhibit significant vulnerabilities in adversarial settings
Adversarial interactions can expose biases and safety issues
Insights inform strategies for developing more secure models
Abstract
Generative language models are increasingly used for contract drafting and enhancement, creating a scenario in which competing parties deploy different language models against each other. This introduces not only a game-theoretic challenge but also significant concerns related to AI safety and security, as the language model employed by the opposing party can be unknown. These competitive interactions can be seen as adversarial testing grounds, where models are effectively red-teamed to expose vulnerabilities such as generating biased, harmful or legally problematic text. Despite the importance of these challenges, the competitive robustness and safety of these models in adversarial settings remain poorly understood. In this small study, we approach this problem by evaluating the performance and vulnerabilities of major open-source language models in head-to-head competitions, simulating…
Taxonomy
Topics: Topic Modeling
