Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards

Shuze Daniel Liu; Claire Chen; Jiabao Sean Xiao; Lei Lei; Yuheng Zhang; Yisong Yue; David Simchi-Levi

arXiv:2604.09855 · cs.AI · April 14, 2026

TL;DR

This paper demonstrates that reinforcement learning with verifiable rewards enables mid-sized language models to develop advanced negotiation skills, outperforming larger models and generalizing across different scenarios.

Contribution

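Introduces an RLVR framework that trains a mid-sized LLM buyer agent for bilateral price negotiation, with rewards grounded in economic surplus and adherence to a private budget. The trained agent's strategy evolves through distinct phases, and it outperforms larger models.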

Findings

01 A 30B agent outperforms models more than ten times its size in surplus extraction.

02 The trained agent generalizes to unseen and adversarial seller personas.

03 Training reveals a four-phase strategic evolution, from naive bargaining to sophisticated persuasion.

Abstract

The recent advancement of Large Language Models (LLMs) has established their potential as autonomous interactive agents. However, they often struggle in strategic games of incomplete information, such as bilateral price negotiation. In this paper, we investigate whether Reinforcement Learning from Verifiable Rewards (RLVR) can effectively teach LLMs to negotiate. Specifically, we explore the strategic behaviors that emerge during the learning process. We introduce a framework that trains a mid-sized buyer agent against a regulated LLM seller across a wide distribution of real-world products. By grounding reward signals directly in the maximization of economic surplus and strict adherence to private budget constraints, we reveal a novel four-phase strategic evolution. The agent progresses from naive bargaining to using aggressive starting prices, moves through a phase of deadlock, and…
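As a rough illustration of what a verifiable reward for the buyer could look like, the sketch below scores a finished negotiation from the outcome alone. It is not the paper's reward function: the name buyer_reward, the normalization by budget, the zero reward for no deal, and the fixed penalty for exceeding the budget are all assumptions made for this example.

```python
from typing import Optional


def buyer_reward(
    deal_price: Optional[float],
    budget: float,
    violation_penalty: float = -1.0,  # assumed value, not from the paper
) -> float:
    """Hypothetical verifiable reward for a buyer agent.

    deal_price: agreed price, or None if the negotiation ended without a deal.
    budget: the buyer's private maximum willingness to pay.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")

    # No agreement: no surplus is realized (assumed to score zero).
    if deal_price is None:
        return 0.0

    # Strict budget adherence: paying above the private budget is penalized.
    if deal_price > budget:
        return violation_penalty

    # Buyer surplus as a fraction of the budget, so rewards are comparable
    # across cheap and expensive products.
    return (budget - deal_price) / budget
```

Because the score depends only on the final price and the private budget, it can be checked programmatically without a learned judge, which is the property that makes a reward "verifiable" in this sense.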

Code & Models

Models

ZeterMordio/anchor-negotiation-sdpo-qwen35-smoke (model, 255 downloads)
