Cattle Trade: A Multi-Agent Benchmark for LLM Bluffing, Bidding, and Bargaining

Robert Müller; Clemens Müller

arXiv:2605.14537·cs.AI·May 15, 2026

TL;DR

Cattle Trade is a multi-agent benchmark that evaluates large language models' strategic reasoning, bluffing, and bargaining in a long-horizon economic game with imperfect information.

Contribution

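The paper introduces a benchmark that combines several strategic tasks (auctions, hidden-offer trade challenges, bargaining, bluffing, opponent modeling, and resource allocation) in a single long-horizon game, so that LLMs are assessed on how well they integrate these capabilities in a multi-agent economic environment rather than on each in isolation.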

Findings

01

Heuristic code agents outperform most LLMs in the benchmark.

02

Behavioral analysis reveals common LLM failure modes like overbidding and weak opponent adaptation.

03

Strategic coherence correlates more strongly with rank than volume or individual skills.

Abstract

We introduce Cattle Trade, a multi-agent benchmark for evaluating large language models (LLMs) as agents in strategic reasoning under imperfect information, adversarial interaction, and resource constraints. The benchmark combines auctions, hidden-offer trade challenges (TCs), bargaining, bluffing, opponent modeling, and resource allocation within a single long-horizon game lasting 50–60 turns. Unlike prior agent benchmarks that test these abilities in isolation, Cattle Trade evaluates whether agents integrate them across a competitive, multi-agent economic game with conflicting incentives. The benchmark logs every bid, TC offer, counteroffer, and card selection, enabling behavioural analysis beyond final scores or win rates. We evaluate seven cost-efficient language models and three deterministic code agents across 242 games. Strategic coherence, in particular…
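
As an illustration of the kind of behavioural analysis that per-action logs make possible, the sketch below computes an overbidding rate from a list of logged bids. It is not the benchmark's code or log format: the `BidEvent` schema, the `card_value` field, the `overbid_rate` helper, and the definition of an overbid as a bid above the card's nominal value are all assumptions made for this example, and the toy log is invented data, not a result from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BidEvent:
    """One logged auction bid (hypothetical schema, not the benchmark's format)."""
    game_id: int
    turn: int
    agent: str
    bid: int         # amount the agent offered
    card_value: int  # nominal value of the auctioned card (assumed field)


def overbid_rate(events, agent):
    """Fraction of an agent's bids that exceed the card's nominal value.

    "Overbid" is defined here as bid > card_value, which is an assumption
    for illustration and may differ from the paper's definition.
    """
    own = [e for e in events if e.agent == agent]
    if not own:
        return 0.0  # no bids logged for this agent
    return sum(e.bid > e.card_value for e in own) / len(own)


# Toy log with made-up values, only to show the call pattern.
log = [
    BidEvent(game_id=1, turn=3, agent="llm_a", bid=120, card_value=100),
    BidEvent(game_id=1, turn=7, agent="llm_a", bid=80, card_value=100),
    BidEvent(game_id=1, turn=9, agent="heuristic", bid=90, card_value=100),
    BidEvent(game_id=2, turn=4, agent="llm_a", bid=150, card_value=100),
]

for name in ("llm_a", "heuristic"):
    print(name, round(overbid_rate(log, name), 3))
```

The same pattern would extend to TC offers or counteroffers by adding further event types; a per-agent rate like this is one simple way to compare behaviour across agents independently of final score.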
