Cattle Trade: A Multi-Agent Benchmark for LLM Bluffing, Bidding, and Bargaining
Robert Müller, Clemens Müller

TL;DR
Cattle Trade is a multi-agent benchmark that evaluates large language models' strategic reasoning, bluffing, and bargaining in an economic game with imperfect information.
Contribution
The paper introduces a benchmark that combines multiple strategic tasks in a single long-horizon game, so that it assesses how well LLMs integrate these capabilities in a multi-agent economic environment rather than testing each in isolation.
Findings
Heuristic code agents outperform most LLMs in the benchmark.
Behavioral analysis reveals common LLM failure modes such as overbidding and weak adaptation to opponents.
Strategic coherence correlates more strongly with rank than volume or individual skills.
Abstract
We introduce Cattle Trade, a multi-agent benchmark for evaluating large language models (LLMs) as agents in strategic reasoning under imperfect information, adversarial interaction, and resource constraints. The benchmark combines auctions, hidden-offer trade challenges (TCs), bargaining, bluffing, opponent modeling, and resource allocation within a single long-horizon game lasting 50–60 turns. Unlike prior agent benchmarks that test these abilities in isolation, Cattle Trade evaluates whether agents integrate them across a competitive, multi-agent economic game with conflicting incentives. The benchmark logs every bid, TC offer, counteroffer, and card selection, enabling behavioural analysis beyond final scores or win rates. We evaluate seven cost-efficient language models and three deterministic code agents across 242 games. Strategic coherence, in particular…
