[Re] Benchmarking LLM Capabilities in Negotiation through Scoreable Games
Jorge Carrasco Pollo, Ioannis Kapetangeorgis, Joshua Rosenthal, John Hua Yao

TL;DR
This paper critically evaluates a complex negotiation benchmark for Large Language Models, replicating experiments, introducing new metrics, and analyzing model behaviors to assess its robustness, objectivity, and usability in multi-agent negotiation tasks.
Contribution
It reproduces and extends the original benchmark, identifies limitations, and provides insights into its applicability and the importance of context in model evaluation.
Findings
The benchmark is complex, but model comparison is ambiguous
Limitations in information leakage detection
Context significantly impacts model evaluation
Abstract
Large Language Models (LLMs) demonstrate significant potential in multi-agent negotiation tasks, yet evaluation in this domain remains challenging due to a lack of robust and generalizable benchmarks. Abdelnabi et al. (2024) introduce a negotiation benchmark based on Scoreable Games, with the aim of developing a highly complex and realistic evaluation framework for LLMs. Our work investigates the reproducibility of claims in their benchmark, and provides a deeper understanding of its usability and generalizability. We replicate the original experiments on additional models and introduce new metrics to verify negotiation quality and evenness of evaluation. Our findings reveal that while the benchmark is indeed complex, model comparison is ambiguous, raising questions about its objectivity. Furthermore, we identify limitations in the experimental setup, particularly in information…
Taxonomy
Topics: Topic Modeling · Multi-Agent Systems and Negotiation · Artificial Intelligence in Healthcare and Education
