Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena

Jiangjie Chen; Siyu Yuan; Rong Ye; Bodhisattwa Prasad Majumder; Kyle Richardson

arXiv:2310.05746·cs.CL·August 27, 2024·5 cites

Open Access · 1 Repo

TL;DR

This paper introduces AucArena, an auction simulation environment to evaluate LLMs' strategic reasoning and planning skills, revealing their strengths and limitations in dynamic, competitive scenarios.

Contribution

The paper presents AucArena, a novel auction-based evaluation suite for testing LLMs' strategic and planning abilities in complex, unpredictable environments.

Findings

01. LLMs like GPT-4 can manage budgets and goals in auctions
02. Adaptive strategies improve LLM performance in auction tasks
03. Simple methods can sometimes outperform complex LLM strategies

Abstract

Recent advancements in Large Language Models (LLMs) showcase advanced reasoning, yet NLP evaluations often depend on static benchmarks. Evaluating such reasoning requires environments that test strategic reasoning in dynamic, competitive scenarios requiring long-term planning. We introduce AucArena, a novel evaluation suite that simulates auctions, a setting chosen for being highly unpredictable and involving many skills related to resource and risk management, while also being easy to evaluate. We conduct controlled experiments using state-of-the-art LLMs to power bidding agents to benchmark their planning and execution skills. Our research demonstrates that LLMs, such as GPT-4, possess key skills for auction participation, such as budget management and goal adherence, which improve with adaptive strategies. This highlights LLMs' potential in modeling complex social interactions in…

Code & Models

Repositories

jiangjiechen/auction-arena
none · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multi-Agent Systems and Negotiation

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax · Dense Connections · Label Smoothing