Economics Arena for Large Language Models
Shangmin Guo, Haoran Bu, Haochuan Wang, Yi Ren, Dianbo Sui, Yuming Shang, Siting Lu

TL;DR
This paper introduces an economics-based dynamic game environment to evaluate large language models' rationality, strategic reasoning, and rule-following abilities, revealing differences among models like GPT-4 in convergence and performance.
Contribution
It proposes a novel dynamic evaluation framework using competitive economic games to assess LLMs' strategic and rational capabilities beyond static benchmarks.
Findings
Most LLMs exhibit rational strategies that increase payoffs.
GPT-4 converges faster to Nash Equilibria than other models.
Winning rates correlate with reasoning ability and rule-following skills.
Abstract
Large language models (LLMs) have been extensively used as the backbones for general-purpose agents, and some economics literature suggests that LLMs are capable of playing various types of economics games. Following these works, to overcome the limitation of evaluating LLMs using static benchmarks, we propose to explore competitive games as an evaluation for LLMs, which incorporates multiple players and makes the environment dynamic. By varying the game history revealed to LLM-based players, we find that most LLMs are rational in that they play strategies that can increase their payoffs, but not as rational as indicated by Nash Equilibria (NEs). Moreover, when game history is available, certain types of LLMs, such as GPT-4, can converge faster to the NE strategies, which suggests a higher level of rationality in comparison to other models. In the meantime, certain types of LLMs can win more often…
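To make the Nash Equilibrium benchmark in the abstract concrete, here is a minimal sketch of how pure-strategy NEs can be found in a two-player matrix game. It is not the paper's evaluation code. The function name `pure_nash_equilibria` and the prisoner's-dilemma payoffs are illustrative assumptions, and the games actually used in the paper may differ.

```python
from itertools import product


def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all (row, col) profiles where neither player gains by deviating alone.

    payoff_a[r][c] is the row player's payoff and payoff_b[r][c] is the
    column player's payoff when row action r meets column action c.
    """
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # Row player: is r a best response to c?
        row_best = all(payoff_a[r][c] >= payoff_a[alt][c] for alt in range(n_rows))
        # Column player: is c a best response to r?
        col_best = all(payoff_b[r][c] >= payoff_b[r][alt] for alt in range(n_cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


# Hypothetical payoffs (assumption): a standard prisoner's dilemma,
# with action 0 = cooperate and action 1 = defect.
PAYOFF_A = [[3, 0], [5, 1]]
PAYOFF_B = [[3, 5], [0, 1]]

if __name__ == "__main__":
    print(pure_nash_equilibria(PAYOFF_A, PAYOFF_B))
```

A player whose actions raise its payoff but do not match an equilibrium profile would be rational in the weaker sense the abstract describes, without reaching the NE benchmark.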
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
