Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition
Yushuo Zheng (1, 2), Huiyu Duan (1), Zicheng Zhang (1, 2), Yucheng Zhu (1), Xiongkuo Min (1), Guangtao Zhai (1, 2) ((1) Shanghai Jiao Tong University, (2) Shanghai Artificial Intelligence Laboratory)

TL;DR
Market-Bench introduces a comprehensive benchmark to evaluate large language models' capabilities in economic and trade tasks through a simulated multi-agent supply chain environment, revealing performance disparities among models.
Contribution
This paper presents the first benchmark for assessing LLMs in economic and trade competition scenarios using a configurable multi-agent supply chain model.
Findings
Significant performance disparities among LLM agents in economic tasks.
A small subset of LLM retailers achieves consistent capital appreciation.
Many LLMs hover around break-even despite similar semantic scores.
Abstract
The ability of large language models (LLMs) to manage and acquire economic resources remains unclear. In this paper, we introduce Market-Bench, a comprehensive benchmark that evaluates the capabilities of LLMs in economically-relevant tasks through economic and trade competition. Specifically, we construct a configurable multi-agent supply chain economic model where LLMs act as retailer agents responsible for procuring and retailing merchandise. In the procurement stage, LLMs bid for limited inventory in budget-constrained auctions. In the retail stage, LLMs set retail prices, generate marketing slogans, and provide them to buyers through a role-based attention mechanism for purchase. Market-Bench logs complete trajectories of bids, prices, slogans, sales, and balance-sheet states, enabling automatic evaluation with economic, operational, and semantic metrics.…
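To make the procurement stage concrete, here is a minimal Python sketch of one way a budget-constrained auction over limited inventory could be resolved. The abstract does not specify the auction rules, so everything here is an assumption for illustration: the `Bid` fields, the `allocate` function, and the greedy highest-price-first rule are hypothetical and may differ from what Market-Bench actually implements.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    """One retailer's bid (hypothetical structure, not from the paper)."""
    retailer: str
    unit_price: float  # price offered per unit
    quantity: int      # units requested
    budget: float      # capital the retailer can spend this round


def allocate(bids, inventory):
    """Greedy allocation: highest unit price is served first.

    Each retailer receives at most what it asked for, what its budget
    can pay for, and what is still in stock. This rule is an assumption.
    Returns {retailer: (units_won, amount_paid)}.
    """
    allocation = {}
    remaining = inventory
    # Ties on price are broken by retailer name to keep results deterministic.
    for bid in sorted(bids, key=lambda b: (-b.unit_price, b.retailer)):
        if remaining <= 0 or bid.unit_price <= 0:
            allocation[bid.retailer] = (0, 0.0)
            continue
        affordable = int(bid.budget // bid.unit_price)
        units = min(bid.quantity, affordable, remaining)
        allocation[bid.retailer] = (units, units * bid.unit_price)
        remaining -= units
    return allocation


bids = [
    Bid("A", unit_price=5.0, quantity=10, budget=30.0),
    Bid("B", unit_price=4.0, quantity=10, budget=100.0),
    Bid("C", unit_price=6.0, quantity=3, budget=100.0),
]
result = allocate(bids, inventory=12)
print(result)
```

In this toy round, retailer A asks for 10 units but its budget only covers 6, which is the kind of constraint the abstract's "budget-constrained" wording suggests; the leftover stock then goes to the lower bidder B.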
