Market-Bench: Benchmarking Large Language Models on Economic and Trade Competition

Yushuo Zheng (1, 2), Huiyu Duan (1), Zicheng Zhang (1, 2), Yucheng Zhu (1), Xiongkuo Min (1), Guangtao Zhai (1, 2) ((1) Shanghai Jiao Tong University, (2) Shanghai Artificial Intelligence Laboratory)

arXiv:2604.05523·cs.AI·April 21, 2026

TL;DR

Market-Bench introduces a comprehensive benchmark to evaluate large language models' capabilities in economic and trade tasks through a simulated multi-agent supply chain environment, revealing performance disparities among models.

Contribution

This paper presents the first benchmark for assessing LLMs in economic and trade competition scenarios using a configurable multi-agent supply chain model.

Findings

01. LLM agents show significant performance disparities in economic tasks.

02. A small subset of LLM retailers achieves consistent capital appreciation.

03. Many LLMs hover around break-even despite similar semantic scores.

Abstract

The ability of large language models (LLMs) to manage and acquire economic resources remains unclear. In this paper, we introduce Market-Bench, a comprehensive benchmark that evaluates the capabilities of LLMs in economically-relevant tasks through economic and trade competition. Specifically, we construct a configurable multi-agent supply chain economic model where LLMs act as retailer agents responsible for procuring and retailing merchandise. In the procurement stage, LLMs bid for limited inventory in budget-constrained auctions. In the retail stage, LLMs set retail prices, generate marketing slogans, and provide them to buyers through a role-based attention mechanism for purchase. Market-Bench logs complete trajectories of bids, prices, slogans, sales, and balance-sheet states, enabling automatic evaluation with economic, operational, and semantic metrics.…
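To make the procurement stage concrete, the following is a minimal sketch of one budget-constrained auction round. It is not the paper's implementation: the abstract does not state the auction rule, so the pay-as-bid, highest-price-first allocation used here is an assumption, and the names `Retailer`, `Bid`, and `run_procurement` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Retailer:
    """Hypothetical retailer agent state (a tiny balance sheet)."""
    name: str
    cash: float
    inventory: int = 0


@dataclass
class Bid:
    """A bid for `quantity` units at `unit_price` each."""
    retailer: Retailer
    unit_price: float
    quantity: int


def run_procurement(bids, supply):
    """Allocate limited supply to the highest unit-price bids first.

    Assumption: pay-as-bid pricing. Each winner pays its own bid price,
    and no retailer can buy more than its cash allows.
    Returns a log of fills, loosely mirroring the trajectory logging
    described in the abstract.
    """
    log = []
    for bid in sorted(bids, key=lambda b: b.unit_price, reverse=True):
        if supply <= 0:
            break
        if bid.unit_price <= 0:
            continue  # ignore malformed bids
        # Budget constraint: units the bidder can actually pay for.
        affordable = int(bid.retailer.cash // bid.unit_price)
        qty = min(bid.quantity, affordable, supply)
        if qty <= 0:
            continue
        bid.retailer.cash -= qty * bid.unit_price
        bid.retailer.inventory += qty
        supply -= qty
        log.append(
            {"retailer": bid.retailer.name, "qty": qty, "unit_price": bid.unit_price}
        )
    return log
```

In this sketch a retailer that bids a high price but holds little cash is filled only up to what it can afford, and the remaining supply passes to lower bidders. The benchmark's actual clearing rule may differ.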
