Magentic Marketplace: An Open-Source Environment for Studying Agentic Markets

Gagan Bansal; Wenyue Hua; Zezhou Huang; Adam Fourney; Amanda Swearngin; Will Epperson; Tyler Payne; Jake M. Hofman; Brendan Lucier; Chinmay Singh; Markus Mobius; Akshay Nambi; Archana Yadav; Kevin Gao; David M. Rothschild; Aleksandrs Slivkins; Daniel G. Goldstein; Hussein Mozannar; Nicole Immorlica; Maya Murad; Matthew Vogel; Subbarao Kambhampati; Eric Horvitz; and Saleema Amershi

arXiv:2510.25779·cs.MA·October 31, 2025

TL;DR

This paper introduces Magentic-Marketplace, a simulated environment for studying agentic markets involving large language model agents, revealing insights into market dynamics, behaviors, and the impact of search mechanisms.

Contribution

It presents a novel open-source environment for analyzing complex agent interactions in realistic markets, addressing limitations of prior constrained studies.

Findings

01. Frontier models approach optimal welfare under ideal search conditions
02. Performance drops significantly as market scale increases
03. Severe first-proposal bias favors response speed over quality

Abstract

As LLM agents advance, they are increasingly mediating economic decisions, ranging from product discovery to transactions, on behalf of users. Such applications promise benefits but also raise many questions about agent accountability and value for users. Addressing these questions requires understanding how agents behave in realistic market conditions. However, previous research has largely evaluated agents in constrained settings, such as single-task marketplaces (e.g., negotiation) or structured two-agent interactions. Real-world markets are fundamentally different: they require agents to handle diverse economic activities and coordinate within large, dynamic ecosystems where multiple agents with opaque behaviors may engage in open-ended dialogues. To bridge this gap, we investigate two-sided agentic marketplaces where Assistant agents represent consumers and Service agents represent…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. An important and timely effort in building a simulated environment for agentic marketplace.
2. The empirical findings have implications for model/agent builders and users.

Weaknesses

1. The model selection is a bit confusing. GPT 5 was used in one experiment but not others. Claude series models are not included at all. Adding more models would be helpful.
2. It would also be nice to see whether the model's capabilities would scale with the parameter sizes.
3. This paper misses several key references: https://arxiv.org/abs/2506.00073 https://arxiv.org/pdf/2509.01063

Reviewer 02 · Rating 4 · Confidence 3

Strengths

S1. The proposed system is designed for two-sided markets.
S2. Biases in agent behavior and resistance to manipulation are investigated.
S3. The scale of simulation is up to 100 consumers and 300 restaurants.
S4. Multiple LLMs are tested in the experiments.

Weaknesses

W1. The presentation needs to be improved. First, the simulation design (Sec. 3) involves many high-level concepts, making it hard to understand. Second, the types of agents are confusing. For example, Figure 1 shows customer agents and business agents, while Figure 2 shows an assistant agent and a service agent. Third, based on the description of the proposed environment, it is hard to infer what is going to be evaluated in the experiments, obscuring the objectives of this study.
W2. The simu

Reviewer 03 · Rating 4 · Confidence 4

Strengths

* Ambitious setup combining natural language, market dynamics, and agent reasoning.
* Models the full market lifecycle (search to dialogue to transaction to evaluation), unlike prior simulations.
* Clear motivation for testing emergent economic and ethical behaviors in LLMs.

Weaknesses

* The experiments are limited to a single, highly synthetic restaurant domain, which weakens claims of generality.
* Results are mostly descriptive. There is little causal analysis or statistical depth.
* The link between linguistic interaction and market efficiency remains underexplored.
* No clear measure of whether agents reason economically or merely mimic patterns.

Videos

Magentic Marketplace: Testing societies of agents at scale · youtube

Microsoft Research Forum | Season 2, Episode 3 · youtube