Evaluation of Agents under Simulated AI Marketplace Dynamics

To Eun Kim; Alireza Salemi; Hamed Zamani; Fernando Diaz

arXiv:2604.14256·cs.IR·April 17, 2026

TL;DR

This paper introduces a simulation-based evaluation framework for AI systems operating in competitive marketplaces, capturing dynamics like user switching and market share beyond static benchmarks.

Contribution

The paper presents Marketplace Evaluation, a simulation paradigm that assesses AI systems as competitors in a dynamic marketplace rather than in isolation on static benchmarks.

Findings

01. Enables longitudinal assessment of AI systems in simulated marketplaces.

02. Provides marketplace-level metrics such as retention and market share.

03. Formalizes a research agenda for marketplace simulation and evaluation.
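
To make the marketplace-level metrics named above concrete, here is a minimal toy sketch in Python. It is not the paper's simulator. The switching rule (a dissatisfied user moves to a uniformly random competitor), the per-system satisfaction probabilities, and the names `simulate`, `market_share`, and `retention` are all assumptions made for illustration.

```python
import random
from collections import Counter


def simulate(qualities, n_users=200, n_rounds=20, seed=0):
    """Toy marketplace (hypothetical, not the paper's method).

    qualities maps a system name to the assumed probability that one
    interaction satisfies the user. Needs at least two systems.
    Returns a list of per-round user-to-system assignments.
    """
    rng = random.Random(seed)
    systems = list(qualities)
    # Assumption: users start on a uniformly random system.
    choice = [rng.choice(systems) for _ in range(n_users)]
    history = [list(choice)]
    for _ in range(n_rounds):
        for u in range(n_users):
            current = choice[u]
            satisfied = rng.random() < qualities[current]
            if not satisfied:
                # Assumption: a dissatisfied user switches to a random competitor.
                others = [s for s in systems if s != current]
                choice[u] = rng.choice(others)
        history.append(list(choice))
    return history


def market_share(assignments):
    """Fraction of users on each system in one round."""
    counts = Counter(assignments)
    total = len(assignments)
    return {s: c / total for s, c in counts.items()}


def retention(prev, curr):
    """Per system: fraction of its users in `prev` who are still on it in `curr`."""
    base = Counter(prev)
    stayed = Counter(p for p, c in zip(prev, curr) if p == c)
    return {s: stayed[s] / base[s] for s in base}


# Three hypothetical systems with different assumed satisfaction rates.
history = simulate({"A": 0.9, "B": 0.7, "C": 0.5})
final_share = market_share(history[-1])
final_retention = retention(history[-2], history[-1])
print(final_share)
print(final_retention)
```

Even in this toy setting, the two metrics depend on the competitors present and on the switching rule, which a static accuracy score computed in isolation cannot reflect.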

Abstract

Modern information access ecosystems consist of mixtures of systems, such as retrieval systems and large language models, and increasingly rely on marketplaces to mediate access to models, tools, and data, making competition between systems inherent to deployment. In such settings, outcomes are shaped not only by benchmark quality but also by competitive pressure, including user switching, routing decisions, and operational constraints. Yet evaluation is still largely conducted on static benchmarks with accuracy-focused measures that assume systems operate in isolation. This mismatch makes it difficult to predict post-deployment success and obscures competitive effects such as early-adoption advantages and market dominance. We introduce Marketplace Evaluation, a simulation-based paradigm that evaluates information access systems as participants in a competitive marketplace. By…
