Strategic Self-Improvement for Competitive Agents in AI Labour Markets
Christopher Chiu, Simpson Zhang, Mihaela van der Schaar

TL;DR
This paper introduces a comprehensive framework for strategic self-improvement in AI agents within competitive labor markets, emphasizing economic forces, agent capabilities, and market dynamics through simulated experiments.
Contribution
It presents the first framework integrating economic concepts with AI agent capabilities like metacognition and strategic planning, demonstrated via a simulated gig economy environment.
Findings
LLM agents learn strategic self-improvement and adapt to market changes.
Simulations reproduce macroeconomic phenomena like monopolization and price deflation.
AI-driven trends suggest potential for rapid market monopolization.
Abstract
As artificial intelligence (AI) agents are deployed across economic domains, understanding their strategic behavior and market-level impact becomes critical. This paper puts forward a groundbreaking new framework that is the first to capture the real-world economic forces that shape agentic labor markets: adverse selection, moral hazard, and reputation dynamics. Our framework encapsulates three core capabilities that successful LLM agents will need: metacognition (accurate self-assessment of skills), competitive awareness (modeling rivals and market dynamics), and long-horizon strategic planning. We illustrate our framework through a tractable simulated gig economy where agentic Large Language Models (LLMs) compete for jobs, develop skills, and adapt their strategies under competitive pressure. Our simulations illustrate how LLM agents explicitly prompted with…
Peer Reviews
Decision: Submitted to ICLR 2026
1. The question of how LLM agents behave in economic settings is societally and scientifically relevant; the paper tackles a forward-looking problem with potential policy and ML implications.
2. The market formalization (Competitive Skill-Based Stochastic Game) and the engineering of AI Work (price–reputation scoring, stochastic reranking, capacity constraints, skill/reputation dynamics) are well specified and grounded in economic primitives (Cobb–Douglas/CES scoring, Beta reputation aggregation
1. Simplified proxy tasks and utility model. The simulated tasks are proxy tasks with stochastic scoring; how sensitive are conclusions (e.g., monopolization, wage deflation) to the choice of task scoring function γ(·), client preference generation P(·), or to the Cobb–Douglas aggregator/parameterization (wq, wp)? The manuscript argues qualitative alignment with known macro facts, but it lacks sensitivity analyses showing results are robust across realistic alternative parameterizations.
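To make the reviewer's sensitivity concern concrete, here is a minimal Python sketch of a Cobb–Douglas price–reputation score with a Beta-mean reputation. The exact functional form used in the paper is not shown in this excerpt, so everything below is an assumption for illustration: the helper names (`beta_reputation`, `cobb_douglas_score`, `rank_bids`), the uniform Beta(1, 1) prior, the score `reputation**w_q * price**(-w_p)`, and the two example bidders are all hypothetical.

```python
def beta_reputation(successes, failures, prior_a=1.0, prior_b=1.0):
    """Posterior mean of a Beta reputation (assumed uniform Beta(1, 1) prior)."""
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)


def cobb_douglas_score(reputation, price, w_q, w_p):
    """Assumed client score: increases with reputation, decreases with price."""
    return (reputation ** w_q) * (price ** -w_p)


def rank_bids(bids, w_q, w_p):
    """Return bidder names ordered from highest to lowest score."""
    return sorted(
        bids,
        key=lambda name: cobb_douglas_score(*bids[name], w_q, w_p),
        reverse=True,
    )


# Hypothetical bidders: (reputation, asking price).
bids = {
    "veteran": (beta_reputation(successes=18, failures=2), 40.0),
    "newcomer": (beta_reputation(successes=3, failures=2), 15.0),
}

# Sweep the weights to see whether the winner depends on the parameterization.
for w_q, w_p in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
    print(f"w_q={w_q}, w_p={w_p}: ranking={rank_bids(bids, w_q, w_p)}")
```

In this toy setup the higher-reputation bidder wins only when reputation is weighted heavily, and the cheaper bidder wins otherwise. A sweep of this kind over the weights is a small-scale version of the robustness check the reviewer asks for.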
S1. The authors construct an interesting market environment in which to test the economic reasoning capabilities of LLM agents. For example, it's interesting to give the LLM agent an explore-exploit style tradeoff between "BID" and "TRAIN" actions.
S2. The authors test a wide array of LLMs and compare their system against sensible baselines CoT/ReAct (however, the details of these are insufficiently explained, see W2).
W1 (major). The writing quality of the paper is extremely low, making it difficult to read.
* In terms of formatting, there are many newlines missing, and some punctuation marks like "-" replaced with "/" (e.g.: "finite/horizon, discrete/time" in Line 132-3).
* In terms of spelling and grammar, there are numerous errors: the paragraph in Lines 204-210 has at least 3 such errors alone ("We describe experiment setup in detail" Line 206, "aability" Line 210, missing period at the end of the paragr
1. The paper narrows from broad “LLM economies” to a controlled gig-market with explicit bidding–training trade-offs and partial observability. This scope makes it easier to connect agent reasoning to market outcomes than in full macro simulators.
2. The market is specified as a multiplayer stochastic game with a clear state (skills, reputation), action (bid/train), and a matching mechanism coupled to prices and ratings. This yields an analyzable knob set (capacity ν, price–reputation weights,
1. Novelty is incremental and not crisply isolated. Prior LLM-agent simulators also study market behavior and “reasoning traces,” and this work’s distinct contribution, framing as a skill-based stochastic game with SSA prompts, lacks an ablation that shows which new component is necessary or sufficient for the headline effects. A component-by-outcome matrix (e.g., remove reputation dynamics, remove CES scoring, remove SSA prompts) is needed to establish a clear methodological delta.
2. Statistical
Taxonomy
Topics: Digital Economy and Work Transformation · Mobile Crowdsensing and Crowdsourcing · Language and cultural evolution
