Agentic AI Systems Should Be Designed as Marginal Token Allocators
Siqi Zhu

TL;DR
This position paper argues that agentic AI systems should be designed as marginal token allocation economies. It places four layers (routing, agent control, serving, and training) under one economic framework, with the aim of improving efficiency and explaining recurring failure modes.
Contribution
It introduces a marginal token allocation perspective that unifies multiple AI system layers and predicts failure modes, guiding future evaluation and resource management research.
Findings
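All four layers (router, agent, serving stack, training pipeline) solve the same first-order condition: marginal benefit equals marginal cost plus latency cost plus risk cost.
Identifies recurring failure modes such as over-routing and congestion.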
Predicts that local token minimization can misallocate resources.
Abstract
This position paper argues that agentic AI systems should be designed and evaluated as marginal token allocation economies rather than as text generators priced by the unit. We follow a single request (a developer asking a coding agent to fix a failing test) through four economic layers that today are designed in isolation: a router that decides which model answers, an agent that decides whether to plan, act, verify, or defer, a serving stack that decides how to produce each token, and a training pipeline that decides whether the trace is worth learning from. We show that all four layers are solving the same first-order condition (marginal benefit equals marginal cost plus latency cost plus risk cost) with different index sets and different prices. The framing is deliberately minimal: we do not propose a complete theory of AI economics. But adopting marginal token…
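The first-order condition stated in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the `Action` class, the `allocate` helper, and every number below are hypothetical, chosen only to show the rule that a candidate step is worth taking when its marginal benefit exceeds its marginal cost plus latency cost plus risk cost. All quantities are assumed to share one arbitrary unit.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """A candidate step at any layer (hypothetical structure, arbitrary units)."""
    name: str
    marginal_benefit: float
    marginal_cost: float
    latency_cost: float
    risk_cost: float

    def net_value(self) -> float:
        # Benefit minus the full marginal price: cost + latency cost + risk cost.
        full_price = self.marginal_cost + self.latency_cost + self.risk_cost
        return self.marginal_benefit - full_price


def allocate(actions):
    """Return names of actions with positive net value, highest net value first."""
    worthwhile = [a for a in actions if a.net_value() > 0]
    worthwhile.sort(key=lambda a: a.net_value(), reverse=True)
    return [a.name for a in worthwhile]


# Illustrative numbers only; they are not taken from the paper.
candidates = [
    Action("plan", marginal_benefit=5.0, marginal_cost=1.0, latency_cost=0.5, risk_cost=0.5),
    Action("verify", marginal_benefit=2.0, marginal_cost=1.0, latency_cost=0.5, risk_cost=1.0),
    Action("escalate", marginal_benefit=4.0, marginal_cost=2.0, latency_cost=0.5, risk_cost=0.5),
]

chosen = allocate(candidates)
```

With these made-up numbers, "verify" is dropped because its full marginal price (2.5) exceeds its benefit (2.0), while "plan" and "escalate" are kept. The same comparison could in principle be applied at each layer with different candidate sets and different prices, which is the sense in which the abstract describes the layers as sharing one condition.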
