Sparsity Is Necessary: Polynomial-Time Stability for Agentic LLMs in Large Action Spaces
Angshul Majumdar

TL;DR
This paper introduces Sparse Agentic Control (SAC), a framework for understanding and learning stable policies in large action spaces for agentic LLMs, emphasizing the importance of sparsity for sample efficiency and stability.
Contribution
The paper formalizes SAC, providing theoretical guarantees for sparse policy learning, support recovery, and stability in large action spaces, with extensions to various settings.
Findings
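Estimation error and value suboptimality scale as k (log M / T)^{1/2} under a Policy-RSC condition.
Exact tool-support recovery holds when T > k log M, under incoherence and beta-min conditions.
Dense policy classes require Omega(M) samples, indicating instability.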
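To get a feel for the gap between these rates, the short sketch below evaluates them for one illustrative setting. The function names and the values of k, M, and T are assumptions chosen for illustration, and all constant factors hidden by the asymptotic notation are set to 1, so the numbers show orders of magnitude only.

```python
import math


def sparse_rate(k: int, M: int, T: int) -> float:
    """Order of the sparse estimation error, k * sqrt(log M / T), constants dropped."""
    return k * math.sqrt(math.log(M) / T)


def recovery_threshold(k: int, M: int) -> float:
    """Order of the sample size for support recovery, k * log M, constants dropped."""
    return k * math.log(M)


if __name__ == "__main__":
    # Hypothetical setting: 5 relevant tools out of 10,000 candidates.
    k, M = 5, 10_000
    for T in (1_000, 4_000, 16_000):
        print(f"T={T:>6}: sparse error order ~ {sparse_rate(k, M, T):.3f}")
    print(f"support-recovery order: T > ~{recovery_threshold(k, M):.0f} samples")
    print(f"dense lower-bound order: ~{M} samples")
```

Quadrupling T halves the sparse error term, while the dense requirement grows linearly in M and does not depend on k.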
Abstract
Tool-augmented LLM systems expose a control regime that learning theory has largely ignored: sequential decision-making with a massive discrete action universe (tools, APIs, documents) in which only a small, unknown subset is relevant for any fixed task distribution. We formalize this setting as Sparse Agentic Control (SAC), where policies admit block-sparse representations over M >> 1 actions and rewards depend on sparse main effects and (optionally) sparse synergies. We study ell_{1,2}-regularized policy learning through a convex surrogate and establish sharp, compressed-sensing-style results: (i) estimation and value suboptimality scale as k (log M / T)^{1/2} under a Policy-RSC condition; (ii) exact tool-support recovery holds via primal-dual witness arguments when T > k log M under incoherence and beta-min; and (iii) any dense policy class requires Omega(M) samples, explaining the…
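The ell_{1,2} penalty mentioned in the abstract is the group-lasso norm: the sum of Euclidean norms of per-action parameter blocks. A standard way to handle it in a proximal-gradient solver is block soft-thresholding, sketched below. This is a generic illustration of that proximal step, not the paper's algorithm; the function name, the block layout (one list of floats per tool), and the threshold value are assumptions.

```python
import math
from typing import List


def block_soft_threshold(blocks: List[List[float]], tau: float) -> List[List[float]]:
    """Proximal operator of tau * sum_j ||block_j||_2 (the l_{1,2} norm).

    Blocks whose Euclidean norm is at most tau are set to zero, which
    removes the corresponding action; the others are shrunk toward zero.
    """
    result = []
    for block in blocks:
        norm = math.sqrt(sum(x * x for x in block))
        if norm <= tau:
            result.append([0.0] * len(block))
        else:
            scale = 1.0 - tau / norm
            result.append([scale * x for x in block])
    return result


def support(blocks: List[List[float]]) -> List[int]:
    """Indices of blocks that are not identically zero."""
    return [j for j, block in enumerate(blocks) if any(x != 0.0 for x in block)]


if __name__ == "__main__":
    # Hypothetical parameters for three tools, two features each.
    theta = [[3.0, 4.0], [0.1, -0.1], [0.0, 2.0]]
    shrunk = block_soft_threshold(theta, tau=1.0)
    print(shrunk)
    print("active tools:", support(shrunk))
```

Because the threshold acts on whole blocks, an action is either kept with all of its coordinates or dropped entirely, which is what makes the recovered support interpretable as a set of tools.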
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control
