A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search
Austin R. Ellis-Mohr, Anuj K. Nayak, Lav R. Varshney

TL;DR
This paper introduces DS3, a framework modeling inference as stochastic traversal over skill graphs, providing analytical insights into inference strategies, resource efficiency, and scaling behaviors of large language models.
Contribution
The paper develops DS3, a novel theoretical framework that unifies various inference strategies and explains their efficiency and scaling patterns in LLMs.
Findings
Accuracy scales linearly with the logarithm of inference compute
Preferred inference strategy varies with task difficulty and model capability
Emergent reasoning behavior even when performance plateaus
Abstract
Large language models (LLMs) demand considerable computational, energy, and financial resources during both training and deployment. While scaling laws for training have guided much of the field's recent progress, inference costs now represent a significant and growing component of the overall resource burden, particularly for reasoning-focused models. Existing characterizations of compute-optimality that consider model size, dataset size, and inference tokens in isolation or in fixed combinations risk overlooking more efficient operating points. We introduce directed stochastic skill search (DS3), a general framework that represents inference as stochastic traversal over a learned skill graph. From a simplified yet expressive instantiation, we derive closed-form expressions for task success and compute cost across a wide range of inference strategies -- including chain-of-thought (CoT)…
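To make the idea of "stochastic traversal over a skill graph" more concrete, here is a minimal toy sketch. It is not the paper's DS3 instantiation, whose details are not given in this summary. It assumes a hypothetical task that needs `n_skills` skills applied in sequence, each succeeding independently with probability `p_step`, and compares a closed-form success probability for best-of-N sampling with a Monte Carlo estimate. All function and parameter names are illustrative assumptions.

```python
import random


def chain_success_prob(p_step: float, n_skills: int) -> float:
    """Probability that one attempt applies all skills in the chain correctly.

    Assumes (hypothetically) independent steps with equal success probability.
    """
    return p_step ** n_skills


def best_of_n_success_prob(p_step: float, n_skills: int, n_samples: int) -> float:
    """Closed-form probability that at least one of n_samples attempts succeeds."""
    single = chain_success_prob(p_step, n_skills)
    return 1.0 - (1.0 - single) ** n_samples


def simulate_best_of_n(p_step: float, n_skills: int, n_samples: int,
                       trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, by sampling traversals."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # One trial: draw n_samples independent attempts; an attempt succeeds
        # only if every skill step in the chain succeeds.
        solved = any(
            all(rng.random() < p_step for _ in range(n_skills))
            for _ in range(n_samples)
        )
        wins += solved
    return wins / trials


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        exact = best_of_n_success_prob(0.8, 5, n)
        approx = simulate_best_of_n(0.8, 5, n)
        # Compute cost in this toy model is n attempts times 5 steps each.
        print(f"samples={n:2d} cost={n * 5:3d} exact={exact:.3f} simulated={approx:.3f}")
```

In this toy setting, success probability rises with the number of sampled attempts while cost grows linearly in that number, which is the kind of accuracy-versus-compute trade-off the framework is said to analyze in closed form.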
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Ethics and Social Impacts of AI
