Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models
Katrina Brown, Aneesh Muppidi, Rana Shahout

TL;DR
This paper presents Predictive Scheduling, a framework that uses lightweight predictors to estimate each query's difficulty or required reasoning length before generation, then allocates a fixed total token budget across queries accordingly, improving accuracy at the same token cost.
Contribution
It introduces pre-run predictors that estimate each query's optimal reasoning length or difficulty before any full generation, together with a greedy allocator that distributes a fixed token budget across a batch of queries.
Findings
Up to 7.9 percentage points of absolute accuracy gain on GSM8K over uniform budgeting at identical token cost.
Hidden states from the middle transformer layers (12-17) are the most informative for estimating reasoning length.
Pre-run predictions enable better compute-accuracy trade-offs in LLMs.
Abstract
Large language models (LLMs) achieve state-of-the-art accuracy on complex reasoning tasks by generating multiple chain-of-thought (CoT) traces, but using a fixed token budget per query leads to over-computation on easy inputs and under-computation on hard ones. We introduce Predictive Scheduling, a plug-and-play framework that pre-runs lightweight predictors, an MLP on intermediate transformer hidden states or a LoRA-fine-tuned classifier on raw question text, to estimate each query's optimal reasoning length or difficulty before any full generation. Our greedy batch allocator dynamically distributes a fixed total token budget across queries to maximize expected accuracy. On the GSM8K arithmetic benchmark, predictive scheduling yields up to 7.9 percentage points of absolute accuracy gain over uniform budgeting at identical token cost, closing over 50% of the gap to an oracle with…
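The greedy batch allocation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `greedy_allocate`, the `step` granularity, and the shape of the predicted accuracy curves are assumptions made for illustration. The sketch assumes a predictor has already produced, for each query, an estimated probability of a correct answer at each budget level, and it repeatedly gives the next chunk of tokens to the query with the largest predicted marginal gain.

```python
import heapq


def greedy_allocate(pred_acc, total_budget, step):
    """Distribute total_budget tokens across queries in chunks of `step`.

    pred_acc[i][k] is the (assumed, predictor-supplied) probability that
    query i is answered correctly when given k * step tokens.
    Returns a list with the token budget assigned to each query.
    """
    levels = [0] * len(pred_acc)  # current budget level per query
    remaining = total_budget

    # Max-heap (via negated gains) of the marginal gain of each query's next chunk.
    heap = []
    for i, curve in enumerate(pred_acc):
        if len(curve) > 1:
            heapq.heappush(heap, (-(curve[1] - curve[0]), i))

    while heap and remaining >= step:
        neg_gain, i = heapq.heappop(heap)
        if -neg_gain <= 0:
            break  # no queued chunk is predicted to help; stop spending
        levels[i] += 1
        remaining -= step
        k, curve = levels[i], pred_acc[i]
        if k + 1 < len(curve):
            heapq.heappush(heap, (-(curve[k + 1] - curve[k]), i))

    return [k * step for k in levels]


# Hypothetical curves: an easy query that saturates quickly and a hard one
# that keeps improving with more tokens.
curves = [
    [0.0, 0.90, 0.95, 0.96],
    [0.0, 0.20, 0.50, 0.80],
]
print(greedy_allocate(curves, total_budget=400, step=100))
```

With these made-up curves the easy query receives one chunk and the hard query receives the rest, whereas uniform budgeting would give each 200 tokens. Greedy selection by marginal gain is only guaranteed to maximize expected accuracy when each curve has diminishing returns; for other curve shapes it should be read as a heuristic.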
Taxonomy
Topics: Topic Modeling · Big Data and Digital Economy · Machine Learning in Healthcare
