Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models

Katrina Brown; Aneesh Muppidi; Rana Shahout

arXiv:2602.01237 · cs.AI · February 3, 2026


TL;DR

This paper presents Predictive Scheduling, a framework that uses lightweight predictors to estimate each query's difficulty or reasoning length before generation and then allocates token budgets across queries accordingly, improving the accuracy of large language models at the same total token cost.

Contribution

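It introduces a pre-run prediction method that estimates how much reasoning each query needs before any full generation, combined with a greedy allocator that distributes a fixed token budget across queries, improving both LLM inference efficiency and accuracy.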

Findings

01. Up to 7.9 percentage points of absolute accuracy improvement on GSM8K with predictive scheduling, relative to uniform budgeting at the same token cost.

02. Middle transformer layers (12-17) are the most informative for estimating reasoning length.

03. Pre-run predictions enable better compute-accuracy trade-offs in LLMs.
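The abstract describes an MLP on intermediate transformer hidden states, and finding 02 points to layers 12-17 as the most useful inputs. A minimal sketch of such a probe follows, under stated assumptions: hidden states for one prompt arrive as a (num_layers, seq_len, d_model) array, each selected layer is mean-pooled over tokens, and a one-hidden-layer ReLU MLP regresses a standardized reasoning-length target. The pooling scheme, the layer indexing, the names `middle_layer_features` and `MLPProbe`, and all hyperparameters are hypothetical choices for illustration rather than the paper's configuration, and the data is random.

```python
import numpy as np

# Hypothetical choice mirroring finding 02: layers 12-17 inclusive. Whether
# index 12 here matches the paper's "layer 12" depends on the model's indexing.
MIDDLE_LAYERS = list(range(12, 18))


def middle_layer_features(hidden_states: np.ndarray, layers=MIDDLE_LAYERS) -> np.ndarray:
    """Mean-pool each selected layer over tokens and concatenate the results.

    hidden_states: (num_layers, seq_len, d_model) for a single prompt.
    Returns a vector of shape (len(layers) * d_model,).
    """
    pooled = hidden_states[layers].mean(axis=1)  # (len(layers), d_model)
    return pooled.reshape(-1)


class MLPProbe:
    """One-hidden-layer ReLU MLP regressor trained by full-batch gradient descent on MSE."""

    def __init__(self, in_dim: int, hidden: int = 32, lr: float = 0.05, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def _forward(self, X):
        pre = X @ self.W1 + self.b1            # (n, hidden) pre-activations
        h = np.maximum(pre, 0.0)               # ReLU
        out = (h @ self.W2 + self.b2).ravel()  # (n,) predicted targets
        return pre, h, out

    def predict(self, X):
        return self._forward(X)[2]

    def fit(self, X, y, epochs: int = 500):
        n = len(y)
        losses = []
        for _ in range(epochs):
            pre, h, out = self._forward(X)
            err = out - y
            losses.append(float(np.mean(err ** 2)))
            g_out = (2.0 / n) * err[:, None]       # dMSE/d(out), shape (n, 1)
            g_h = (g_out @ self.W2.T) * (pre > 0)  # backprop through ReLU (uses pre-update W2)
            self.W2 -= self.lr * (h.T @ g_out)
            self.b2 -= self.lr * g_out.sum(axis=0)
            self.W1 -= self.lr * (X.T @ g_h)
            self.b1 -= self.lr * g_h.sum(axis=0)
        return losses


# Synthetic stand-ins for real hidden states and reasoning-length labels.
rng = np.random.default_rng(1)
n_prompts, n_layers, seq_len, d_model = 200, 24, 8, 4
feats = np.stack([
    middle_layer_features(rng.normal(size=(n_layers, seq_len, d_model)))
    for _ in range(n_prompts)
])
feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)        # z-score each feature
w_true = rng.normal(size=feats.shape[1])
lengths = feats @ w_true + 0.1 * rng.normal(size=n_prompts)   # made-up "lengths"
y = (lengths - lengths.mean()) / lengths.std()                # standardized target

probe = MLPProbe(in_dim=feats.shape[1])
losses = probe.fit(feats, y)
print(f"training MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Mean pooling keeps the feature size independent of prompt length; in practice one would evaluate the probe on held-out prompts and compare layer ranges, which this toy run does not do.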

Abstract

Large language models (LLMs) achieve state-of-the-art accuracy on complex reasoning tasks by generating multiple chain-of-thought (CoT) traces, but using a fixed token budget per query leads to over-computation on easy inputs and under-computation on hard ones. We introduce Predictive Scheduling, a plug-and-play framework that pre-runs lightweight predictors (either an MLP on intermediate transformer hidden states or a LoRA-fine-tuned classifier on raw question text) to estimate each query's optimal reasoning length or difficulty before any full generation. Our greedy batch allocator dynamically distributes a fixed total token budget across queries to maximize expected accuracy. On the GSM8K arithmetic benchmark, predictive scheduling yields up to 7.9 percentage points of absolute accuracy gain over uniform budgeting at identical token cost, closing over 50% of the gap to an oracle with…
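The greedy batch allocator can be illustrated with a generic marginal-gain rule, which may differ from the paper's exact procedure. The sketch assumes, hypothetically, that the predictors yield for each query an expected-accuracy curve `acc_curves[i][k]` over k budget increments beyond a minimum budget (for example, extra CoT traces or blocks of tokens), and it repeatedly gives the next increment to the query whose predicted accuracy would rise the most. The names `greedy_allocate` and `expected_accuracy` and the curve values are illustrative, not taken from the paper.

```python
import heapq
from typing import List, Sequence


def greedy_allocate(acc_curves: Sequence[Sequence[float]], total_steps: int) -> List[int]:
    """Greedily hand out `total_steps` budget increments across a batch of queries.

    acc_curves[i][k] is the (hypothetical) predicted expected accuracy of query i
    after receiving k increments beyond its minimum budget.
    Returns the number of increments assigned to each query.
    """
    alloc = [0] * len(acc_curves)
    # heapq is a min-heap, so store negated marginal gains to pop the largest gain first.
    heap = [(-(c[1] - c[0]), i) for i, c in enumerate(acc_curves) if len(c) > 1]
    heapq.heapify(heap)
    for _ in range(total_steps):
        if not heap:  # every query has reached the end of its curve
            break
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        k, curve = alloc[i], acc_curves[i]
        if k + 1 < len(curve):
            heapq.heappush(heap, (-(curve[k + 1] - curve[k]), i))
    return alloc


def expected_accuracy(acc_curves: Sequence[Sequence[float]], alloc: Sequence[int]) -> float:
    """Mean predicted accuracy over the batch for a given allocation."""
    return sum(c[k] for c, k in zip(acc_curves, alloc)) / len(acc_curves)


# Toy predicted curves (made-up numbers) for an easy, a medium, and a hard query.
curves = [
    [0.90, 0.95, 0.96, 0.96, 0.96, 0.96, 0.96],  # easy: saturates almost immediately
    [0.40, 0.62, 0.76, 0.84, 0.88, 0.90, 0.91],  # medium
    [0.05, 0.23, 0.34, 0.43, 0.50, 0.55, 0.58],  # hard: keeps improving
]
alloc = greedy_allocate(curves, total_steps=6)
uniform = [2, 2, 2]  # same total of 6 increments, split evenly
print(alloc)  # [0, 3, 3]
print(expected_accuracy(curves, alloc), expected_accuracy(curves, uniform))
```

When every curve is concave (diminishing returns), this marginal-gain rule maximizes the summed expected accuracy for such a separable objective; with non-concave curves it is only a heuristic. In the toy run the easy query keeps its minimum budget while the medium and hard queries share the increments, giving a higher mean predicted accuracy than the uniform split at the same total budget.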

