Optimizing Reasoning Efficiency through Prompt Difficulty Prediction

Bo Zhao; Berkcan Kapusuzoglu; Kartik Balasubramaniam; Sambit Sahu; Supriyo Chakraborty; Genta Indra Winata

arXiv:2511.03808 · cs.LG · November 7, 2025

TL;DR

This paper introduces a routing method that predicts each problem's difficulty and assigns it to the smallest reasoning model likely to solve it, matching s1.1-32B's accuracy on math benchmarks while using significantly less compute.

Contribution

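The paper presents a difficulty-aware routing approach in which lightweight predictors, trained on intermediate representations from s1.1-32B, estimate problem difficulty or model correctness and use that estimate to allocate problems across a pool of reasoning models of varying sizes.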

Findings

01. Routing improves efficiency over random assignment on diverse math benchmarks.

02. Routing matches s1.1-32B's performance while using significantly less compute.

03. Difficulty-aware routing is effective for cost-efficient deployment of reasoning models.

Abstract

Reasoning language models perform well on complex tasks but are costly to deploy due to their size and long reasoning traces. We propose a routing approach that assigns each problem to the smallest model likely to solve it, reducing compute without sacrificing accuracy. Using intermediate representations from s1.1-32B, we train lightweight predictors of problem difficulty or model correctness to guide routing across a pool of reasoning models. On diverse math benchmarks, routing improves efficiency over random assignment and matches s1.1-32B's performance while using significantly less compute. Our results demonstrate that difficulty-aware routing is effective for cost-efficient deployment of reasoning models.
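
The routing rule described in the abstract (send each problem to the smallest model likely to solve it) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the pool members other than s1.1-32B, the relative costs, the 0.5 threshold, and the predicted correctness scores, which stand in for the output of a lightweight predictor.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Model:
    name: str
    relative_cost: float  # hypothetical cost units, not reported in the paper


def route(models: Sequence[Model],
          p_correct: Sequence[float],
          threshold: float = 0.5) -> Model:
    """Pick the cheapest model whose predicted correctness meets the threshold.

    p_correct[i] is an assumed predictor output: the estimated probability
    that models[i] solves the problem. If no model clears the threshold,
    fall back to the most expensive model in the pool.
    """
    if not models or len(models) != len(p_correct):
        raise ValueError("models and p_correct must be non-empty and aligned")
    # Try candidates from cheapest to most expensive.
    ranked = sorted(zip(models, p_correct), key=lambda pair: pair[0].relative_cost)
    for model, p in ranked:
        if p >= threshold:
            return model
    return ranked[-1][0]


# Hypothetical pool: only s1.1-32B is named in the abstract.
POOL = [
    Model("small-model", 1.0),
    Model("medium-model", 4.0),
    Model("s1.1-32B", 20.0),
]

easy = route(POOL, [0.90, 0.95, 0.99])      # cheapest model already suffices
hard = route(POOL, [0.10, 0.30, 0.80])      # only the largest model clears 0.5
hopeless = route(POOL, [0.10, 0.20, 0.30])  # nothing clears 0.5: use the fallback
print(easy.name, hard.name, hopeless.name)
```

Raising the threshold shifts more problems to larger models (higher expected accuracy, higher cost), and lowering it does the opposite, so under these assumptions the threshold is the knob that trades compute for accuracy.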

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications