TL;DR
DiffAdapt introduces a difficulty-aware inference framework that adaptively selects a reasoning strategy based on problem difficulty and entropy, reducing token usage by up to 22.4% without sacrificing accuracy.
Contribution
It proposes a novel, lightweight method that classifies reasoning difficulty using entropy to improve token efficiency in LLM inference without fine-tuning.
Findings
Achieves up to 22.4% token reduction while maintaining accuracy.
Identifies a U-shaped entropy pattern across problem difficulties.
Demonstrates effectiveness across five models and eight benchmarks.
Abstract
Recent reasoning Large Language Models (LLMs) demonstrate remarkable problem-solving abilities but often generate long thinking traces whose utility is unclear. Our work aims to improve their efficiency, enabling them to reach high performance without overthinking. First, we analyze the entropy of token probabilities in reasoning traces. Across three models, we observe a consistent U-shaped entropy pattern: high entropy on easy problems despite high accuracy, low entropy on problems with medium difficulty, and high entropy on hard problems reflecting uncertainty. Specifically, we notice a 22-25% entropy reduction from easy to medium difficulty regions, suggesting an overthinking phenomenon on easy instances. Building on these insights, we introduce DiffAdapt, a lightweight framework that selects Easy/Normal/Hard inference strategies per question based on their difficulty and…
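The token-probability entropy the abstract analyzes can be sketched as below. This is a minimal illustration, assuming the next-token probability distribution at each step of a reasoning trace is available and that entropy is measured in nats; the helper names (`token_entropy`, `mean_trace_entropy`, `select_strategy`) and the token budgets in `STRATEGIES` are hypothetical and not DiffAdapt's actual implementation. Because of the U-shaped pattern, a high mean entropy alone cannot distinguish an easy problem from a hard one, so the sketch takes the Easy/Normal/Hard label as input from a separate difficulty estimator instead of thresholding entropy directly.

```python
import math
from typing import Dict, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    # Zero-probability tokens contribute nothing (p * log p -> 0 as p -> 0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_trace_entropy(trace: Sequence[Sequence[float]]) -> float:
    """Average per-token entropy over a whole reasoning trace."""
    if not trace:
        raise ValueError("trace must contain at least one token distribution")
    return sum(token_entropy(dist) for dist in trace) / len(trace)


# Hypothetical strategy table: budgets are illustrative placeholders, not values
# reported for DiffAdapt.
STRATEGIES: Dict[str, Dict[str, int]] = {
    "easy": {"max_new_tokens": 1024},
    "normal": {"max_new_tokens": 4096},
    "hard": {"max_new_tokens": 16384},
}


def select_strategy(difficulty: str) -> Dict[str, int]:
    """Map a difficulty label (from some external estimator) to a strategy."""
    return STRATEGIES[difficulty]


if __name__ == "__main__":
    # Two-token toy trace: one uniform binary choice, one fully certain token.
    trace = [[0.5, 0.5], [1.0]]
    print(round(mean_trace_entropy(trace), 4))  # 0.3466 (= ln 2 / 2)
    print(select_strategy("easy"))
```

In this sketch, easy questions get a tighter generation budget than hard ones, which is one simple way an Easy/Normal/Hard split could translate into token savings on instances where the model would otherwise overthink.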