Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization

Zishun Yu; Tengyu Xu; Di Jin; Karthik Abinav Sankararaman; Yun He; Wenxuan Zhou; Zhouhao Zeng; Eryk Helenowski; Chen Zhu; Sinong Wang; Hao Ma; Han Fang

arXiv:2501.17974·cs.AI·February 3, 2025




TL;DR

This paper introduces IBPO, a method enabling language models to allocate inference resources adaptively based on question difficulty, significantly improving math problem-solving performance under constrained inference budgets.

Contribution

The paper proposes IBPO, a novel inference-aware optimization technique that allows models to dynamically allocate inference budgets, enhancing reasoning efficiency and accuracy.

Findings

1. Achieved up to 5.74% absolute improvement on MATH500 with increased inference budgets.

2. Models learned to allocate more inference resources to harder questions.

3. Performance gains are about twice those of self-consistency under the same budgets.

Abstract

Solving mathematics problems has been an intriguing capability of large language models, and many efforts have been made to improve reasoning by extending reasoning length, such as through self-correction and extensive long chain-of-thoughts. While promising in problem-solving, advanced long reasoning chain models exhibit an undesired single-modal behavior, where trivial questions require unnecessarily tedious long chains of thought. In this work, we propose a way to allow models to be aware of inference budgets by formulating it as utility maximization with respect to an inference budget constraint, hence naming our algorithm Inference Budget-Constrained Policy Optimization (IBPO). In a nutshell, models fine-tuned through IBPO learn to "understand" the difficulty of queries and allocate inference budgets to harder ones. With different inference budgets, our best models are able to…
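The idea of maximizing utility under an inference budget can be illustrated with a toy allocation problem. The sketch below is not IBPO, which is a fine-tuning algorithm whose details are not given on this page. It is only a greedy knapsack-style heuristic that decides which queries receive long reasoning when a total token budget is fixed. The `Query` class, the `allocate` function, and every probability and token cost are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    p_short: float   # assumed chance a short answer is correct
    p_long: float    # assumed chance a long-reasoning answer is correct
    extra_cost: int  # assumed extra tokens needed for long reasoning


def allocate(queries, budget):
    """Greedily pick queries for long reasoning under a total token budget."""
    # Rank by expected accuracy gain per extra token (a knapsack heuristic).
    ranked = sorted(
        queries,
        key=lambda q: (q.p_long - q.p_short) / q.extra_cost,
        reverse=True,
    )
    chosen, spent = [], 0
    for q in ranked:
        gain = q.p_long - q.p_short
        # Spend tokens only where long reasoning helps and the budget allows.
        if gain > 0 and spent + q.extra_cost <= budget:
            chosen.append(q.name)
            spent += q.extra_cost
    return chosen, spent


# Hypothetical queries: long reasoning barely helps the easy one.
queries = [
    Query("easy_arithmetic", p_short=0.95, p_long=0.96, extra_cost=400),
    Query("medium_algebra", p_short=0.60, p_long=0.80, extra_cost=500),
    Query("hard_geometry", p_short=0.20, p_long=0.55, extra_cost=600),
]
chosen, spent = allocate(queries, budget=1200)
print(chosen, spent)
```

With these made-up numbers the budget goes to the harder queries and the easy one keeps its short answer, which mirrors the qualitative behavior the abstract describes. In the paper this behavior is learned by the model rather than computed by an external rule like this one.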


Taxonomy

Topics: AI-based Problem Solving and Planning · Intelligent Tutoring Systems and Adaptive Learning · Data Stream Mining Techniques

Methods: Attentive Walk-Aggregating Graph Neural Network