Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies
Junlin Wang, Siddhartha Jain, Dejiao Zhang, Baishakhi Ray, Varun Kumar, Ben Athiwaratkun

TL;DR
This paper proposes a framework for evaluating large language model reasoning strategies by considering both performance and compute cost, revealing that simpler methods often outperform complex ones when resources are accounted for.
Contribution
It introduces a compute-aware evaluation framework for reasoning strategies, highlighting the importance of resource considerations in assessing their effectiveness.
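As a rough illustration of what a compute-aware comparison might look like, the sketch below compares strategies only among runs that fit within a shared budget. It is not the paper's framework: the function names, the choice of tokens as the budget unit, and the `(tokens_used, accuracy)` run format are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record format: one (tokens_used, accuracy) pair per evaluated run.
Run = Tuple[int, float]


def best_accuracy_within_budget(runs: List[Run], budget: int) -> Optional[float]:
    """Return the best accuracy among runs that cost at most `budget` tokens.

    Returns None when no run of this strategy fits within the budget.
    """
    eligible = [accuracy for tokens, accuracy in runs if tokens <= budget]
    return max(eligible) if eligible else None


def compare_at_budget(
    results: Dict[str, List[Run]], budget: int
) -> Dict[str, Optional[float]]:
    """Compare every strategy at the same token budget."""
    return {
        name: best_accuracy_within_budget(runs, budget)
        for name, runs in results.items()
    }
```

Calling `compare_at_budget` with several budgets traces out a performance-versus-cost curve per strategy, which is the kind of view a budget-aware evaluation relies on instead of a single headline score.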
Findings
Complex strategies often don't outperform simple baselines when compute is controlled.
Given a comparable compute budget, self-consistency often outperforms more complex strategies.
Some strategies, such as multi-agent debate, degrade with increased compute.
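To make the self-consistency baseline concrete, here is a minimal sketch of majority voting under a token budget. It assumes the budget is counted in tokens and that each sampled reasoning chain reports its own token cost. `budgeted_self_consistency` and `make_fake_sampler` are hypothetical names for this example, and the fake sampler stands in for a real model call.

```python
import itertools
from collections import Counter
from typing import Callable, Iterable, Optional, Tuple

# A sampler returns one (final_answer, tokens_spent) pair per call.
Sampler = Callable[[], Tuple[str, int]]


def budgeted_self_consistency(
    sample_fn: Sampler, token_budget: int
) -> Tuple[Optional[str], int]:
    """Majority-vote over sampled answers until the token budget is exhausted.

    Simplification: a sample that would exceed the budget is discarded and
    not charged, although a real system would already have paid for it.
    """
    votes: Counter = Counter()
    tokens_used = 0
    while True:
        answer, cost = sample_fn()
        if cost <= 0:
            raise ValueError("each sample must report a positive token cost")
        if tokens_used + cost > token_budget:
            break
        tokens_used += cost
        votes[answer] += 1
    if not votes:
        return None, tokens_used  # budget too small for even one sample
    return votes.most_common(1)[0][0], tokens_used


def make_fake_sampler(answers: Iterable[str], cost_per_sample: int) -> Sampler:
    """Deterministic stand-in for a model: cycles through fixed answers."""
    cycle = itertools.cycle(answers)
    return lambda: (next(cycle), cost_per_sample)


sampler = make_fake_sampler(["42", "41", "42"], cost_per_sample=100)
answer, used = budgeted_self_consistency(sampler, token_budget=550)
print(answer, used)  # prints: 42 500
```

Giving this baseline the same token budget that a more elaborate strategy consumes is what makes the comparison compute-matched.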
Abstract
A diverse array of reasoning strategies has been proposed to elicit the capabilities of large language models. However, in this paper, we point out that traditional evaluations which focus solely on performance metrics miss a key factor: the increased effectiveness due to additional compute. By overlooking this aspect, a skewed view of strategy efficiency is often presented. This paper introduces a framework that incorporates the compute budget into the evaluation, providing a more informative comparison that takes into account both performance metrics and computational cost. In this budget-aware perspective, we find that complex reasoning strategies often don't surpass simpler baselines purely due to algorithmic ingenuity, but rather due to the larger computational resources allocated. When we provide a simple baseline like chain-of-thought self-consistency with comparable compute…
Taxonomy
Topics: Auction Theory and Applications · Organizational Management and Leadership · Multi-Agent Systems and Negotiation
