Inference Scaling vs Reasoning: An Empirical Analysis of Compute-Optimal LLM Problem-Solving
Marwan AbdElhameed, Pavly Halim

TL;DR
This paper empirically analyzes the trade-offs between reasoning accuracy and computational efficiency in large language models, revealing challenges in integrating these objectives and highlighting the need for new architectures.
Contribution
It provides a comprehensive empirical study of combining reasoning and efficiency methods in LLMs, demonstrating the complex interplay and limitations of current approaches.
Findings
Quiet-STaR achieves the highest accuracy (32.03%) but at high computational cost (554.66s runtime, 12.73T FLOPs)
REBASE offers efficiency with baseline accuracy
Combining methods degrades performance, indicating fundamental challenges
Abstract
Recent advances in large language models (LLMs) have predominantly focused on maximizing accuracy and reasoning capabilities, often overlooking crucial computational efficiency considerations. While this focus has yielded impressive accuracy improvements, it has also produced methods that may be impractical for real-world deployment because of their computational overhead and latency. This paper investigates the potential synergy between reasoning enhancement and computational efficiency by analyzing the integration of two contrasting approaches: Quiet-STaR (Self-Taught Reasoner) and REBASE (REward BAlanced SEarch). Through comprehensive empirical analysis using the Mistral-7B model on the GSM8K dataset, we demonstrate that while each method excels in its primary objective, with Quiet-STaR achieving superior accuracy (32.03%) despite high computational cost (554.66s runtime, 12.73T FLOPs), and…
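To make the accuracy-versus-compute trade-off concrete, the sketch below normalizes accuracy by compute using only the Quiet-STaR figures quoted in the abstract. The `RunStats` class and the two ratio metrics are illustrative assumptions, not metrics defined by the paper; the REBASE figures are cut off in this excerpt, so they are deliberately left out rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunStats:
    """Hypothetical container for one method's reported results."""
    name: str
    accuracy_pct: float  # accuracy on GSM8K, in percent
    runtime_s: float     # total runtime, in seconds
    tflops: float        # total compute, in TFLOPs

    def accuracy_per_tflop(self) -> float:
        # Percentage points of accuracy obtained per TFLOP spent.
        return self.accuracy_pct / self.tflops

    def accuracy_per_second(self) -> float:
        # Percentage points of accuracy obtained per second of runtime.
        return self.accuracy_pct / self.runtime_s


# Figures for Quiet-STaR as stated in the abstract.
quiet_star = RunStats("Quiet-STaR", accuracy_pct=32.03, runtime_s=554.66, tflops=12.73)

print(f"{quiet_star.name}: "
      f"{quiet_star.accuracy_per_tflop():.3f} pct-points per TFLOP, "
      f"{quiet_star.accuracy_per_second():.4f} pct-points per second")
```

Computing the same two ratios for a second method would give a like-for-like comparison of how much accuracy each unit of compute buys, which is the kind of comparison the abstract describes.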
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques
