Inference-Time Computations for LLM Reasoning and Planning: A Benchmark and Insights
Shubham Parashar, Blake Olson, Sambhav Khurana, Eric Li, Hongyi Ling, James Caverlee, Shuiwang Ji

TL;DR
This paper introduces Sys2Bench, a comprehensive benchmark for evaluating inference-time reasoning techniques in large language models across diverse tasks, revealing limitations in current scaling approaches.
Contribution
It presents a new benchmark, Sys2Bench, and provides extensive analysis of inference-time techniques, highlighting their strengths and limitations in LLM reasoning and planning.
Findings
Scaling inference-time techniques alone has limited effectiveness.
No single inference-time method outperforms others across all tasks.
Experiments cover eleven tasks in five categories: arithmetic, logical, common sense, and algorithmic reasoning, plus planning.
Abstract
We examine the reasoning and planning capabilities of large language models (LLMs) in solving complex tasks. Recent advances in inference-time techniques demonstrate the potential to enhance LLM reasoning without additional training by exploring intermediate steps during inference. Notably, OpenAI's o1 model shows promising performance through its novel use of multi-step reasoning and verification. Here, we explore how scaling inference-time techniques can improve reasoning and planning, focusing on understanding the tradeoff between computational cost and performance. To this end, we construct a comprehensive benchmark, known as Sys2Bench, and perform extensive experiments evaluating existing inference-time techniques on eleven diverse tasks across five categories, including arithmetic reasoning, logical reasoning, common sense reasoning, algorithmic reasoning, and planning. Our…
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries · Logic, Reasoning, and Knowledge
