From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?

Zhanke Zhou; Xiao Feng; Zhaocheng Zhu; Jiangchao Yao; Sanmi Koyejo; Bo Han

arXiv:2506.08295 · cs.LG · June 11, 2025

TL;DR

This paper introduces AR-Bench, a benchmark for evaluating the active reasoning abilities of large language models, and shows that current models struggle to acquire and use missing information in interactive scenarios.

Contribution

The paper presents AR-Bench, a novel benchmark for assessing active reasoning in LLMs, and provides empirical evidence of their limitations on interactive tasks that simulate real-world, agentic scenarios.

Findings

01. LLMs struggle with active reasoning tasks in AR-Bench.

02. Advanced strategies only modestly improve active reasoning performance.

03. There is a critical need for new training methodologies for active reasoning.

Abstract

While existing benchmarks probe the reasoning abilities of large language models (LLMs) across diverse domains, they predominantly assess passive reasoning, providing models with all the information needed to reach a solution. By contrast, active reasoning, where an LLM must interact with external systems to acquire missing evidence or data, has received little systematic attention. To address this shortfall, we present AR-Bench, a novel benchmark designed explicitly to evaluate an LLM's active reasoning skills. AR-Bench comprises three task families (detective cases, situation puzzles, and guessing numbers) that together simulate real-world, agentic scenarios and measure performance across commonsense, logical, and symbolic reasoning challenges. Empirical evaluation on AR-Bench demonstrates that contemporary LLMs exhibit pronounced difficulties with active reasoning: they frequently fail…

Code & Models

Repositories

tmlr-group/ar-bench (official)

Videos

From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information? (SlidesLive)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education