Cascaded Information Disclosure for Generalized Evaluation of Problem Solving Capabilities

Yunxiang Yan; Tomohiro Sawada; Kartik Goyal

arXiv:2507.23776·cs.CL·August 1, 2025

TL;DR

This paper introduces a cascaded question disclosure framework that queries a large language model in stages, each revealing only part of the question, to estimate its problem-solving ability more accurately than standard QA benchmarks do.

Contribution

It proposes a holistic, generalizable evaluation framework based on cascaded question disclosure that gives a more accurate estimate of LLMs' problem-solving capabilities than standard QA benchmark scores while remaining automatic and scalable.

Findings

01. Better comparison of LLMs' reasoning abilities

02. Induces more informative intermediate traces in models

03. Narrows performance gaps observed in standard evaluations

Abstract

While question-answering (QA) benchmark performance is an automatic and scalable method to compare LLMs, it is an indirect method of evaluating their underlying problem-solving capabilities. Therefore, we propose a holistic and generalizable framework based on cascaded question disclosure that provides a more accurate estimate of the models' problem-solving capabilities while maintaining the scalability and automation. This approach collects model responses in a stagewise manner with each stage revealing partial information about the question designed to elicit generalized reasoning in LLMs. We find that our approach not only provides a better comparison between LLMs, but also induces better intermediate traces in models compared to the standard QA paradigm. We empirically verify this behavior on diverse reasoning and knowledge-heavy QA datasets by comparing LLMs of varying sizes…
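The stagewise collection described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function name `cascaded_disclosure`, the way a question is split into stages, the prompt layout, and the choice to carry each stage's response forward into the next prompt. The `echo_model` stub stands in for a real LLM call.

```python
from typing import Callable, List


def cascaded_disclosure(stages: List[str], model: Callable[[str], str]) -> List[str]:
    """Query `model` once per stage, revealing one more piece of the question each time.

    Hypothetical helper: the staging scheme and prompt format are assumptions.
    """
    responses: List[str] = []
    context = ""
    for i, piece in enumerate(stages, start=1):
        # Reveal the next piece of the question on top of everything shown so far.
        context += f"[Stage {i} information]\n{piece}\n"
        response = model(context)
        responses.append(response)
        # Assumption: the model's intermediate response is carried into later stages.
        context += f"[Stage {i} response]\n{response}\n"
    return responses


def echo_model(prompt: str) -> str:
    # Stand-in for an LLM: reports how many stages of information it has been shown.
    return f"seen {prompt.count('information]')} stage(s)"


# Illustrative two-stage split of a toy question (not from the paper's datasets).
stages = [
    "A train travels 60 km in 1.5 hours.",
    "What is its average speed in km/h?",
]
responses = cascaded_disclosure(stages, echo_model)
```

In this sketch only the final-stage response would be scored against the reference answer, while the earlier responses serve as the intermediate traces; that scoring choice is likewise an assumption.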

Taxonomy

Topics: Multi-Criteria Decision Making · Big Data and Business Intelligence