Decomposed Prompting Does Not Fix Knowledge Gaps, But Helps Models Say "I Don't Know"

Dhruv Madhwal; Lyuxin David Zhang; Dan Roth; Tomer Wolfson; Vivek Gupta

arXiv:2602.04853 · cs.CL · February 5, 2026

TL;DR

Decomposed prompting does not fix knowledge gaps in large language models, but disagreement between prompting regimes reveals when a model is uncertain about its knowledge, enabling better error detection without additional training or retrieval.

Contribution

The paper shows that disagreement among different prompting regimes effectively signals model uncertainty, leading to a practical abstention method for improving reliability in closed-book QA.

Findings

01  Disagreement between prompting regimes correlates with potential errors.

02  Cross-regime agreement improves error detection performance.

03  Abstention based on disagreement outperforms standard uncertainty methods.

Abstract

Large language models often struggle to recognize their knowledge limits in closed-book question answering, leading to confident hallucinations. While decomposed prompting is typically used to improve accuracy, we investigate its impact on reliability. We evaluate three task-equivalent prompting regimes: Direct, Assistive, and Incremental, across different model scales and multi-hop QA benchmarks. We find that although accuracy gains from decomposition diminish in frontier models, disagreements between prompting regimes remain highly indicative of potential errors. Because factual knowledge is stable while hallucinations are stochastic, cross-regime agreement provides a precise signal of internal uncertainty. We leverage this signal to implement a training-free abstention policy that requires no retrieval or fine-tuning. Our results show that disagreement-based abstention outperforms…
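
The abstention idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the unanimous-agreement rule, the answer normalization, and the names `normalize` and `answer_or_abstain` are all assumptions made for illustration, and the paper may use a different agreement criterion. The sketch takes one answer per prompting regime (Direct, Assistive, Incremental) and returns an answer only when all regimes agree after normalization.

```python
import re
import string
from typing import Mapping, Optional


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Hypothetical helper: the normalization used in the paper is not given here.
    """
    text = answer.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def answer_or_abstain(answers: Mapping[str, str]) -> Optional[str]:
    """Return an answer only if every prompting regime agrees, else None.

    `answers` maps a regime name (e.g. "direct") to that regime's answer.
    Assumption: agreement means identical answers after normalization.
    """
    normalized = {normalize(a) for a in answers.values()}
    # Exactly one distinct, non-empty normalized answer means full agreement.
    if len(normalized) == 1 and "" not in normalized:
        return next(iter(answers.values()))
    return None  # disagreement (or no usable answer): abstain


if __name__ == "__main__":
    agree = {"direct": "Paris", "assistive": "paris.", "incremental": "The Paris"}
    differ = {"direct": "Paris", "assistive": "Lyon", "incremental": "Paris"}
    print(answer_or_abstain(agree))   # answers match after normalization
    print(answer_or_abstain(differ))  # regimes disagree, so the policy abstains
```

Requiring unanimity is the strictest possible choice and trades coverage for precision. A majority vote or a similarity threshold would abstain less often, at the cost of letting more errors through.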

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Text Readability and Simplification