When Is Enough Not Enough? Illusory Completion in Search Agents

Dayoon Ko; Jihyuk Kim; Sohyeon Kim; Haeju Park; Dahyun Lee; Gunhee Kim; Moontae Lee; Kyungjae Lee

arXiv:2602.07549·cs.AI·February 10, 2026

TL;DR

This paper examines how reliably search agents reason over multi-constraint problems. It finds that illusory completion, where an agent believes a task is complete despite unresolved or violated constraints, occurs frequently, and it proposes the Epistemic Ledger framework to diagnose these failures and improve constraint verification.

Contribution

It introduces the Epistemic Ledger framework for tracking evidence and beliefs during reasoning, and demonstrates that explicit constraint-state tracking improves agent accuracy and reduces errors.

Findings

1. Illusory completion occurs frequently in multi-constraint reasoning.

2. The Epistemic Ledger effectively diagnoses reasoning failures.

3. Explicit constraint tracking improves accuracy and reduces underverification.

Abstract

Recent search agents leverage multi-turn reasoning and search tools to achieve strong performance on multi-hop and long-horizon benchmarks. Yet it remains unclear whether they reliably reason across all requirements by tracking, verifying, and maintaining multiple conditions in these questions. We study this capability under multi-constraint problems, where valid answers must satisfy several constraints simultaneously. We find that illusory completion frequently occurs, wherein agents believe tasks are complete despite unresolved or violated constraints, leading to underverified answers. To diagnose this behavior, we introduce the Epistemic Ledger, an evaluation framework that tracks evidential support and agents' beliefs for each constraint throughout multi-turn reasoning. Our analysis reveals four recurring failure patterns: bare assertions, overlooked refutations, stagnation, and…
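
The idea of tracking evidential support and agent belief per constraint can be illustrated with a small data structure. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the class names (`Evidence`, `ConstraintState`, `Ledger`), the three evidence states, and the rule used to flag illusory completion are all hypothetical choices made for illustration, and the example constraints are invented.

```python
from dataclasses import dataclass, field
from enum import Enum


class Evidence(Enum):
    """Hypothetical evidence states for one constraint (an assumption)."""
    NONE = "none"            # nothing retrieved so far bears on the constraint
    SUPPORTED = "supported"  # retrieved evidence supports the constraint
    REFUTED = "refuted"      # retrieved evidence contradicts the constraint


@dataclass
class ConstraintState:
    evidence: Evidence = Evidence.NONE
    believed_satisfied: bool = False  # what the agent claims, not what is shown


@dataclass
class Ledger:
    """Illustrative ledger: one entry per constraint, updated each turn."""
    states: dict[str, ConstraintState] = field(default_factory=dict)

    def add_constraint(self, name: str) -> None:
        self.states[name] = ConstraintState()

    def record(self, name, evidence=None, believed_satisfied=None) -> None:
        # Update only the fields that changed in this turn.
        state = self.states[name]
        if evidence is not None:
            state.evidence = evidence
        if believed_satisfied is not None:
            state.believed_satisfied = believed_satisfied

    def unverified(self) -> list[str]:
        # Constraints whose evidence is missing or contradictory.
        return [name for name, state in self.states.items()
                if state.evidence is not Evidence.SUPPORTED]

    def is_illusory_completion(self, agent_says_done: bool) -> bool:
        # Assumed rule: the agent stops while some constraint is unverified.
        return agent_says_done and bool(self.unverified())


ledger = Ledger()
for constraint in ("born_in_1970s", "won_award", "directed_film"):
    ledger.add_constraint(constraint)

ledger.record("born_in_1970s", Evidence.SUPPORTED, True)
ledger.record("won_award", believed_satisfied=True)      # belief with no evidence
ledger.record("directed_film", Evidence.REFUTED, True)   # belief despite refutation

print(ledger.unverified())
print(ledger.is_illusory_completion(agent_says_done=True))
```

Keeping belief and evidence as separate fields is the design point of the sketch: a mismatch between them (belief without support, or belief despite refutation) is what makes an answer underverified in the sense the abstract describes.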

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Information Retrieval and Search Behavior · Personal Information Management and User Behavior