Implicit Intelligence -- Evaluating Agents on What Users Don't Say

Ved Sirdeshmukh; Marc Wetter

arXiv:2602.20424 · cs.AI · February 25, 2026

Open Access

TL;DR

This paper introduces Implicit Intelligence, an evaluation framework that tests whether AI agents can infer unstated user requirements in interactive scenarios. The evaluation reveals significant gaps in current models' contextual reasoning.

Contribution

The paper presents an evaluation method and environment for testing whether AI agents can infer implicit constraints and goals that go beyond their explicit instructions.

Findings

01. Best model achieves only 48.3% success rate

02. Current models struggle with implicit reasoning tasks

03. Significant room for improvement in contextual understanding

Abstract

Real-world requests to AI agents are fundamentally underspecified. Natural human communication relies on shared context and unstated constraints that speakers expect listeners to infer. Current agentic benchmarks test explicit instruction-following but fail to evaluate whether agents can reason about implicit requirements spanning accessibility needs, privacy boundaries, catastrophic risks, and contextual constraints. We present Implicit Intelligence, an evaluation framework testing whether AI agents can move beyond prompt-following to become genuine goal-fulfillers, paired with Agent-as-a-World (AaW), a harness where interactive worlds are defined in human-readable YAML files and simulated by language models. Our scenarios feature apparent simplicity in user requests, hidden complexity in correct solutions, and discoverability of constraints through environmental exploration.…
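The three scenario properties named in the abstract (a simple-looking request, hidden complexity in the correct solution, and constraints that can be discovered by exploring the environment) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the `Scenario` class, its field names, the `discoverable` and `evaluate` helpers, and the meeting-notes example are hypothetical and are not the paper's YAML schema or its AaW harness.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical scenario record; field names are illustrative only."""
    user_request: str                  # the apparently simple, explicit request
    world_state: dict[str, str]        # facts an agent could find by exploring
    # Maps each implicit constraint to the world_state key that reveals it.
    implicit_constraints: dict[str, str] = field(default_factory=dict)


def discoverable(scenario: Scenario) -> bool:
    """Check that every implicit constraint is backed by an observable fact."""
    return all(key in scenario.world_state
               for key in scenario.implicit_constraints.values())


def evaluate(scenario: Scenario, constraints_respected: set[str]) -> bool:
    """Pass only if the agent respected every implicit constraint."""
    return set(scenario.implicit_constraints) <= constraints_respected


# Invented example: the request never mentions privacy, but the notes do.
example = Scenario(
    user_request="Share the meeting notes with the team.",
    world_state={"notes_contents": "includes one employee's salary details"},
    implicit_constraints={"redact_private_data": "notes_contents"},
)
```

In this sketch, an agent that shares the notes unchanged completes the explicit request but fails `evaluate`, while an agent that reads the notes first and redacts the private details passes.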

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · AI in Service Interactions