Hidden in Plain Sight: Reasoning in Underspecified and Misspecified Scenarios for Multimodal LLMs
Qianqi Yan, Hongquan Li, Shan Jiang, Yang Zhao, Xinze Guan, Ching-Chen Kuo, Xin Eric Wang

TL;DR
This paper systematically evaluates how current multimodal large language models handle implicit reasoning in messy, real-world scenarios, revealing a gap between their reasoning abilities and behavioral compliance, and proposing simple interventions to improve trustworthiness.
Contribution
It introduces a diagnostic suite for assessing MLLMs on implicit reasoning tasks and demonstrates that simple inference-time prompts can significantly enhance their performance in underconstrained environments.
Findings
Models frequently fail to surface hidden issues such as missing objects, contradictory facts, ambiguous references, or infeasible requests.
Explicit prompts can reveal underlying reasoning capabilities.
Simple interventions like clarifying questions improve model trustworthiness.
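
As an illustration of what an inference-time intervention of this kind might look like, the sketch below wraps a user instruction in a preamble that explicitly permits the model to ask a clarifying question. The preamble wording and the function names `with_clarification_option` and `is_clarifying_question` are hypothetical and are not taken from the paper; the question check is a crude heuristic meant only for demonstration.

```python
# Hypothetical sketch: the preamble text and helper names are assumptions,
# not the prompts or code used in the paper.

CLARIFY_PREAMBLE = (
    "Before acting, check whether the instruction is consistent with what "
    "you can see. If an object is missing, a reference is ambiguous, facts "
    "contradict each other, or the request is infeasible, ask one "
    "clarifying question instead of proceeding."
)


def with_clarification_option(instruction: str) -> str:
    """Prepend the clarification preamble to a user instruction."""
    return f"{CLARIFY_PREAMBLE}\n\nInstruction: {instruction}"


def is_clarifying_question(response: str) -> bool:
    """Crude heuristic: treat a response that ends in '?' as a question."""
    return response.strip().endswith("?")


if __name__ == "__main__":
    prompt = with_clarification_option("Hand me the red mug on the table.")
    print(prompt)
    print(is_clarifying_question("Which mug do you mean?"))
```

The wrapped prompt would be sent to the model alongside the image; how the model's reply is scored is left out here because the source does not describe it.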
Abstract
Multimodal large language models (MLLMs) are increasingly deployed in open-ended, real-world environments where inputs are messy, underspecified, and not always trustworthy. Unlike curated benchmarks, these settings frequently involve instructions that refer to missing objects or contradictory facts, rely on ambiguous references, or request infeasible actions. In such cases, success hinges not on task execution alone, but on a model's ability to detect when something is silently wrong. This paper presents a systematic analysis of how current MLLMs handle such implicit reasoning scenarios: cases where the flaw is not explicitly stated but must be inferred from context. Using a curated diagnostic suite spanning four categories of real-world failure modes, we evaluate six MLLMs, including o3 and GPT-4o, and find that models frequently fail to surface hidden issues, even when they possess…
