Can Large Multimodal Models Actively Recognize Faulty Inputs? A Systematic Evaluation Framework of Their Input Scrutiny Ability

Haiqi Yang; Jinzhe Li; Gengxu Li; Yi Chang; Yuan Wu

arXiv:2508.04017 · cs.CV · August 7, 2025


TL;DR

This paper introduces a systematic evaluation framework to assess whether large multimodal models can actively detect and scrutinize faulty inputs, revealing their current limitations and modality-specific challenges.

Contribution

The study presents ISEval, a novel framework comprising seven categories of flawed premises and three evaluation metrics, for comprehensively assessing the input scrutiny abilities of LMMs.

Findings

01

Most models struggle to detect flawed premises without guidance.

02

Models perform better on logical fallacies than on surface errors.

03

Modality trust varies among models, affecting error detection.

Abstract

Large Multimodal Models (LMMs) have witnessed remarkable growth, showcasing formidable capabilities in handling intricate multimodal tasks with exceptional performance. Recent research has underscored the inclination of large language models to passively accept defective inputs, often resulting in futile reasoning on invalid prompts. However, the same critical question of whether LMMs can actively detect and scrutinize erroneous inputs still remains unexplored. To address this gap, we introduce the Input Scrutiny Ability Evaluation Framework (ISEval), which encompasses seven categories of flawed premises and three evaluation metrics. Our extensive evaluation of ten advanced LMMs has identified key findings. Most models struggle to actively detect flawed textual premises without guidance, which reflects a strong reliance on explicit prompts for premise error identification. Error type…
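The abstract contrasts detecting flawed premises "without guidance" with detection under explicit prompts, but it does not define ISEval's three metrics. The Python sketch below is a hypothetical illustration, not the paper's scoring code. It assumes each flawed-premise item has been judged twice: once under a plain prompt (spontaneous) and once when the model is explicitly asked to check the premises (guided). It then reports per-category detection rates. The `Judgment` record, the `detection_rates` function, and the category labels are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Judgment:
    # Hypothetical record for one flawed-premise item (not from the paper).
    category: str          # illustrative flaw-category label
    spontaneous_hit: bool  # model flagged the flaw under a plain prompt
    guided_hit: bool       # model flagged the flaw when told to check premises


def detection_rates(judgments):
    """Return {category: (spontaneous_rate, guided_rate)} over the judgments."""
    # Per category: [item count, spontaneous hits, guided hits]
    counts = defaultdict(lambda: [0, 0, 0])
    for j in judgments:
        c = counts[j.category]
        c[0] += 1
        c[1] += j.spontaneous_hit  # bool adds as 0 or 1
        c[2] += j.guided_hit
    return {cat: (s / n, g / n) for cat, (n, s, g) in counts.items()}


# Toy, made-up judgments purely to exercise the function.
judgments = [
    Judgment("logical_fallacy", True, True),
    Judgment("logical_fallacy", False, True),
    Judgment("surface_error", False, False),
    Judgment("surface_error", False, True),
]
rates = detection_rates(judgments)
for category, (spontaneous, guided) in rates.items():
    print(f"{category}: spontaneous={spontaneous:.2f} guided={guided:.2f}")
```

Under these assumptions, a large gap between the guided and spontaneous rates would correspond to the "reliance on explicit prompts" the abstract describes. The toy data above is invented and says nothing about the paper's actual results.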
