Auditing the Reliability of Multimodal Generative Search

Erfan Samieyan Sahneh; Luca Maria Aiello

arXiv:2604.00944 · cs.CY · April 2, 2026

TL;DR

This paper audits the reliability of a multimodal generative search system and finds that, depending on the judge's strictness, between 3.7% and 18.7% of video-grounded claims are not supported by the videos they cite, which calls the trustworthiness of such citations into question.

Contribution

The paper provides the first large-scale analysis of video-grounded claims in multimodal generative search, identifying common failure modes and the factors associated with unsupported claims.

Findings

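01 Depending on the judge's strictness, 3.7% to 18.7% of video-grounded claims are not supported by their cited sources

02 Unsupported claims mostly involve unverifiable details and overstated assertions rather than outright contradictions

03 Claims with low semantic similarity to their cited videos are more likely to be unsupported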

Abstract

Multimodal Large Language Models (MLLMs) increasingly function as generative search systems that retrieve and synthesize answers from multimedia content, including YouTube videos. Although these systems project authority by citing specific videos as evidence, the extent to which these citations genuinely substantiate the generated claims remains unexamined. We present a large-scale audit of the Gemini 2.5 Pro multimodal search system, analyzing 11,943 claim-video pairs generated across Medical, Economic, and General domains. Through automated verification using three independent LLM judges (87.7% inter-rater agreement), validated against human annotations, we find that depending on the judge's strictness, between 3.7% and 18.7% of video-grounded claims are not supported by their cited sources. The dominant failure modes are not outright contradictions but rather unverifiable…
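The abstract reports a per-judge range of unsupported rates and an inter-rater agreement figure for three LLM judges. The sketch below illustrates how such quantities could be computed from a table of verdicts. Everything in it is an assumption made for illustration: the `verdicts` data is made up, the function names are hypothetical, and this excerpt does not say whether the paper's 87.7% figure is a unanimous or a pairwise agreement rate, so both are shown.

```python
from itertools import combinations

# Hypothetical verdicts (not the paper's data): one row per claim-video pair,
# one column per judge. True means the judge found the claim supported.
verdicts = [
    (True, True, True),
    (True, True, False),
    (True, True, True),
    (False, False, False),
    (True, False, False),
    (True, True, True),
    (True, True, True),
    (True, True, True),
]


def unsupported_rate(rows, judge):
    """Share of pairs that one judge labels as not supported."""
    return sum(1 for row in rows if not row[judge]) / len(rows)


def unanimous_agreement(rows):
    """Share of pairs on which all judges give the same label."""
    return sum(1 for row in rows if len(set(row)) == 1) / len(rows)


def pairwise_agreement(rows):
    """Share of (pair, judge-pair) comparisons where two judges agree."""
    judge_pairs = list(combinations(range(len(rows[0])), 2))
    matches = sum(row[a] == row[b] for row in rows for a, b in judge_pairs)
    return matches / (len(rows) * len(judge_pairs))


# One unsupported rate per judge; stricter judges produce higher rates.
rates = [unsupported_rate(verdicts, j) for j in range(3)]

if __name__ == "__main__":
    print(f"unsupported rate range: {min(rates):.1%} to {max(rates):.1%}")
    print(f"unanimous agreement:    {unanimous_agreement(verdicts):.1%}")
    print(f"pairwise agreement:     {pairwise_agreement(verdicts):.1%}")
```

Reporting the minimum and maximum across judges, rather than a single pooled number, mirrors the way the abstract presents its result as a range that depends on judge strictness.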
