INFACT: A Diagnostic Benchmark for Induced Faithfulness and Factuality Hallucinations in Video-LLMs

Junqi Yang; Yuecong Min; Jie Zhang; Shiguang Shan; Xilin Chen

arXiv:2603.11481·cs.CV·March 13, 2026


Open Access

TL;DR

This paper introduces INFACT, a diagnostic benchmark of 9,800 QA instances for faithfulness and factuality hallucinations in Video-LLMs, and finds that models with higher accuracy on clean inputs are not reliably robust under induced conditions such as visual degradation, evidence corruption, and temporal intervention.

Contribution

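The paper presents INFACT, a diagnostic benchmark with fine-grained taxonomies for faithfulness and factuality in Video-LLMs. It evaluates models in a clean Base mode and three induced modes (Visual Degradation, Evidence Corruption, and Temporal Intervention), and quantifies reliability with two metrics: Resist Rate (RR) and Temporal Sensitivity Score (TSS).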

Findings

01 Higher base accuracy does not ensure reliability in induced modes.

02 Evidence corruption significantly reduces model stability.

03 Many open-source models show minimal temporal sensitivity on factuality questions.

Abstract

Despite rapid progress, Video Large Language Models (Video-LLMs) remain unreliable due to hallucinations, which are outputs that contradict either video evidence (faithfulness) or verifiable world knowledge (factuality). Existing benchmarks provide limited coverage of factuality hallucinations and predominantly evaluate models only in clean settings. We introduce INFACT, a diagnostic benchmark comprising 9,800 QA instances with fine-grained taxonomies for faithfulness and factuality, spanning real and synthetic videos. INFACT evaluates models in four modes: Base (clean), Visual Degradation, Evidence Corruption, and Temporal Intervention for order-sensitive items. Reliability under induced modes is quantified using Resist Rate (RR) and Temporal Sensitivity Score (TSS). Experiments on 14 representative Video-LLMs reveal that higher Base-mode accuracy does not reliably…
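The abstract names Resist Rate (RR) but this page does not give its formula. As a rough illustration only, the sketch below assumes one plausible reading: among items a model answers correctly in Base mode, RR is the fraction it still answers correctly under an induced mode. The function name `resist_rate`, its inputs, and this definition are assumptions made for illustration, not the paper's specification.

```python
from typing import Sequence


def resist_rate(base_correct: Sequence[bool], induced_correct: Sequence[bool]) -> float:
    """Hypothetical Resist Rate (assumed definition, not taken from the paper).

    Among items answered correctly in Base mode, return the fraction that
    are still answered correctly under an induced mode.
    """
    if len(base_correct) != len(induced_correct):
        raise ValueError("both sequences must cover the same items")
    # Keep the induced-mode outcome only for items that were correct in Base mode.
    survivors = [ind for base, ind in zip(base_correct, induced_correct) if base]
    if not survivors:
        return float("nan")  # undefined when nothing was correct in Base mode
    return sum(survivors) / len(survivors)


# Toy per-item correctness flags (made-up values, not benchmark results).
base = [True, True, False, True]
induced = [True, False, False, True]
print(resist_rate(base, induced))
```

Conditioning on Base-mode correctness separates robustness from raw accuracy, which matches the finding that higher Base-mode accuracy does not by itself imply reliability under induced modes.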


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)