Learning to Decode Against Compositional Hallucination in Video Multimodal Large Language Models

Wenbin Xing; Quanxing Zha; Lizheng Zu; Mengran Li; Ming Li; Junchi Yan

arXiv:2602.00559·cs.CV·February 3, 2026

Open Access

TL;DR

This paper introduces OmniVCHall, a comprehensive benchmark for evaluating compositional hallucinations in video multimodal large language models, and proposes TriCD, a contrastive decoding framework that significantly improves model accuracy in this challenging setting.

Contribution

The paper presents OmniVCHall, a benchmark for systematically evaluating both isolated and compositional hallucinations, and proposes TriCD, a contrastive decoding method with adaptive components for mitigating them.

Findings

01 Advanced VLLMs show significant performance drops on the benchmark.

02 TriCD improves accuracy by over 10% across different models.

03 The benchmark includes diverse video domains and a new camera-based hallucination type.

Abstract

Current research on video hallucination mitigation focuses primarily on isolated error types, leaving compositional hallucinations, which arise from incorrect reasoning over multiple interacting spatial and temporal factors, largely underexplored. We introduce OmniVCHall, a benchmark designed to systematically evaluate both isolated and compositional hallucinations in video multimodal large language models (VLLMs). OmniVCHall spans diverse video domains, introduces a novel camera-based hallucination type, and defines a fine-grained taxonomy, together with adversarial answer options (e.g., "All are correct" and "None of the above") to prevent shortcut reasoning. Evaluations of 39 representative VLLMs reveal that even advanced models (e.g., Qwen3-VL and GPT-5) exhibit substantial performance degradation. We propose TriCD, a contrastive decoding framework with a triple-pathway calibration…
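
For readers unfamiliar with contrastive decoding, the general idea can be sketched as follows. This is a generic two-pathway illustration, not TriCD itself: the abstract is truncated before it describes the triple-pathway calibration, so nothing here should be read as the authors' method. The function names, the `alpha` and `beta` parameters, and the toy logits are all assumptions made for illustration. The sketch contrasts next-token logits from the original input with logits from a hypothetical degraded input (for example, a distorted video) and keeps only tokens the original pathway already finds plausible.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(original, distorted, alpha=1.0, beta=0.1):
    """Generic contrastive decoding scores (illustrative, not TriCD).

    original:  next-token logits given the real input
    distorted: next-token logits given a degraded input (assumed pathway)
    alpha:     contrast strength (hypothetical default)
    beta:      plausibility cutoff relative to the top original probability
    """
    probs = softmax(original)
    cutoff = beta * max(probs)
    scores = []
    for lo, ld, p in zip(original, distorted, probs):
        if p < cutoff:
            # Token is implausible under the original pathway: mask it out.
            scores.append(float("-inf"))
        else:
            # Amplify what the original supports, penalize what the
            # degraded pathway also favors (a likely prior-driven guess).
            scores.append((1 + alpha) * lo - alpha * ld)
    return scores


def pick_token(original, distorted, alpha=1.0, beta=0.1):
    """Greedy choice over the contrastive scores."""
    scores = contrastive_logits(original, distorted, alpha, beta)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: token 0 is slightly preferred on the real input, but the
# degraded input prefers it even more, which suggests it is prior-driven.
original = [2.0, 1.8, -3.0]
distorted = [2.5, 0.5, -3.0]
greedy = max(range(len(original)), key=original.__getitem__)
chosen = pick_token(original, distorted)
```

In this toy case plain greedy decoding selects token 0, while the contrastive scores favor token 1, whose support depends on the undistorted input. The plausibility cutoff keeps the subtraction from promoting tokens that were never reasonable candidates.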

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection · Ferroelectric and Negative Capacitance Devices