Learning to Decode Against Compositional Hallucination in Video Multimodal Large Language Models
Wenbin Xing, Quanxing Zha, Lizheng Zu, Mengran Li, Ming Li, Junchi Yan

TL;DR
This paper introduces OmniVCHall, a comprehensive benchmark for evaluating compositional hallucinations in video multimodal large language models, and proposes TriCD, a contrastive decoding framework that significantly improves model accuracy in this challenging setting.
Contribution
It presents OmniVCHall, a new benchmark for systematic evaluation of compositional hallucinations, and proposes TriCD, a novel contrastive decoding method with adaptive components to mitigate hallucinations.
Findings
Across 39 evaluated VLLMs, even advanced models such as Qwen3-VL and GPT-5 show substantial performance degradation on the benchmark.
TriCD improves accuracy by over 10% across different models.
The benchmark includes diverse video domains and a new camera-based hallucination type.
Abstract
Current research on video hallucination mitigation primarily focuses on isolated error types, leaving compositional hallucinations, which arise from incorrect reasoning over multiple interacting spatial and temporal factors, largely underexplored. We introduce OmniVCHall, a benchmark designed to systematically evaluate both isolated and compositional hallucinations in video multimodal large language models (VLLMs). OmniVCHall spans diverse video domains, introduces a novel camera-based hallucination type, and defines a fine-grained taxonomy, together with adversarial answer options (e.g., "All are correct" and "None of the above") to prevent shortcut reasoning. Evaluations of 39 representative VLLMs reveal that even advanced models (e.g., Qwen3-VL and GPT-5) exhibit substantial performance degradation. We propose TriCD, a contrastive decoding framework with a triple-pathway calibration…
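For readers unfamiliar with the term, the general idea behind contrastive decoding can be sketched as follows. This is a minimal illustration of the generic two-pathway technique, not TriCD's triple-pathway calibration, whose details are not given in the text above. The function names, the `alpha` and `beta` parameters, and the toy logits are all assumptions made for illustration: the next token is scored by how much more likely it is under the original input than under a perturbed (hallucination-prone) input.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_decode_step(logits_original, logits_perturbed, alpha=1.0, beta=0.1):
    """Pick one token by generic contrastive decoding (hypothetical helper).

    logits_original:  scores from the model given the real video and question.
    logits_perturbed: scores from the same model given a degraded input
                      (assumed to amplify hallucination-prone guesses).
    alpha: contrast strength; alpha=0 reduces to ordinary greedy decoding.
    beta:  plausibility threshold relative to the most likely original token.
    """
    lp_orig = log_softmax(logits_original)
    lp_pert = log_softmax(logits_perturbed)

    # Plausibility constraint: only consider tokens whose original probability
    # is at least beta times the probability of the top original token.
    cutoff = max(lp_orig) + math.log(beta)

    best_token, best_score = None, -math.inf
    for token, (o, p) in enumerate(zip(lp_orig, lp_pert)):
        if o < cutoff:
            continue
        # Reward tokens supported by the real input, penalize tokens that stay
        # likely even when the input is degraded.
        score = (1 + alpha) * o - alpha * p
        if score > best_score:
            best_token, best_score = token, score
    return best_token


# Toy example over four answer options (indices 0..3); values are made up.
original = [2.0, 1.8, 0.1, -1.0]
perturbed = [2.0, 0.5, 0.1, -1.0]
greedy_choice = contrastive_decode_step(original, perturbed, alpha=0.0)
contrastive_choice = contrastive_decode_step(original, perturbed, alpha=1.0)
```

In this toy case, option 0 is equally favored with and without the real input, so the contrast term discounts it, while option 1 depends on the real input and is promoted. How TriCD constructs and calibrates its pathways is beyond what this sketch assumes.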
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection · Ferroelectric and Negative Capacitance Devices
