Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding
Youze Wang, Zijun Chen, Ruoyu Chen, Shishen Gu, Wenbo Hu, Jiayang Liu, Yinpeng Dong, Hang Su, Jun Zhu, Meng Wang, Richang Hong

TL;DR
This paper introduces Trust-videoLLMs, a comprehensive benchmark for evaluating the trustworthiness of multimodal large language models in video understanding, focusing on accuracy, safety, fairness, and privacy across diverse tasks.
Contribution
It presents the first extensive benchmark assessing 23 videoLLMs on multiple trustworthiness dimensions, highlighting current limitations and guiding future improvements.
Findings
Significant gaps in dynamic scene understanding and robustness.
Open-source models sometimes outperform proprietary ones.
Scaling models does not always enhance trustworthiness.
Abstract
Recent advancements in multimodal large language models for video understanding (videoLLMs) have enhanced their capacity to process complex spatiotemporal data. However, challenges such as factual inaccuracies, harmful content, biases, hallucinations, and privacy risks compromise their reliability. This study introduces Trust-videoLLMs, a first comprehensive benchmark evaluating 23 state-of-the-art videoLLMs (5 commercial, 18 open-source) across five critical dimensions: truthfulness, robustness, safety, fairness, and privacy. Comprising 30 tasks with adapted, synthetic, and annotated videos, the framework assesses spatiotemporal risks, temporal consistency and cross-modal impact. Results reveal significant limitations in dynamic scene comprehension, cross-modal perturbation resilience and real-world risk mitigation. While open-source models occasionally outperform, proprietary models…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAccess Control and Trust · Multi-Agent Systems and Negotiation · Topic Modeling
