Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding

Youze Wang; Zijun Chen; Ruoyu Chen; Shishen Gu; Wenbo Hu; Jiayang Liu; Yinpeng Dong; Hang Su; Jun Zhu; Meng Wang; Richang Hong

arXiv:2506.12336 · cs.CV · November 27, 2025

TL;DR

This paper introduces Trust-videoLLMs, a comprehensive benchmark for evaluating the trustworthiness of multimodal large language models for video understanding (videoLLMs) across five dimensions: truthfulness, robustness, safety, fairness, and privacy.

Contribution

The paper presents the first comprehensive trustworthiness benchmark for videoLLMs, assessing 23 models (5 commercial, 18 open-source) on 30 tasks across five dimensions, highlighting current limitations and guiding future improvements.

Findings

01. Models show significant limitations in dynamic scene comprehension, resilience to cross-modal perturbations, and real-world risk mitigation.

02. Open-source models occasionally outperform proprietary ones.

03. Scaling models does not always enhance trustworthiness.

Abstract

Recent advancements in multimodal large language models for video understanding (videoLLMs) have enhanced their capacity to process complex spatiotemporal data. However, challenges such as factual inaccuracies, harmful content, biases, hallucinations, and privacy risks compromise their reliability. This study introduces Trust-videoLLMs, the first comprehensive benchmark evaluating 23 state-of-the-art videoLLMs (5 commercial, 18 open-source) across five critical dimensions: truthfulness, robustness, safety, fairness, and privacy. Comprising 30 tasks with adapted, synthetic, and annotated videos, the framework assesses spatiotemporal risks, temporal consistency, and cross-modal impact. Results reveal significant limitations in dynamic scene comprehension, cross-modal perturbation resilience, and real-world risk mitigation. While open-source models occasionally outperform, proprietary models…
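
To make the benchmark's structure concrete (models scored on tasks, with tasks grouped under five dimensions), the following is a minimal Python sketch of how per-task results could be rolled up into per-dimension scores. Only the five dimension names come from the abstract. The `TaskResult` record, the `dimension_scores` helper, the assumption that scores are normalized to [0, 1], and the use of an unweighted mean are all hypothetical illustrations, not the authors' implementation or scoring protocol.

```python
from dataclasses import dataclass
from statistics import mean

# The five dimensions named in the abstract.
DIMENSIONS = ("truthfulness", "robustness", "safety", "fairness", "privacy")


@dataclass(frozen=True)
class TaskResult:
    """One model's score on one task (hypothetical record layout)."""
    model: str      # placeholder model name
    dimension: str  # must be one of DIMENSIONS
    task: str       # placeholder task identifier
    score: float    # assumed to be normalized to [0, 1]


def dimension_scores(results, model):
    """Average a model's task scores within each dimension.

    An unweighted mean is an assumption made for illustration; the paper
    may aggregate differently. Dimensions with no results are omitted.
    """
    by_dim = {d: [] for d in DIMENSIONS}
    for r in results:
        if r.model != model:
            continue
        if r.dimension not in by_dim:
            raise ValueError(f"unknown dimension: {r.dimension!r}")
        by_dim[r.dimension].append(r.score)
    return {d: mean(scores) for d, scores in by_dim.items() if scores}


if __name__ == "__main__":
    # Made-up placeholder values, not results from the paper.
    demo = [
        TaskResult("model-a", "safety", "task-1", 0.5),
        TaskResult("model-a", "safety", "task-2", 1.0),
        TaskResult("model-a", "privacy", "task-3", 0.25),
        TaskResult("model-b", "safety", "task-1", 0.75),
    ]
    print(dimension_scores(demo, "model-a"))
```

Keeping one record per (model, task) pair and aggregating afterwards makes it straightforward to compare models on a single dimension or to swap in a different aggregation rule.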

Taxonomy

Topics: Access Control and Trust · Multi-Agent Systems and Negotiation · Topic Modeling