Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation

Yichi Zhang; Yao Huang; Yifan Wang; Yitong Sun; Chang Liu; Zhe Zhao; Zhengwei Fang; Huanran Chen; Xiao Yang; Xingxing Wei; Hang Su; Yinpeng Dong; Jun Zhu

arXiv:2508.15370 · cs.CL · August 22, 2025

TL;DR

This paper introduces MultiTrust-X, a comprehensive benchmark for evaluating and mitigating trustworthiness issues in Multimodal Large Language Models, revealing vulnerabilities and proposing a reasoning-enhanced safety approach.

Contribution

It presents a new holistic benchmark with a three-dimensional framework, covering five trustworthiness aspects, two novel risk types, and multiple mitigation strategies for MLLMs.

Findings

01

Current models have significant trustworthiness vulnerabilities.

02

Multimodal training can amplify risks in base LLMs.

03

Few mitigation methods effectively address overall trustworthiness.

Abstract

The trustworthiness of Multimodal Large Language Models (MLLMs) remains a serious concern despite significant progress in their capabilities. Existing evaluation and mitigation approaches often focus on narrow aspects and overlook risks introduced by multimodality. To tackle these challenges, we propose MultiTrust-X, a comprehensive benchmark for evaluating, analyzing, and mitigating the trustworthiness issues of MLLMs. We define a three-dimensional framework encompassing five trustworthiness aspects (truthfulness, robustness, safety, fairness, and privacy); two novel risk types covering multimodal risks and cross-modal impacts; and various mitigation strategies from the perspectives of data, model architecture, training, and inference algorithms. Based on this taxonomy, MultiTrust-X includes 32 tasks and 28 curated datasets, enabling holistic evaluations over 30…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Topic Modeling