Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects

Agnese Chiatti; Sara Bernardini; Lara Shibelski Godoy Piccolo; Viola Schiaffonati; Matteo Matteucci

arXiv:2505.05318·cs.CV·May 9, 2025

TL;DR

This survey reviews research on how users develop trust in Vision Language Models (VLMs), covering the current landscape, open challenges, and future directions for reliable and transparent human-AI interaction.

Contribution

The survey organizes studies of trust dynamics in user-VLM interactions under a multidisciplinary taxonomy and derives preliminary requirements for future trust-related research in this domain.

Findings

01 Identifies key factors influencing user trust in VLMs

02 Highlights gaps in current trust research and understanding

03 Proposes a multidisciplinary framework for studying trust in VLMs

Abstract

The rapid adoption of Vision Language Models (VLMs), pre-trained on large image-text and video-text datasets, calls for protecting and informing users about when to trust these systems. This survey reviews studies on trust dynamics in user-VLM interactions, through a multi-disciplinary taxonomy encompassing different cognitive science capabilities, collaboration modes, and agent behaviours. Literature insights and findings from a workshop with prospective VLM users inform preliminary requirements for future VLM trust studies.

Taxonomy

Topics: Multimodal Machine Learning Applications · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)