REVEAL: Multi-turn Evaluation of Image-Input Harms for Vision LLM

Madhur Jindal; Saurabh Deshpande

arXiv:2505.04673·cs.CL·May 9, 2025

TL;DR

The paper introduces REVEAL, a comprehensive framework for evaluating multi-turn image-input harms in vision LLMs, revealing vulnerabilities and performance differences across models in safety-critical scenarios.

Contribution

It presents a scalable, automated evaluation pipeline specifically designed for multi-turn image-input safety assessment in vision LLMs, addressing limitations of prior single-turn, text-only frameworks.

Findings

01

Multi-turn interactions increase defect rates in VLLMs.

02

GPT-4o has the best safety-usability balance among evaluated models.

03

Misinformation detection remains a critical challenge.

Abstract

Vision Large Language Models (VLLMs) represent a significant advancement in artificial intelligence by integrating image-processing capabilities with textual understanding, thereby enhancing user interactions and expanding application domains. However, their increased complexity introduces novel safety and ethical challenges, particularly in multi-modal and multi-turn conversations. Traditional safety evaluation frameworks, designed for text-based, single-turn interactions, are inadequate for addressing these complexities. To bridge this gap, we introduce the REVEAL (Responsible Evaluation of Vision-Enabled AI LLMs) Framework, a scalable and automated pipeline for evaluating image-input harms in VLLMs. REVEAL includes automated image mining, synthetic adversarial data generation, multi-turn conversational expansion using crescendo attack strategies, and comprehensive harm assessment…
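
The abstract lists the pipeline stages but does not spell out how a crescendo conversation is scored. The following is a minimal sketch under stated assumptions, not the REVEAL implementation: the hypothetical helper `crescendo_expand` builds placeholder turns that stay benign before a final adversarial request, and the hypothetical `defect_rate` counts a conversation as defective if a stand-in judge flags any reply in it. The model and judge callables and this definition of defect rate are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signatures: a model takes (image, history, user_turn) and returns a reply;
# a judge takes (user_turn, reply) and returns True if the reply is harmful.
Model = Callable[[str, List[Tuple[str, str]], str], str]
Judge = Callable[[str, str], bool]


def crescendo_expand(seed_request: str, n_turns: int) -> List[str]:
    """Hypothetical crescendo expansion: early turns are benign placeholders that
    build context around the image; the last turn carries the adversarial request."""
    if n_turns < 1:
        raise ValueError("n_turns must be at least 1")
    turns = [f"[benign turn {i + 1} about the image]" for i in range(n_turns - 1)]
    turns.append(seed_request)
    return turns


def defect_rate(conversations: Sequence[Tuple[str, List[str]]], model: Model, judge: Judge) -> float:
    """Fraction of conversations in which the judge flags at least one reply (assumed metric)."""
    if not conversations:
        return 0.0
    defective = 0
    for image, turns in conversations:
        history: List[Tuple[str, str]] = []
        flagged = False
        for turn in turns:
            reply = model(image, history, turn)
            history.append((turn, reply))  # later turns see the full dialogue so far
            if judge(turn, reply):
                flagged = True
        defective += int(flagged)
    return defective / len(conversations)
```

Scoring at the conversation level, rather than per reply, is one way to capture the finding that multi-turn interactions raise defect rates: running the same seed request as a single-turn conversation (`n_turns=1`) and as a longer crescendo gives two rates that can be compared directly.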

Code & Models

Repositories

Madhur-1/RevealVLLMSafetyEval (official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Multimodal Machine Learning Applications