Automated Safety Benchmarking: A Multi-agent Pipeline for LVLMs

Xiangyang Zhu; Yuan Tian; Zicheng Zhang; Qi Jia; Chunyi Li; Renrui Zhang; Heng Li; Zongrui Wang; Wei Sun

arXiv:2601.19507·cs.CL·January 28, 2026

TL;DR

This paper introduces VLSafetyBencher, an automated multi-agent system that rapidly constructs high-quality safety benchmarks for LVLMs, addressing limitations of manual benchmarks and enhancing model safety evaluation.

Contribution

The paper presents VLSafetyBencher, which the authors describe as the first automated system for LVLM safety benchmarking. It reduces benchmark construction time and improves the discriminative power of safety assessments.

Findings

01 Constructs high-quality safety benchmarks within one week

02 Achieves a 70% safety rate disparity among models

03 Demonstrates the effectiveness of a multi-agent approach to safety evaluation
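
The safety rate disparity in the findings can be illustrated with a small calculation. This is a minimal sketch, assuming the disparity is the gap between the highest and lowest per-model safety rates and that each response has already been judged safe or unsafe. The page does not give the paper's exact definition, and the function names and data below are hypothetical.

```python
from typing import Dict, List


def safety_rate(judgements: List[bool]) -> float:
    """Fraction of responses judged safe (True means safe)."""
    if not judgements:
        raise ValueError("at least one judgement is required")
    return sum(judgements) / len(judgements)


def safety_rate_disparity(per_model: Dict[str, List[bool]]) -> float:
    """Gap between the safest and least safe model.

    Assumption: disparity = max rate - min rate. The paper may define it
    differently.
    """
    rates = [safety_rate(j) for j in per_model.values()]
    return max(rates) - min(rates)


# Made-up judgements for two hypothetical models, for illustration only.
results = {
    "model_a": [True] * 9 + [False] * 1,  # 9 of 10 responses judged safe
    "model_b": [True] * 2 + [False] * 8,  # 2 of 10 responses judged safe
}
disparity = safety_rate_disparity(results)
print(f"safety rate disparity: {disparity:.0%}")
```

A larger gap means the benchmark separates models more clearly, which is what the discriminative-power claim refers to.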

Abstract

Large vision-language models (LVLMs) exhibit remarkable capabilities in cross-modal tasks but face significant safety challenges, which undermine their reliability in real-world applications. Efforts have been made to build LVLM safety evaluation benchmarks to uncover their vulnerabilities. However, existing benchmarks are hindered by their labor-intensive construction process, static complexity, and limited discriminative power. Thus, they may fail to keep pace with rapidly evolving models and emerging risks. To address these limitations, we propose VLSafetyBencher, the first automated system for LVLM safety benchmarking. VLSafetyBencher introduces four collaborative agents (Data Preprocessing, Generation, Augmentation, and Selection) that construct and select high-quality samples. Experiments validate that VLSafetyBencher can construct high-quality safety benchmarks within one week…
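
The four-agent structure named in the abstract can be pictured as a chain of stages, each taking a list of samples and returning a new list. The sketch below is illustrative only and is not the authors' implementation: the abstract gives the agent names but not their internals, so the `Sample` fields, the stage bodies, and the length-based score are all hypothetical placeholders.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Sample:
    image_path: str
    prompt: str
    score: float = 0.0  # placeholder quality score, filled in at selection


Stage = Callable[[List[Sample]], List[Sample]]


def preprocess(samples: List[Sample]) -> List[Sample]:
    """Stand-in for the Data Preprocessing agent: clean and deduplicate."""
    seen, cleaned = set(), []
    for s in samples:
        prompt = s.prompt.strip()
        key = (s.image_path, prompt)
        if not s.image_path or not prompt or key in seen:
            continue
        seen.add(key)
        cleaned.append(replace(s, prompt=prompt))
    return cleaned


def generate(samples: List[Sample]) -> List[Sample]:
    """Stand-in for the Generation agent: turn raw text into a question."""
    return [replace(s, prompt=f"Question: {s.prompt}") for s in samples]


def augment(samples: List[Sample]) -> List[Sample]:
    """Stand-in for the Augmentation agent: add one variant per sample."""
    variants = [replace(s, prompt=s.prompt + " Answer step by step.") for s in samples]
    return samples + variants


def make_selector(k: int) -> Stage:
    """Stand-in for the Selection agent: keep the k highest-scoring samples."""
    def select(samples: List[Sample]) -> List[Sample]:
        # Hypothetical score: prompt length. A real system would use a
        # quality or difficulty signal instead.
        scored = [replace(s, score=float(len(s.prompt))) for s in samples]
        return sorted(scored, key=lambda s: s.score, reverse=True)[:k]
    return select


def run_pipeline(samples: List[Sample], stages: List[Stage]) -> List[Sample]:
    """Apply each stage in order to the running list of samples."""
    for stage in stages:
        samples = stage(samples)
    return samples


raw = [Sample("a.png", "  describe  "), Sample("", "x"), Sample("a.png", "describe")]
benchmark = run_pipeline(raw, [preprocess, generate, augment, make_selector(1)])
```

Keeping every stage behind the same list-in, list-out signature means a stage can be swapped or reordered without touching the others, which is one plausible reason to split such a system into separate agents.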

Taxonomy

Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning