MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

Zhongxi Wang; Yueqian Lin; Jingyang Zhang; Hai Helen Li; Yiran Chen

arXiv:2603.02482·cs.LG·March 4, 2026

Open Access

TL;DR

MUSE is an open-source platform for comprehensive safety evaluation of large multimodal language models, enabling systematic testing across text, audio, image, and video inputs with novel attack strategies and metrics.

Contribution

The paper introduces MUSE, a run-centric, multimodal safety evaluation platform that combines multi-turn attack algorithms, a dual-metric framework, and modality switching techniques for cross-modal safety assessment.

Findings

01

Multi-turn strategies achieve up to 100% attack success rate.

02

Inter-turn modality switching accelerates attack convergence.

03

Modality effects vary across model families rather than following a universal pattern.

Abstract

Safety evaluation and red-teaming of large language models remain predominantly text-centric, and existing frameworks lack the infrastructure to systematically test whether alignment generalizes to audio, image, and video inputs. We present MUSE (Multimodal Unified Safety Evaluation), an open-source, run-centric platform that integrates automatic cross-modal payload generation, three multi-turn attack algorithms (Crescendo, PAIR, Violent Durian), provider-agnostic model routing, and an LLM judge with a five-level safety taxonomy into a single browser-based system. A dual-metric framework distinguishes hard Attack Success Rate (Compliance only) from soft ASR (including Partial Compliance), capturing partial information leakage that binary metrics miss. To probe whether alignment generalizes across modality boundaries, we introduce Inter-Turn Modality Switching (ITMS), which augments…
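The distinction between hard and soft ASR described in the abstract can be illustrated with a minimal Python sketch. It is not MUSE's implementation: the function name `attack_success_rates`, the per-run list of judge labels, and the `"Refusal"` label are assumptions made for illustration. Only `Compliance` and `Partial Compliance` are named in the abstract; the other three levels of the taxonomy are not listed there, so the sketch simply treats any other label as a non-success.

```python
from collections import Counter

# Labels named in the abstract. The remaining three levels of the
# five-level taxonomy are not given there, so every other label is
# treated as "not a success" (an assumption of this sketch).
HARD_SUCCESS = {"Compliance"}
SOFT_SUCCESS = {"Compliance", "Partial Compliance"}


def attack_success_rates(verdicts):
    """Return (hard_asr, soft_asr) for a list of per-run judge labels."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    total = len(verdicts)
    # Hard ASR counts full compliance only.
    hard = sum(counts[label] for label in HARD_SUCCESS) / total
    # Soft ASR additionally counts partial compliance.
    soft = sum(counts[label] for label in SOFT_SUCCESS) / total
    return hard, soft


# "Refusal" is a hypothetical label standing in for the unnamed levels.
verdicts = ["Compliance", "Partial Compliance", "Refusal", "Refusal"]
hard_asr, soft_asr = attack_success_rates(verdicts)
print(f"hard ASR = {hard_asr:.2f}, soft ASR = {soft_asr:.2f}")
```

In this toy input, one of four runs is fully compliant and one is partially compliant, so soft ASR is twice hard ASR. The gap between the two numbers is the partial leakage that a binary metric would miss.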

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Malware Detection Techniques