MTMCS-Bench: Evaluating Contextual Safety of Multimodal Large Language Models in Multi-Turn Dialogues
Zheyuan Liu, Dongwhi Kim, Yixin Wan, Xiangchi Yuan, Zhaoxuan Tan, Fengran Mo, Meng Jiang

TL;DR
This paper introduces MTMCS-Bench, a benchmark for assessing the contextual safety of multimodal large language models in multi-turn dialogues. It targets two risks: malicious intent that emerges gradually over a conversation (escalation-based risk) and the same visual scene being used for either benign or exploitative goals (context-switch risk).
Contribution
The paper presents MTMCS-Bench, a new benchmark with over 30,000 samples for evaluating safety in multimodal models across realistic multi-turn interactions and risk scenarios.
Findings
Models show trade-offs between safety and utility.
Guardrails mitigate some risks but are not fully effective.
Persistent safety challenges remain in multi-turn multimodal dialogues.
Abstract
Multimodal large language models (MLLMs) are increasingly deployed as assistants that interact through text and images, making it crucial to evaluate contextual safety when risk depends on both the visual scene and the evolving dialogue. Existing contextual safety benchmarks are mostly single-turn and often miss how malicious intent can emerge gradually or how the same scene can support both benign and exploitative goals. We introduce the Multi-Turn Multimodal Contextual Safety Benchmark (MTMCS-Bench), a benchmark of realistic images and multi-turn conversations that evaluates contextual safety in MLLMs under two complementary settings, escalation-based risk and context-switch risk. MTMCS-Bench offers paired safe and unsafe dialogues with structured evaluation. It contains over 30 thousand multimodal (image+text) and unimodal (text-only) samples, with metrics that separately measure…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Adversarial Robustness in Machine Learning
