GuardTrace-VL: Detecting Unsafe Multimodal Reasoning via Iterative Safety Supervision

Yuxiao Xiang; Junchi Chen; Zhenchao Jin; Changtao Miao; Haojie Yuan; Qi Chu; Tao Gong; Nenghai Yu

arXiv:2511.20994·cs.CV·November 27, 2025

TL;DR

GuardTrace-VL is a vision-aware safety auditor for multimodal reasoning models. It detects unsafe content in the intermediate reasoning, not only in the question and final answer, and reports a 93.1% F1 score on unsafe reasoning detection.

Contribution

The paper introduces GuardTrace-VL, a novel joint image-text safety auditor with a new dataset and training scheme, enhancing detection of unsafe reasoning in multimodal models.

Findings

01 Achieves a 93.1% F1 score on unsafe reasoning detection

02 Improves F1 score by 13.5% over previous safety methods

03 Is effective in both in-domain and out-of-domain scenarios
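
For readers unfamiliar with the metric, the F1 score reported above is the harmonic mean of precision and recall. The sketch below shows the standard computation. It assumes "unsafe" is treated as the positive class, and the counts in the usage line are invented for illustration only; they are not results from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall.

    tp, fp, fn are counts of true positives, false positives, and
    false negatives, with "unsafe" assumed to be the positive class.
    """
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (not taken from the paper).
example_f1 = f1_score(tp=8, fp=2, fn=2)
print(f"F1 = {example_f1:.3f}")
```

A detector that misses unsafe reasoning traces accumulates false negatives, which lowers recall and therefore F1 even when its precision is high.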

Abstract

Multimodal large reasoning models (MLRMs) are increasingly deployed for vision-language tasks that produce explicit intermediate rationales. However, reasoning traces can contain unsafe content even when the final answer is non-harmful, creating deployment risks. Existing multimodal safety guards primarily evaluate only the input question and the final answer, neglecting the intermediate reasoning process. This oversight allows undetected harm, such as biased inferences or policy-violating use of visual context, to emerge during reasoning. We introduce GuardTrace-VL, a vision-aware safety auditor that monitors the full Question-Thinking-Answer (QTA) pipeline via joint image-text analysis, enabling detection of unsafe content as it emerges in the reasoning stage. To support training and evaluation, we construct the GuardTrace dataset, which is generated through diverse prompting…
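
The gap the abstract describes can be illustrated with a minimal sketch. It contrasts a guard that checks only the question and final answer with one that checks every stage of a Question-Thinking-Answer record. Everything here is hypothetical: `QTARecord`, `keyword_checker`, and both audit functions are illustrative names, not the paper's API, and the toy checker is a string match that ignores the image, whereas GuardTrace-VL is described as a learned joint image-text auditor.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class QTARecord:
    """One Question-Thinking-Answer sample (hypothetical structure)."""
    image_path: str  # kept for completeness; the toy checker ignores it
    question: str
    thinking: str
    answer: str


# A stage checker returns True when the given text is judged unsafe.
StageChecker = Callable[[str], bool]


def keyword_checker(text: str) -> bool:
    """Toy stand-in for a learned auditor: flags a placeholder marker."""
    return "[UNSAFE]" in text


def audit_question_answer_only(record: QTARecord, is_unsafe: StageChecker) -> bool:
    """Mimics a guard that never inspects the reasoning trace."""
    return is_unsafe(record.question) or is_unsafe(record.answer)


def audit_full_qta(record: QTARecord, is_unsafe: StageChecker) -> Dict[str, bool]:
    """Checks each stage separately so unsafe reasoning is not missed."""
    stages = ("question", "thinking", "answer")
    return {stage: is_unsafe(getattr(record, stage)) for stage in stages}


# A record whose final answer is benign but whose reasoning is flagged.
example = QTARecord(
    image_path="example.png",
    question="What is shown in the picture?",
    thinking="First I note the scene. [UNSAFE] Then I summarize it.",
    answer="A street scene.",
)
print(audit_question_answer_only(example, keyword_checker))
print(audit_full_qta(example, keyword_checker))
```

In this toy case the question-and-answer-only guard reports nothing, while the per-stage audit flags the `thinking` field, which is the failure mode the abstract attributes to existing multimodal safety guards.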

Code & Models

Models

DloadingX/GuardTrace-VL-3B (Hugging Face model, 55 downloads, 2 likes)

Taxonomy

Topics: Multimodal Machine Learning Applications · Adversarial Robustness in Machine Learning · Topic Modeling