SaFeR-ToolKit: Structured Reasoning via Virtual Tool Calling for Multimodal Safety

Zixuan Xu; Tiancheng He; Huahui Yi; Kun Wang; Xi Chen; Gongli Xi; Qiankun Li; Kang Li; Yang Liu; Zhigang Zeng

arXiv:2603.02635·cs.LG·March 4, 2026

TL;DR

SaFeR-ToolKit is a structured, tool-based safety reasoning framework for vision-language models that improves safety and reasoning rigor while preserving general capabilities.

Contribution

It formalizes safety decision-making as a checkable protocol, introduces a new dataset, and demonstrates substantial safety improvements on Qwen2.5-VL models.

Findings

01. Significant safety and helpfulness improvements on 3B and 7B models.

02. Maintains core capabilities despite safety enhancements.

03. First tool-based safety reasoning dataset with over 31,000 examples.

Abstract

Vision-language models remain susceptible to multimodal jailbreaks and over-refusal because safety hinges on both visual evidence and user intent, while many alignment pipelines supervise only the final response. To address this, we present SaFeR-ToolKit, which formalizes safety decision-making as a checkable protocol. Concretely, a planner specifies a persona, a Perception $\to$ Reasoning $\to$ Decision tool set, and a constrained transition graph, while a responder outputs a typed key-value tool trace before the final answer. To make the protocol reliably followed in practice, we train a single policy with a three-stage curriculum (SFT $\to$ DPO $\to$ GRPO), where GRPO directly supervises tool usage beyond answer-level feedback. Our contributions are two-fold: I. Dataset. The first tool-based safety reasoning dataset, comprising 31,654 examples (SFT 6k, DPO 18.6k, GRPO 6k) plus 1k…
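
To make the "checkable protocol" idea concrete, the following is a minimal sketch of how a typed key-value tool trace could be validated against a constrained transition graph. It is not the authors' implementation: the stage names follow the abstract's Perception $\to$ Reasoning $\to$ Decision ordering, but the tool names, the payload fields, the `ALLOWED` graph, and the `check_trace` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Stage order taken from the abstract; everything else below is hypothetical.
STAGES = ("perception", "reasoning", "decision")

# Assumed transition graph: a stage may repeat or advance to the next one,
# and a decision ends the trace.
ALLOWED = {
    "START": {"perception"},
    "perception": {"perception", "reasoning"},
    "reasoning": {"reasoning", "decision"},
    "decision": set(),
}


@dataclass
class ToolCall:
    tool: str     # hypothetical tool name, e.g. "describe_image"
    stage: str    # one of STAGES
    fields: dict  # key-value payload emitted by the responder


def check_trace(trace):
    """Return (ok, reason) for a list of ToolCall objects."""
    prev = "START"
    for i, call in enumerate(trace):
        if call.stage not in STAGES:
            return False, f"step {i}: unknown stage {call.stage!r}"
        if call.stage not in ALLOWED[prev]:
            return False, f"step {i}: transition {prev} -> {call.stage} not allowed"
        if not call.fields or not all(isinstance(k, str) for k in call.fields):
            return False, f"step {i}: tool {call.tool!r} needs a non-empty string-keyed payload"
        prev = call.stage
    if prev != "decision":
        return False, "trace must end with a decision tool call"
    return True, "ok"


# Example traces with made-up tool names and payloads.
good = [
    ToolCall("describe_image", "perception", {"objects": ["knife", "kitchen"]}),
    ToolCall("assess_intent", "reasoning", {"intent": "cooking", "risk": "low"}),
    ToolCall("decide", "decision", {"action": "answer"}),
]
skips_reasoning = [good[0], good[2]]

print(check_trace(good))
print(check_trace(skips_reasoning))
```

Because the trace is structured rather than free-form text, a check of this kind can run mechanically before the final answer is scored, which is one plausible way that tool usage could be supervised separately from answer-level feedback.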

Taxonomy

Topics: Multimodal Machine Learning Applications · Ethics and Social Impacts of AI · Human-Automation Interaction and Safety