SaFeR-Steer: Evolving Multi-Turn MLLMs via Synthetic Bootstrapping and Feedback Dynamics

Haolong Hu; Hanyu Li; Tiancheng He; Huahui Yi; An Zhang; Qiankun Li; Kun Wang; Yang Liu; Zhigang Zeng

arXiv:2604.16358 · cs.LG · April 21, 2026

TL;DR

SaFeR-Steer is a multi-turn safety alignment framework for multimodal large language models that uses synthetic bootstrapping and feedback dynamics to improve safety and helpfulness in multi-turn interactions.

Contribution

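The paper introduces a multi-turn safety alignment method that combines staged synthetic bootstrapping, training under adaptive on-policy attacks, and TCSR, a safety propagation metric based on trajectory minimum/average safety. It also releases STEER, a multi-turn multimodal safety dataset.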

Findings

01. Significant safety and helpfulness improvements on benchmarks.

02. Shifted safety failures to later turns, enhancing robustness.

03. Provided a new multimodal safety dataset for multi-turn dialogues.
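One way to check whether failures move to later turns is to compare the average turn at which a dialogue first becomes unsafe before and after training. The sketch below is only an illustration: the per-turn safety scores, the 0.5 threshold, and the function names are hypothetical, and this page does not state how the paper measures the shift.

```python
def first_unsafe_turn(turn_scores, threshold=0.5):
    """Return the 1-based index of the first turn scoring below threshold.

    Returns None when every turn is at or above the threshold.
    The threshold is a hypothetical cutoff, not a value from the paper.
    """
    for index, score in enumerate(turn_scores, start=1):
        if score < threshold:
            return index
    return None


def mean_first_failure(dialogues, threshold=0.5):
    """Average first-failure turn over the dialogues that fail at all."""
    failures = []
    for scores in dialogues:
        turn = first_unsafe_turn(scores, threshold)
        if turn is not None:
            failures.append(turn)
    return sum(failures) / len(failures) if failures else None


# Made-up per-turn safety scores, for illustration only.
before = [[0.9, 0.2, 0.8], [0.3, 0.9]]
after = [[0.9, 0.9, 0.2], [0.9, 0.8, 0.9, 0.4]]
print(mean_first_failure(before), mean_first_failure(after))
```

A larger average after training would indicate that the model holds out for more turns before its first unsafe response.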

Abstract

MLLMs are increasingly deployed in multi-turn settings, where attackers can escalate unsafe intent through the evolving visual-text history and exploit long-context safety decay. Yet safety alignment is still dominated by single-turn data and fixed-template dialogues, leaving a mismatch between training and deployment. To bridge this gap, we propose SaFeR-Steer, a progressive multi-turn alignment framework that combines staged synthetic bootstrapping with tutor-in-the-loop GRPO to train a single student under adaptive, on-policy attacks. We also introduce TCSR, which uses trajectory minimum/average safety to propagate late-turn failures to earlier turns.

I. Dataset. We release STEER, a multi-turn multimodal safety dataset with STEER-SFT (12,934), STEER-RL (2,000), and STEER-Bench (3,227) dialogues spanning 2 to 10 turns.

II. Experiment. Starting from Qwen2.5-VL-3B/7B, SaFeR-Steer…
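The abstract describes TCSR only as using trajectory minimum/average safety to propagate late-turn failures to earlier turns. The sketch below shows one plausible reading of that idea, not the paper's actual formula: per-turn safety scores are blended into a single trajectory-level score, which is then assigned to every turn. The mixing weight `alpha`, the score range, and both function names are assumptions made for illustration.

```python
from statistics import mean


def trajectory_safety(turn_scores, alpha=0.5):
    """Blend worst-turn and average safety into one trajectory-level score.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    if not turn_scores:
        raise ValueError("need at least one turn score")
    return alpha * min(turn_scores) + (1 - alpha) * mean(turn_scores)


def propagate_to_turns(turn_scores, alpha=0.5):
    """Assign the trajectory-level score to every turn.

    Because the minimum enters the blend, a single late failure lowers
    the signal that earlier turns receive as well.
    """
    score = trajectory_safety(turn_scores, alpha)
    return [score] * len(turn_scores)


# Made-up per-turn safety scores in [0, 1], for illustration only.
safe_dialogue = [0.9, 0.9, 0.9, 0.9]
late_failure = [0.9, 0.9, 0.9, 0.1]
print(propagate_to_turns(safe_dialogue))
print(propagate_to_turns(late_failure))
```

In this reading, the early turns of the second dialogue receive a lower score than their own per-turn values, even though they were safe in isolation. That is the sense in which a late failure propagates backward.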

Code & Models

Repositories

Ed-Bg/SaFeR-Steer (GitHub)
