Video Unlearning via Low-Rank Refusal Vector
Simone Facchiano, Stefano Saravalle, Matteo Migliarini, Edoardo De Matteis, Alessio Sampieri, Andrea Pilzer, Emanuele Rodolà, Indro Spinelli, Luca Franco, Fabio Galasso

TL;DR
This paper introduces a training-free, weight-update method for removing unsafe concepts from video diffusion models, significantly reducing harmful content generation while maintaining quality and prompt alignment.
Contribution
It presents the first closed-form, training-free framework for concept removal in video diffusion models using a low-rank refusal vector and contrastive factorization.
Findings
Reduces unsafe video generations by 36.3% and 58.2% on benchmarks.
Maintains video quality and prompt alignment after concept removal.
Operates without retraining or inference overhead.
Abstract
Video generative models achieve high-quality synthesis from natural-language prompts by leveraging large-scale web data. However, this training paradigm inherently exposes them to unsafe biases and harmful concepts, introducing the risk of generating undesirable or illicit content. To mitigate unsafe generations, existing machine unlearning approaches either rely on filtering, and can therefore be bypassed, or they update model weights, but with costly fine-tuning or training-free closed-form edits. We propose the first training-free weight update framework for concept removal in video diffusion models. From five paired safe/unsafe prompts, our method estimates a refusal vector and integrates it into the model weights as a closed-form update. A contrastive low-rank factorization further disentangles the target concept from unrelated semantics, ensuring selective concept suppression…
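The closed-form idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps the paper's contrastive low-rank factorization for a plain SVD of paired activation differences, and the function names, the synthetic activations, and the choice to project on the output side of a single linear layer are all assumptions made for illustration.

```python
import numpy as np

def refusal_directions(h_unsafe, h_safe, rank=1):
    """Top-`rank` orthonormal directions of the paired activation differences.

    Assumption: a plain SVD of the (uncentered) differences stands in for
    the paper's contrastive low-rank factorization.
    """
    diffs = h_unsafe - h_safe                      # (n_pairs, d)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:rank]                               # (rank, d), orthonormal rows

def ablate_output_directions(W, R):
    """Closed-form edit W' = (I - R^T R) W for a layer y = W x.

    The edited layer's outputs have no component along the rows of R.
    """
    P = np.eye(W.shape[0]) - R.T @ R               # orthogonal projector
    return P @ W

# Synthetic stand-in for activations from five safe/unsafe prompt pairs.
rng = np.random.default_rng(0)
d, n_pairs = 16, 5
concept = np.zeros(d)
concept[0] = 1.0                                   # hypothetical concept axis
h_safe = rng.normal(size=(n_pairs, d))
h_unsafe = h_safe + 3.0 * concept + 0.01 * rng.normal(size=(n_pairs, d))

R = refusal_directions(h_unsafe, h_safe, rank=1)
W = rng.normal(size=(d, d))                        # hypothetical layer weights
W_edited = ablate_output_directions(W, R)
```

Because the edit is folded into the weights, `W_edited` has the same shape as `W`, so a layer modified this way costs nothing extra at inference time.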
Peer Reviews
Decision: ICLR 2026 Poster
- The work addresses an important and timely problem: ensuring the safety of large generative models without costly retraining or compromising utility.
- Unlike prior approaches that rely on retraining, reinforcement learning, or prompt engineering, the proposed Refusal Vector method directly identifies and suppresses unsafe directions in weight space using only a small number of safe/unsafe prompt pairs.
- The study’s evaluation on multiple unsafe categories and benchmarks provides solid evidenc
- The paper has limited novelty.
- The concepts of refusal vectors [1] and low-rank updates to parameters/representations [2] have been introduced in previous works, but not for the video domain, so its novelty is probably only the video domain.
- The paper primarily reports a censorship rate metric and some qualitative visualizations. However, there is no systematic evaluation of how much the method preserves prompt alignment or perceptual quality. Without these measures, it is hard to verif
1. The method is simple and fast to apply.
2. Clear safety framing with concrete categories (copyright/tm, public figures, etc.) and ablations on rank/regularization.
3. Quantitative results (FVD, MM-Notox) suggest limited quality drop on chosen backbones.
1. The paper evaluates on OPEN-SORA and ZeroScopeT2V only and compares mainly to SAFREE (filtering) and NullSCE (fine-tuning). That omits several strong, contemporary T2V backbones, such as the Wan and Hunyuan series.
2. Safety measurement is narrow; key semantic-fidelity metrics are missing. I think more prompt-faithfulness semantic metrics (e.g., CLIP-text/video alignment, TIFA) are needed to ensure you aren’t quietly degrading non-safety semantics that are not captured by FVD/MM-Notox.
3. Heavy relian
1. Efficiency: While training-free methods are well established in T2I unlearning, there was a timely need for such methods in T2V. The authors propose a simple and efficient algorithm to accomplish this task.
2. Strong Analysis: Rather than just proposing a new method, the authors provide strong ablations and hyperparameter sensitivity studies. This included the choice of rank, layers, prompt pairs, etc. These ablations provide a transparent view into the practicality of their proposed method.
1. Uniqueness to T2V: On line 119, the authors note that existing training-free unlearning methods developed for T2I generation are difficult to adapt to T2V models because of differences in text encoders and frame-independent architectures. However, after reading the method section, it remains unclear how the proposed approach is specifically tailored to the video setting. The described procedure of computing activation differences and applying PCA appears applicable to T2I models as well. This
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Diffusion
