Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models

Yue Li; Xin Yi; Dongsheng Shi; Gerard de Melo; Xiaoling Wang; Linlin Wang

arXiv:2505.16104 · cs.CL · July 23, 2025

TL;DR

This paper introduces Hierarchical Safety Realignment (HSR), a lightweight method for restoring safety in pruned large vision-language models. HSR first identifies the attention heads that contribute most to safety, then selectively restores pruned neurons within those heads.

Contribution

The paper proposes a novel hierarchical safety realignment technique that identifies and restores critical neurons within attention heads to recover safety in pruned LVLMs.

Findings

01. HSR significantly improves safety metrics across models

02. Selective neuron restoration outperforms baseline pruning methods

03. First approach explicitly targeting safety restoration in pruned LVLMs

Abstract

With the increasing size of Large Vision-Language Models (LVLMs), network pruning techniques aimed at compressing models for deployment in resource-constrained environments have garnered significant attention. However, we observe that pruning often leads to a degradation in safety performance. To address this issue, we present a novel and lightweight approach, termed Hierarchical Safety Realignment (HSR). HSR operates by first quantifying the contribution of each attention head to safety, identifying the most critical ones, and then selectively restoring neurons directly within these attention heads that play a pivotal role in maintaining safety. This process hierarchically realigns the safety of pruned LVLMs, progressing from the attention head level to the neuron level. We validate HSR across various models and pruning strategies, consistently achieving notable improvements in safety…
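
The two-level procedure described in the abstract (rank attention heads by their contribution to safety, then restore selected neurons inside the top-ranked heads) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how head contributions or neuron importance are computed, so both are taken here as precomputed inputs. The function names, the `budget` parameter, the row-to-head layout, and the choice to treat each weight entry as a "neuron" are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def select_critical_heads(head_scores: List[float], top_k: int) -> List[int]:
    """Return the indices of the top_k heads with the highest safety score.

    head_scores is an assumed, precomputed per-head safety contribution.
    """
    ranked = sorted(range(len(head_scores)),
                    key=lambda h: head_scores[h], reverse=True)
    return sorted(ranked[:top_k])


def restore_in_heads(original: Matrix, pruned: Matrix, importance: Matrix,
                     rows_per_head: int, critical_heads: List[int],
                     budget: int) -> Matrix:
    """Copy back up to `budget` pruned weights, highest importance first,
    but only inside rows that belong to a critical head.

    Assumption: rows are grouped by head, rows_per_head rows per head.
    """
    restored = [row[:] for row in pruned]  # leave the pruned input untouched
    candidates = []
    for r, row in enumerate(pruned):
        if r // rows_per_head not in critical_heads:
            continue  # head level: skip heads not judged safety-critical
        for c, value in enumerate(row):
            # a weight counts as pruned if it is zero now but was not before
            if value == 0.0 and original[r][c] != 0.0:
                candidates.append((importance[r][c], r, c))
    candidates.sort(reverse=True)  # neuron level: most important first
    for _, r, c in candidates[:budget]:
        restored[r][c] = original[r][c]
    return restored


# Toy example: 2 heads, 2 rows per head, 2 columns (all numbers invented).
original = [[0.5, -0.2], [0.1, 0.9], [0.3, -0.7], [0.4, 0.6]]
pruned = [[0.5, 0.0], [0.0, 0.9], [0.0, -0.7], [0.4, 0.0]]
importance = [[0.1, 0.8], [0.3, 0.2], [0.9, 0.1], [0.2, 0.7]]
head_scores = [0.2, 0.9]

critical = select_critical_heads(head_scores, top_k=1)
restored = restore_in_heads(original, pruned, importance,
                            rows_per_head=2, critical_heads=critical,
                            budget=1)
```

In this toy run only head 1 is selected, so the pruned weight in head 0 stays at zero even though its importance score is high. Within head 1, the single restoration slot goes to the pruned entry with the highest importance. Restricting restoration to a few heads and a small budget is what keeps this kind of realignment lightweight relative to undoing the pruning.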

Code & Models

Repositories

theshineyue/hsr (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications

Methods: Softmax · Attention Is All You Need · Pruning