Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models
Yue Li, Xin Yi, Dongsheng Shi, Gerard de Melo, Xiaoling Wang, Linlin Wang

TL;DR
This paper introduces Hierarchical Safety Realignment (HSR), a lightweight method that recovers the safety lost when large vision-language models are pruned, by selectively restoring neurons in the attention heads most critical to safety.
Contribution
The paper proposes a hierarchical safety realignment technique that first identifies the attention heads most critical to safety and then restores neurons within those heads to recover safety in pruned LVLMs.
Findings
HSR consistently improves safety metrics across the models and pruning strategies evaluated
Selective neuron restoration outperforms baseline pruning methods
First approach explicitly targeting safety restoration in pruned LVLMs
Abstract
With the increasing size of Large Vision-Language Models (LVLMs), network pruning techniques aimed at compressing models for deployment in resource-constrained environments have garnered significant attention. However, we observe that pruning often leads to a degradation in safety performance. To address this issue, we present a novel and lightweight approach, termed Hierarchical Safety Realignment (HSR). HSR operates by first quantifying the contribution of each attention head to safety, identifying the most critical ones, and then selectively restoring neurons directly within these attention heads that play a pivotal role in maintaining safety. This process hierarchically realigns the safety of pruned LVLMs, progressing from the attention head level to the neuron level. We validate HSR across various models and pruning strategies, consistently achieving notable improvements in safety…
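The two-level procedure described in the abstract (rank attention heads by safety contribution, then restore selected pruned neurons inside the top-ranked heads) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how head contributions or neuron importance are computed, so `head_scores` and `neuron_scores` are hypothetical inputs, and the function name, the `top_heads` and `restore_frac` parameters, and the assumption that the rows of a projection matrix split evenly into heads are all illustrative choices.

```python
import numpy as np


def hierarchical_restore(w_orig, keep_mask, head_scores, neuron_scores,
                         n_heads, top_heads=1, restore_frac=0.25):
    """Sketch of head-level then neuron-level restoration (assumed design).

    w_orig        : (d_out, d_in) dense weights of one attention projection;
                    rows are assumed to split evenly into n_heads heads.
    keep_mask     : bool array, same shape; True where pruning kept a weight.
    head_scores   : (n_heads,) hypothetical per-head safety contribution.
    neuron_scores : same shape as w_orig; hypothetical safety importance.
    Returns the realigned weights and the updated mask.
    """
    head_dim = w_orig.shape[0] // n_heads
    new_mask = keep_mask.copy()

    # Level 1: select the heads with the highest safety contribution.
    critical_heads = np.argsort(head_scores)[::-1][:top_heads]

    for h in critical_heads:
        start = h * head_dim
        rows = slice(start, start + head_dim)
        pruned = ~keep_mask[rows]
        n_pruned = int(pruned.sum())
        k = min(int(np.ceil(restore_frac * n_pruned)), n_pruned)
        if k == 0:
            continue

        # Level 2: among pruned weights in this head, restore the k with
        # the highest safety importance; kept weights are excluded.
        scores = np.where(pruned, neuron_scores[rows], -np.inf)
        flat_idx = np.argpartition(scores.ravel(), -k)[-k:]
        r, c = np.unravel_index(flat_idx, scores.shape)
        new_mask[r + start, c] = True

    # Restored positions take their original (pre-pruning) values.
    return w_orig * new_mask, new_mask
```

Only the mask changes, so the restored weights are copied from the dense model rather than retrained, which is one way to read "lightweight" here. Heads outside the selected set keep their pruned sparsity pattern.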
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
Methods: Softmax · Attention Is All You Need · Pruning
