Bootstrapping LLM Robustness for VLM Safety via Reducing the Pretraining Modality Gap
Wenhan Yang, Spencer Stice, Ali Payani, Baharan Mirzasoleiman

TL;DR
This paper identifies the modality gap between image and text representations as a key factor in the safety degradation of large vision-language models (LVLMs) and proposes a pretraining regularization method that reduces this gap, significantly improving safety without harming performance.
Contribution
The study shows that the size of the modality gap correlates with safety degradation in LVLMs and introduces a regularization technique, applied during pretraining, to reduce this gap.
Findings
Reducing the modality gap improves LVLM safety by up to 16.3%.
The method enhances existing safety defenses by up to 18.2%.
The modality gap persists through fine-tuning, which highlights the importance of addressing it during pretraining.
Abstract
Ensuring that Large Vision-Language Models (LVLMs) generate safe outputs is crucial for their reliable deployment. However, LVLMs suffer from drastic safety degradation compared to their LLM backbone. Even blank or irrelevant images can trigger LVLMs to generate harmful responses to prompts that would otherwise be refused in text-only contexts. The modality gap between image and text representations has recently been hypothesized to contribute to the safety degradation of LVLMs. However, whether and how the size of the modality gap affects LVLMs' safety has not been studied. In this work, we show that the size of the modality gap is highly inversely correlated with LVLMs' safety. Then, we show that this modality gap is introduced during LVLM pretraining and persists through fine-tuning. Inspired by this observation, we propose a regularization to reduce the modality gap during pretraining. Our extensive experiments…
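To make the idea of a gap penalty concrete, here is a minimal sketch in Python with NumPy. It is not the paper's metric or regularizer, which the truncated abstract above does not specify. As an assumption for illustration, the gap is approximated by the Euclidean distance between the centroids of L2-normalized image and text embeddings, and the penalty is added to a generic task loss with a hypothetical weight `lam`. The function names `modality_gap` and `regularized_loss` are likewise hypothetical.

```python
import numpy as np


def modality_gap(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """Illustrative gap proxy (an assumption, not the paper's definition):
    distance between centroids of L2-normalized embeddings of shape [n, d]."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return float(np.linalg.norm(img.mean(axis=0) - txt.mean(axis=0)))


def regularized_loss(task_loss: float, image_emb: np.ndarray,
                     text_emb: np.ndarray, lam: float = 0.1) -> float:
    """Task loss plus a gap penalty weighted by lam (assumed form)."""
    return task_loss + lam * modality_gap(image_emb, text_emb)


# Synthetic embeddings: one image cluster far from the text cluster, one near it.
rng = np.random.default_rng(0)
text = rng.normal(size=(64, 16))
far_images = text + 3.0   # shifted cluster: larger gap
near_images = text + 0.1  # nearly aligned cluster: smaller gap

gap_far = modality_gap(far_images, text)
gap_near = modality_gap(near_images, text)
print(f"gap (far):  {gap_far:.3f}")
print(f"gap (near): {gap_near:.3f}")
print(f"loss with penalty (far): {regularized_loss(1.0, far_images, text):.3f}")
```

In this toy setup the shifted cluster yields the larger gap and therefore the larger penalty, so minimizing the combined loss would push image embeddings toward the text embeddings. In an actual pretraining run the penalty would be computed on the model's own image and text representations with a differentiable framework rather than on synthetic arrays.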
