Quantifying Cross-Modality Memorization in Vision-Language Models
Yuxin Wen, Yangsibo Huang, Tom Goldstein, Ravi Kumar, Badih Ghazi, Chiyuan Zhang

TL;DR
This paper systematically investigates cross-modality memorization in vision-language models, revealing transferability of facts between modalities and identifying gaps, with implications for improving multimodal learning robustness.
Contribution
It introduces a synthetic dataset for controlled experiments and analyzes how knowledge transfers across modalities in vision-language models, highlighting existing gaps and proposing mitigation methods.
Findings
Facts learned in one modality transfer to the other.
A significant gap exists between recall in the source modality and recall in the target modality.
This transferability gap persists across various scenarios.
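One way to make the source-versus-target recall gap concrete is the minimal sketch below. It is an illustration under stated assumptions, not the paper's actual metric or evaluation code: the `RecallResult` record layout, the function names `recall_rate` and `transfer_gap`, and the toy outcomes are all hypothetical. It assumes each evaluation query is scored as simply correct or incorrect.

```python
from dataclasses import dataclass


@dataclass
class RecallResult:
    """One evaluation query about a trained fact (hypothetical record layout)."""
    trained_modality: str  # modality the fact appeared in during training: "image" or "text"
    queried_modality: str  # modality used to ask about the fact at test time
    correct: bool          # whether the model's answer matched the fact


def recall_rate(results, trained, queried):
    """Fraction of correct answers for one (trained, queried) modality pair."""
    hits = [r.correct for r in results
            if r.trained_modality == trained and r.queried_modality == queried]
    if not hits:
        raise ValueError(f"no results for trained={trained!r}, queried={queried!r}")
    return sum(hits) / len(hits)


def transfer_gap(results, source, target):
    """Source-modality recall minus target-modality recall, for facts trained in `source`."""
    return recall_rate(results, source, source) - recall_rate(results, source, target)


# Made-up toy outcomes for illustration only; these are not results from the paper.
toy = [
    RecallResult("image", "image", True),
    RecallResult("image", "image", True),
    RecallResult("image", "image", True),
    RecallResult("image", "image", False),
    RecallResult("image", "text", True),
    RecallResult("image", "text", False),
    RecallResult("image", "text", False),
    RecallResult("image", "text", False),
]

print(recall_rate(toy, "image", "image"))  # same-modality recall
print(recall_rate(toy, "image", "text"))   # cross-modality recall
print(transfer_gap(toy, "image", "text"))  # positive value means recall drops after transfer
```

In this framing, a gap near zero would mean facts transfer fully across modalities, while a large positive gap would mean the model recalls a fact much better when queried in the modality it was trained on.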
Abstract
Understanding what and how neural networks memorize during training is crucial, both from the perspective of unintentional memorization of potentially sensitive information and from the standpoint of effective knowledge acquisition for real-world, knowledge-intensive tasks. While previous studies primarily investigate memorization within a single modality, such as text memorization in large language models or image memorization in diffusion models, unified multimodal models are becoming increasingly prevalent in practical applications. In this work, we focus on the unique characteristics of cross-modality memorization and conduct a systematic study centered on vision-language models. To facilitate controlled experiments, we first introduce a synthetic persona dataset comprising diverse synthetic person images and textual descriptions. We quantify factual knowledge memorization and…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Text Readability and Simplification
Methods: Diffusion · Focus
