Before Forgetting, Learn to Remember: Revisiting Foundational Learning Failures in LVLM Unlearning Benchmarks

JuneHyoung Kwon; MiHyeon Kim; Eunju Lee; JungMin Yun; Byeonggeuk Lim; YoungBin Kim

arXiv:2605.03759·cs.CV·May 6, 2026

TL;DR

This paper introduces ReMem, a new benchmark for assessing foundational memorization in LVLMs, addressing issues of under-memorization and multi-hop reasoning to improve unlearning evaluation reliability.

Contribution

ReMem provides a comprehensive, principled benchmark with data scaling, reasoning-aware QA, and an exposure metric to better diagnose learning and unlearning in LVLMs.

Findings

01. ReMem ensures robust foundational learning in LVLMs.

02. The exposure metric quantifies information erasure effectiveness.

03. Experiments validate ReMem's reliability for diagnosing memorization issues.

Abstract

While Large Vision-Language Models (LVLMs) offer powerful capabilities, they pose privacy risks by unintentionally memorizing sensitive personal information. Current unlearning benchmarks attempt to mitigate this using fictitious identities but overlook a critical stage 1 failure: models fail to effectively memorize target information initially, rendering subsequent unlearning evaluations unreliable. Diagnosing under-memorization and the multi-hop curse as root causes, we introduce ReMem, a Reliable Multi-hop and Multi-image Memorization Benchmark. ReMem ensures robust foundational learning through principled data scaling, reasoning-aware QA pairs, and diverse visual contexts. Additionally, we propose a novel Exposure metric to quantify the depth of information erasure from the model's internal probability distribution. Extensive experiments demonstrate that ReMem provides a rigorous…
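The abstract does not define the Exposure metric, so the following is only a rough illustration of the general idea of reading memorization off a model's probability distribution. It uses the rank-based exposure from earlier memorization work (Carlini et al., "The Secret Sharer"), which may differ from the definition used in ReMem. The function name, the candidate set, and all log-probability values are hypothetical.

```python
import math


def rank_based_exposure(candidate_log_probs, target):
    """Rank-based exposure: log2(|R|) - log2(rank of target in R).

    Assumption: this is the Secret Sharer formulation, not necessarily
    the Exposure metric proposed in the paper.
    candidate_log_probs maps each candidate answer to the model's
    log-probability for it; rank 1 means the most likely candidate.
    """
    target_lp = candidate_log_probs[target]
    # Rank is 1 plus the number of candidates strictly more likely than the target.
    rank = 1 + sum(1 for lp in candidate_log_probs.values() if lp > target_lp)
    return math.log2(len(candidate_log_probs)) - math.log2(rank)


# Hypothetical log-probabilities for one question about a fictitious identity.
before_unlearning = {"Alice": -0.1, "Bob": -3.0, "Carol": -4.0, "Dave": -5.0}
after_unlearning = {"Alice": -4.5, "Bob": -3.0, "Carol": -4.0, "Dave": -5.0}

exposure_before = rank_based_exposure(before_unlearning, "Alice")
exposure_after = rank_based_exposure(after_unlearning, "Alice")
print(exposure_before, exposure_after)
```

Under this formulation, a target that stays highly ranked among the candidates keeps a high exposure even if the model no longer emits it verbatim, which is why a distribution-level score can reveal residual memorization that exact-match accuracy misses.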

Code & Models

Datasets

herbwood27/Remem (dataset, 108 downloads)
