Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs

Siyuan Huang; Xiaoye Qu; Yafu Li; Tong Zhu; Zefeng He; Muxin Fu; Daizong Liu; Wei-Long Zheng; Yu Cheng

arXiv:2605.00814·cs.CV·May 11, 2026

TL;DR

This paper introduces Persistent Visual Memory (PVM), a lightweight module that enhances sustained visual perception in LVLMs, improving accuracy and robustness in complex reasoning tasks with minimal overhead.

Contribution

The paper proposes PVM, a novel parallel module for LVLMs that mitigates visual signal decay and improves visual perception during deep generation.

Findings

01 PVM improves accuracy across 4B and 8B LVLMs.
02 PVM enhances robustness in longer generations.
03 PVM accelerates internal prediction convergence.

Abstract

While autoregressive Large Vision-Language Models (LVLMs) demonstrate remarkable proficiency in multimodal tasks, they face a "Visual Signal Dilution" phenomenon, where the accumulation of textual history expands the attention partition function, causing visual attention to decay inversely with generated sequence length. To counteract this, we propose Persistent Visual Memory (PVM), a lightweight learnable module designed to strengthen sustained, on-demand access to visual evidence. Integrated as a parallel branch alongside the Feed-Forward Network (FFN) in LVLMs, PVM establishes a distance-agnostic retrieval pathway that directly provides visual embeddings for enhanced visual perception, thereby structurally mitigating the signal suppression inherent to deep generation. Extensive experiments on Qwen3-VL models demonstrate that PVM brings notable improvements with negligible parameter…
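The dilution argument in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's analysis. It assumes a single attention query whose logits are constant within each token group, so the softmax partition function is simply a weighted token count. The function name `visual_attention_mass` and the figure of 256 visual tokens are hypothetical choices made for illustration.

```python
import math


def visual_attention_mass(num_visual, num_text, visual_logit=0.0, text_logit=0.0):
    """Fraction of softmax attention that lands on visual tokens.

    Toy assumption: every visual token shares one logit and every text
    token shares another, so the partition function is
    Z = num_visual * exp(visual_logit) + num_text * exp(text_logit).
    """
    visual_weight = num_visual * math.exp(visual_logit)
    text_weight = num_text * math.exp(text_logit)
    return visual_weight / (visual_weight + text_weight)


NUM_VISUAL = 256  # hypothetical number of image tokens, fixed after prefill

# The visual token count stays fixed while the text history keeps growing.
masses = {t: visual_attention_mass(NUM_VISUAL, t) for t in (256, 1024, 4096, 16384)}

for num_text, mass in masses.items():
    print(f"text tokens = {num_text:>5}  visual attention mass = {mass:.4f}")
```

Under these assumptions the visual share equals V / (V + T), which behaves like V / T once the text length T is much larger than the visual token count V. This matches the inverse decay with sequence length that the abstract describes. Real attention logits are not uniform, so the sketch only shows the direction of the effect, not its size in any specific model.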

Code & Models

Repositories

huaixuheqing/PVM (GitHub)
