When Thinking Hurts: Mitigating Visual Forgetting in Video Reasoning via Frame Repetition

Xiaokun Sun; Yubo Wang; Haoyu Cao; Linli Xu

arXiv:2603.16256 · cs.CV · March 18, 2026

Open Access

TL;DR

This paper introduces FrameRepeat, a framework that improves video reasoning in multimodal models by automatically reinforcing important frames, addressing visual forgetting without extensive retraining.

Contribution

The paper proposes a generalizable method that combines a lightweight frame scoring network with an Add-One-In training strategy to improve how well Video-LLMs retain visual input.
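The idea of reinforcing important frames can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `repeat_top_frames`, the parameter `k`, and the choice to append repeated frames at the end of the sequence are all hypothetical, and the scores are passed in as plain numbers rather than produced by a learned scoring network.

```python
from typing import List, Sequence


def repeat_top_frames(
    frame_ids: Sequence[int], scores: Sequence[float], k: int = 1
) -> List[int]:
    """Return frame_ids with the k highest-scoring frames appended again.

    Hypothetical illustration only: where repeated frames are placed
    (here, appended at the end in temporal order) is an assumption and
    is not taken from the paper.
    """
    if len(frame_ids) != len(scores):
        raise ValueError("frame_ids and scores must have the same length")
    # Clamp k to the valid range so callers cannot over-request frames.
    k = max(0, min(k, len(frame_ids)))
    # Indices of the k highest scores; ties go to the earlier frame.
    top = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    # Keep the repeated frames in their original temporal order.
    repeated = [frame_ids[i] for i in sorted(top)]
    return list(frame_ids) + repeated


if __name__ == "__main__":
    # Placeholder scores; a real system would obtain them from a scoring model.
    print(repeat_top_frames([0, 1, 2, 3], [0.1, 0.9, 0.3, 0.8], k=2))
```

In this sketch the original frame order is left untouched and only the selected frames appear a second time, so the cost grows by `k` frames rather than by the full sequence length.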

Findings

01. Effective across multiple models and datasets

02. Reduces hallucinations and improves reasoning accuracy

03. Automates frame reinforcement without heavy retraining

Abstract

Recently, Multimodal Large Language Models (MLLMs) have demonstrated significant potential in complex visual tasks through the integration of Chain-of-Thought (CoT) reasoning. However, in Video Question Answering, extended thinking processes do not consistently yield performance gains and may even lead to degradation due to "visual anchor drifting", where models increasingly rely on self-generated text, sidelining visual inputs and causing hallucinations. While existing mitigations typically introduce specific mechanisms for the model to re-attend to visual inputs during inference, these approaches often incur prohibitive training costs and suffer from poor generalizability across different architectures. To address this, we propose FrameRepeat, an automated enhancement framework which features a lightweight repeat scoring module that enables Video-LLMs to autonomously identify which…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)