Find, Fix, Reason: Context Repair for Video Reasoning

Haojian Huang; Chuanyu Qin; Yinchuan Li; Yingcong Chen

arXiv:2604.16243·cs.CV·May 4, 2026

TL;DR

This paper introduces a context repair method for video reasoning that leverages larger models as tools to identify and supply missing evidence, improving accuracy and generalization in multi-modal video understanding tasks.

Contribution

The paper proposes an observation-level intervention that pairs a frozen, tool-integrated teacher model with a new training reward, and it reports outperforming existing methods on video reasoning benchmarks.

Findings

01. Consistent accuracy improvements across multiple benchmarks.

02. Enhanced generalization capabilities in video reasoning tasks.

03. Effective use of larger models as tools for context repair.

Abstract

Reinforcement learning has advanced video reasoning in large multi-modal models, yet dominant pipelines rely either on on-policy self-exploration, which plateaus at the model's knowledge boundary, or on hybrid replay, which mixes policies and demands careful regularization. Dynamic context methods zoom into focused evidence but often require curated pretraining and two-stage tuning, and their context remains bounded by a small model's capability. In contrast, larger models excel at instruction following and multi-modal understanding, can supply richer context to smaller models, and rapidly zoom in on target regions via simple tools. Building on this capability, we introduce an observation-level intervention: a frozen, tool-integrated teacher identifies the missing spatiotemporal dependency and provides a minimal evidence patch (e.g., timestamps or regions) from the original video while…
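
To make the idea of a "minimal evidence patch" concrete, the following is a rough sketch of how a timestamp window and an optional region might be used to cut evidence out of the original video. It is not the authors' implementation: the `EvidencePatch` class, the `apply_patch` function, the field names, and the toy list-of-lists frame format are all assumptions made for illustration, and the teacher model that would produce the patch is not modeled at all.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Frame = List[List[int]]  # toy grayscale frame: rows of pixel values (assumption)


@dataclass(frozen=True)
class EvidencePatch:
    """Hypothetical container for a teacher-provided hint (not from the paper)."""
    start_s: float  # start of the time window, in seconds
    end_s: float    # end of the time window, in seconds (inclusive)
    # Optional crop as (top, left, bottom, right); bottom and right are exclusive.
    region: Optional[Tuple[int, int, int, int]] = None


def apply_patch(frames: List[Frame], fps: float, patch: EvidencePatch) -> List[Frame]:
    """Return only the frames, optionally cropped, that the patch points at."""
    selected = []
    for index, frame in enumerate(frames):
        timestamp = index / fps
        if not (patch.start_s <= timestamp <= patch.end_s):
            continue  # frame lies outside the time window
        if patch.region is not None:
            top, left, bottom, right = patch.region
            frame = [row[left:right] for row in frame[top:bottom]]
        selected.append(frame)
    return selected


# Toy video: 4 frames at 2 fps, each 3x3 and filled with its own frame index.
video = [[[i] * 3 for _ in range(3)] for i in range(4)]
patch = EvidencePatch(start_s=0.5, end_s=1.0, region=(0, 0, 2, 2))
evidence = apply_patch(video, fps=2.0, patch=patch)
```

In this toy setup the frames sit at 0.0, 0.5, 1.0, and 1.5 seconds, so the patch keeps the two middle frames and crops each to its top-left 2x2 corner. The point of the sketch is only that the evidence is taken from the original video rather than generated.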

Code & Models

Repositories

https://jethrojames.github.io/FFR
