$\infty$-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation

Saul Santos; António Farinhas; Daniel C. McNamee; André F. T. Martins

arXiv:2501.19098·cs.CV·May 20, 2025

TL;DR

$\infty$-Video adds a training-free, continuous-time long-term memory to existing video-language models, letting them process arbitrarily long videos and improving their performance on video question answering.

Contribution

The paper presents a continuous-time long-term memory (LTM) consolidation mechanism that augments video Q-formers so they can process unbounded video contexts efficiently and without additional training.

Findings

01. Improved performance on video question-answering tasks with Video-LLaMA and VideoChat2.

02. Efficient processing of arbitrarily long videos.

03. No additional training required for long-video comprehension.

Abstract

Current video-language models struggle with long-video understanding due to limited context lengths and reliance on sparse frame subsampling, often leading to information loss. This paper introduces $\infty$-Video, which can process arbitrarily long videos through a continuous-time long-term memory (LTM) consolidation mechanism. Our framework augments video Q-formers by allowing them to process unbounded video contexts efficiently and without requiring additional training. Through continuous attention, our approach dynamically allocates higher granularity to the most relevant video segments, forming "sticky" memories that evolve over time. Experiments with Video-LLaMA and VideoChat2 demonstrate improved performance in video question-answering tasks, showcasing the potential of continuous-time LTM mechanisms to enable scalable and training-free comprehension of long videos.
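
The abstract describes the memory only at a high level, so the following is a minimal NumPy sketch of one way a continuous-time memory with "sticky" resampling could work, not the authors' implementation. It assumes, as a guess, that frame features are compressed into coefficients over Gaussian radial basis functions by ridge regression, and that the memory is later re-read at time points drawn from the quantiles of an attention density, so regions with more attention mass receive more samples. All function names, the basis choice, the ridge penalty, and the toy density are hypothetical.

```python
import numpy as np


def rbf_basis(t, centers, width):
    """Evaluate Gaussian basis functions at times t; returns shape (len(t), len(centers))."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.exp(-0.5 * ((t[:, None] - centers[None, :]) / width) ** 2)


def compress(frames, num_basis=16, ridge=1e-3):
    """Fit a continuous signal over t in [0, 1] to frame features (assumed scheme).

    frames: array of shape (L, D), one feature vector per frame.
    Returns coefficients B of shape (num_basis, D) plus the basis parameters.
    """
    num_frames = frames.shape[0]
    t = np.linspace(0.0, 1.0, num_frames)
    centers = np.linspace(0.0, 1.0, num_basis)
    width = 1.0 / num_basis
    F = rbf_basis(t, centers, width)  # (L, num_basis)
    # Ridge regression: B = (F^T F + ridge * I)^{-1} F^T X
    B = np.linalg.solve(F.T @ F + ridge * np.eye(num_basis), F.T @ frames)
    return B, centers, width


def read(B, centers, width, t):
    """Read the continuous memory at arbitrary times t in [0, 1]."""
    return rbf_basis(t, centers, width) @ B


def sticky_sample_times(density, num_samples):
    """Pick sample times at the quantiles of a density given on a uniform grid over [0, 1].

    density must be strictly positive so that its cumulative sum is increasing.
    """
    grid = np.linspace(0.0, 1.0, len(density))
    cdf = np.cumsum(density)
    cdf = cdf / cdf[-1]
    quantiles = (np.arange(num_samples) + 0.5) / num_samples
    return np.interp(quantiles, cdf, grid)


# Toy usage: 200 frames with 8-dimensional features, compressed to 16 coefficients.
t_frames = np.linspace(0.0, 1.0, 200)
frames = np.stack([np.sin(2 * np.pi * (k + 1) * t_frames / 4) for k in range(8)], axis=1)
B, centers, width = compress(frames)

# A made-up attention density that peaks around t = 0.8.
grid = np.linspace(0.0, 1.0, 500)
density = 0.01 + np.exp(-0.5 * ((grid - 0.8) / 0.03) ** 2)
times = sticky_sample_times(density, num_samples=10)
consolidated = read(B, centers, width, times)  # (10, 8) features, denser near t = 0.8
```

In this sketch the memory footprint depends on the number of basis functions rather than on the number of frames, which is the property that would let such a memory absorb arbitrarily long inputs. How the actual method builds its attention density and merges old and new memories is not stated in the abstract and is left out here.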

Code & Models

Repositories

deep-spin/infinite-video (PyTorch, official)

Videos

$\infty$-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation · SlidesLive

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition · Advanced Image Processing Techniques