VideoAtlas: Navigating Long-Form Video in Logarithmic Compute

Mohamed Eltahir; Ali Habibullah; Yazan Alshoibi; Lama Ayash; Tanveer Hussain; Naeemullah Khan

arXiv:2603.17948·cs.CV·March 19, 2026

Open Access

TL;DR

VideoAtlas introduces a hierarchical, lossless environment for navigating long-form video. Compute grows logarithmically with video duration, and recursive language models can be applied to video understanding across extended durations.

Contribution

The paper presents VideoAtlas, a novel environment that structures video as a hierarchical grid, facilitating lossless, scalable navigation and enabling the extension of recursive language models to video understanding.

Findings

01. Logarithmic compute growth with video duration.

02. 30-60% multimodal cache hit rate from grid reuse.

03. Robust performance on 1-hour to 10-hour video benchmarks.

Abstract

Extending language models to video introduces two challenges: representation, where existing methods rely on lossy approximations, and long-context, where caption- or agent-based pipelines collapse video into text and lose visual fidelity. To address both, we introduce VideoAtlas, a task-agnostic environment that represents video as a hierarchical grid that is simultaneously lossless, navigable, scalable, and caption- and preprocessing-free. An overview of the video is available at a glance, and any region can be recursively zoomed into, with the same visual representation used uniformly for the video, intermediate investigations, and the agent's memory, eliminating lossy text conversion end-to-end. This hierarchical structure ensures access depth grows only logarithmically with video length. For long-context, Recursive Language Models (RLMs) recently offered a powerful solution for…
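The claim that access depth grows logarithmically with video length can be illustrated with a small sketch. This is not the paper's implementation: the function names `zoom_depth` and `cell_ranges`, the equal-width split of a time span into grid cells, and the example values (an 8x8 grid, 1 frame per second) are all assumptions made for illustration.

```python
def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def zoom_depth(num_frames: int, cells_per_grid: int) -> int:
    """Count zoom steps until a single grid cell covers one frame.

    Hypothetical model: each zoom splits the current span into
    `cells_per_grid` equal cells and descends into one of them.
    """
    if num_frames < 1 or cells_per_grid < 2:
        raise ValueError("need num_frames >= 1 and cells_per_grid >= 2")
    depth = 0
    span = num_frames
    while span > 1:
        span = _ceil_div(span, cells_per_grid)  # frames covered by one cell
        depth += 1
    return depth


def cell_ranges(start: int, end: int, cells_per_grid: int) -> list[tuple[int, int]]:
    """Split the frame range [start, end) into at most `cells_per_grid` cells."""
    total = end - start
    if total <= 0:
        return []
    step = _ceil_div(total, cells_per_grid)
    return [(s, min(s + step, end)) for s in range(start, end, step)]


if __name__ == "__main__":
    cells = 8 * 8  # assumed 8x8 grid, not a value taken from the paper
    for hours in (1, 10):
        frames = hours * 3600  # assumed sampling rate of 1 frame per second
        print(hours, "h ->", zoom_depth(frames, cells), "zoom steps")
```

Under these assumed values, a tenfold increase in duration (1 hour to 10 hours) adds only one zoom step, which is the kind of logarithmic behaviour the abstract describes.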

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)