Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

Kaijin Chen; Dingkang Liang; Xin Zhou; Yikang Ding; Xiaoqiang Liu; Pengfei Wan; Xiang Bai

arXiv:2603.25716·cs.CV·March 31, 2026

TL;DR

This paper introduces Hybrid Memory, a paradigm for dynamic video world models, together with the HM-World dataset and the HyDRA architecture, so that subjects that move out of view keep their identity and motion until they re-emerge.

Contribution

It presents the hybrid memory paradigm, the large-scale HM-World dataset, and HyDRA, a specialized architecture for tracking dynamic subjects in video world models.

Findings

01. HyDRA outperforms state-of-the-art methods in maintaining subject identity.

02. HM-World enables rigorous evaluation of hybrid memory coherence.

03. The approach improves motion continuity during out-of-view intervals.

Abstract

Video world models have shown immense potential in simulating the physical world, yet existing memory mechanisms primarily treat environments as static canvases. When dynamic subjects hide out of sight and later re-emerge, current methods often struggle, leading to frozen, distorted, or vanishing subjects. To address this, we introduce Hybrid Memory, a novel paradigm requiring models to simultaneously act as precise archivists for static backgrounds and vigilant trackers for dynamic subjects, ensuring motion continuity during out-of-view intervals. To facilitate research in this direction, we construct HM-World, the first large-scale video dataset dedicated to hybrid memory. It features 59K high-fidelity clips with decoupled camera and subject trajectories, encompassing 17 diverse scenes, 49 distinct subjects, and meticulously designed exit-entry events to rigorously evaluate hybrid…
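The exit-entry events mentioned in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical per-frame visibility flag for one subject (this is not the HM-World annotation format, which the abstract does not describe) and returns the out-of-view intervals over which a model would need to keep the subject's motion going. The function name `out_of_view_intervals` is made up for illustration.

```python
from typing import List, Tuple


def out_of_view_intervals(visible: List[bool]) -> List[Tuple[int, int]]:
    """Return (exit_frame, reentry_frame) pairs for one subject.

    `visible[i]` is a hypothetical flag saying whether the subject is in
    view at frame i. An interval starts at the first hidden frame after
    the subject has been seen and ends at the first frame where it is
    visible again. Gaps before the first sighting, and gaps that never
    end in a re-entry, are ignored.
    """
    intervals: List[Tuple[int, int]] = []
    exit_frame = None   # first hidden frame of the current gap, if any
    seen = False        # has the subject been visible at least once?
    for i, is_visible in enumerate(visible):
        if is_visible:
            if exit_frame is not None:
                # Subject re-emerges: close the current gap.
                intervals.append((exit_frame, i))
                exit_frame = None
            seen = True
        elif seen and exit_frame is None:
            # Subject just left the field of view.
            exit_frame = i
    return intervals


flags = [True, True, False, False, True, False]
print(out_of_view_intervals(flags))
```

In this toy example the subject leaves at frame 2 and returns at frame 4; the trailing hidden frame is dropped because no re-entry follows it.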

Code & Models

Repositories

H-EmbodVis/HyDRA (GitHub)

Models

H-EmbodVis/HyDRA (Hugging Face model, 60 downloads)

Datasets

KlingTeam/HM-World (dataset, 1.1k downloads)
