Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory

Ruiqi Wu; Xuanhua He; Meng Cheng; Tianyu Yang; Yong Zhang; Zhuoliang Kang; Xunliang Cai; Xiaoming Wei; Chunle Guo; Chongyi Li; Ming-Ming Cheng

arXiv:2602.02393·cs.CV·February 4, 2026

TL;DR

Infinite-World introduces a hierarchical, pose-free memory system that lets interactive world models maintain coherent visual memory over 1000+ frames in complex real-world environments, addressing two obstacles to training on real-world video: noisy pose estimates and the scarcity of viewpoint revisits.

Contribution

The paper presents a novel Hierarchical Pose-free Memory Compressor and an Uncertainty-aware Action Labeling module, advancing long-range visual memory and robust action learning in real-world videos.

Findings

01 Achieves over 1000-frame coherent visual memory in real-world environments.

02 Outperforms existing models in visual quality and action controllability.

03 Demonstrates effective long-range loop-closure with minimal fine-tuning.

Abstract

We propose Infinite-World, a robust interactive world model capable of maintaining coherent visual memory over 1000+ frames in complex real-world environments. While existing world models can be efficiently optimized on synthetic data with perfect ground-truth, they lack an effective training paradigm for real-world videos due to noisy pose estimations and the scarcity of viewpoint revisits. To bridge this gap, we first introduce a Hierarchical Pose-free Memory Compressor (HPMC) that recursively distills historical latents into a fixed-budget representation. By jointly optimizing the compressor with the generative backbone, HPMC enables the model to autonomously anchor generations in the distant past with bounded computational cost, eliminating the need for explicit geometric priors. Second, we propose an Uncertainty-aware Action Labeling module that discretizes continuous motion into a…

Code & Models

Models

🤗 MeiGen-AI/Infinite-World (model · ♡ 4)

Taxonomy

Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation