Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation

Inferix Team: Tianyu Feng; Yizeng Han; Jiahao He; Yuanyu He; Xi Lin; Teng Liu; Hanfeng Lu; Jiasheng Tang; Wei Wang; Zhiyuan Wang; Jichao Wu; Mingyang Yang; Yinghao Yu; Zeyu Zhang; Bohan Zhuang

arXiv:2511.20714·cs.CV·April 30, 2026

TL;DR

Inferix introduces a next-generation inference engine utilizing block-diffusion decoding for efficient, coherent, and high-quality world simulation in video generation, surpassing traditional diffusion models.

Contribution

It presents a novel semi-autoregressive block-diffusion decoding paradigm combined with KV Cache management for improved world simulation and real-time interactive video generation.

Findings

01 Enables more coherent and stable video sequences.

02 Supports real-time interaction and immersive world modeling.

03 Provides a new benchmark for minute-long video generation.

Abstract

World models serve as core simulators for fields such as agentic AI, embodied AI, and gaming, capable of generating long, physically realistic, and interactive high-quality videos. Moreover, scaling these models could unlock emergent capabilities in visual perception, understanding, and reasoning, paving the way for a new paradigm that moves beyond current LLM-centric vision foundation models. A key breakthrough empowering them is the semi-autoregressive (block-diffusion) decoding paradigm, which merges the strengths of diffusion and autoregressive methods by generating video tokens in blocks: diffusion is applied within each block while conditioning on previous ones, resulting in more coherent and stable video sequences. Crucially, it overcomes limitations of standard video diffusion by reintroducing LLM-style KV Cache management, enabling efficient, variable-length, and high-quality…
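
To make the decoding loop described in the abstract concrete, here is a minimal toy sketch in Python. It is not Inferix code: the names `toy_denoise_step`, `generate`, and `max_cache_tokens` are hypothetical, scalar floats stand in for video tokens, a plain list stands in for the KV cache, and the "denoiser" is a placeholder that simply pulls noisy values toward the mean of the cached context. It only illustrates the control flow: blocks are produced one after another (autoregressive), each block is refined over several steps starting from noise (diffusion), and finished blocks are appended to a cache that later blocks condition on.

```python
import random


def toy_denoise_step(block, context_mean):
    """Hypothetical stand-in for a denoiser: move each noisy value
    halfway toward the mean of the cached context."""
    return [0.5 * x + 0.5 * context_mean for x in block]


def generate(num_blocks, block_size, denoise_steps, seed=0, max_cache_tokens=None):
    """Toy semi-autoregressive loop: diffusion inside a block,
    autoregression across blocks via a cache of finished blocks."""
    rng = random.Random(seed)
    kv_cache = []  # assumption: a flat list stands in for cached keys/values
    video = []
    for _ in range(num_blocks):
        # Condition on previous blocks only through the cache.
        context_mean = sum(kv_cache) / len(kv_cache) if kv_cache else 0.0
        # Every block starts from pure noise.
        block = [rng.gauss(0.0, 1.0) for _ in range(block_size)]
        # Iterative refinement within the current block.
        for _ in range(denoise_steps):
            block = toy_denoise_step(block, context_mean)
        # The finished block becomes context for all later blocks.
        kv_cache.extend(block)
        # Assumed eviction policy: keep only the most recent tokens.
        if max_cache_tokens is not None:
            excess = len(kv_cache) - max_cache_tokens
            if excess > 0:
                del kv_cache[:excess]
        video.append(block)
    return video, kv_cache


if __name__ == "__main__":
    video, cache = generate(num_blocks=3, block_size=4, denoise_steps=5)
    print(len(video), "blocks generated;", len(cache), "tokens cached")
```

Because the number of blocks is a loop bound rather than a fixed tensor dimension, the output length can vary from call to call, and the cache means earlier blocks do not have to be recomputed when a new block is generated. The sliding-window eviction controlled by `max_cache_tokens` is only one assumed policy for bounding memory on long sequences; the abstract does not say which policy Inferix uses.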

Code & Models

Repositories

alibaba-damo-academy/Inferix
github

Datasets

ZIPLABHuggingface/LVG-Bench
dataset· 2.4k dl
