HERO: Hierarchical Extrapolation and Refresh for Efficient World Models
Quanjian Song, Xinyu Wang, Donghao Zhou, Jingyu Lin, Cunjian Chen, Yue Ma

TL;DR
HERO is a training-free hierarchical framework that accelerates diffusion-based world models by combining patch-wise refresh in shallow layers with linear extrapolation in deep layers, achieving a 1.73× speedup with minimal quality loss.
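The deep-layer linear extrapolation idea can be sketched as follows. This is a minimal illustration, not HERO's implementation: the wrapper class `ExtrapolatedLayer`, the fixed recompute `interval`, and the rule of extrapolating from the two most recent fully computed steps are all assumptions made for this example.

```python
import numpy as np


class ExtrapolatedLayer:
    """Hypothetical wrapper: run a deep layer on some denoising steps, extrapolate on the rest.

    On "full" steps the wrapped layer runs and its output is cached; on skipped
    steps the output is predicted by a straight line through the last two
    cached outputs. This is an illustrative assumption, not HERO's code.
    """

    def __init__(self, layer_fn, interval: int = 2):
        self.layer_fn = layer_fn   # the expensive deep block (any callable on arrays)
        self.interval = interval   # assumed schedule: recompute on steps divisible by this
        self.history = []          # up to two (step, feature) pairs from full computations
        self.num_computed = 0      # how many times layer_fn actually ran

    def __call__(self, step: int, x: np.ndarray) -> np.ndarray:
        # Compute for real until two cached points exist, and on scheduled steps.
        if len(self.history) < 2 or step % self.interval == 0:
            out = self.layer_fn(x)
            self.num_computed += 1
            self.history = (self.history + [(step, out)])[-2:]
            return out
        (s_old, f_old), (s_new, f_new) = self.history
        slope = (f_new - f_old) / (s_new - s_old)  # per-step change of the feature
        return f_new + slope * (step - s_new)      # continue the line to the current step


# Usage: a linear "layer" whose input drifts linearly, so extrapolation is exact.
layer = ExtrapolatedLayer(lambda x: 3.0 * x, interval=2)
outs = [layer(s, np.full(4, float(s))) for s in range(8)]
```

Because the prediction is a straight line through the last two cached outputs, it is exact only when the feature changes linearly across steps; the error grows with curvature, which is why such a shortcut suits the more stable deep-layer features rather than the highly variable shallow ones.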
Contribution
HERO introduces a novel hierarchical acceleration method for diffusion world models that combines patch-wise refresh (shallow layers) with linear extrapolation (deep layers) and requires no additional training.
Findings
Achieves 1.73× inference speedup with minimal quality degradation.
Outperforms existing diffusion acceleration methods.
Effectively leverages the feature coupling phenomenon observed in world models.
Abstract
Generation-driven world models create immersive virtual environments but suffer from slow inference due to the iterative nature of diffusion models. While recent advances have improved diffusion model efficiency, directly applying these techniques to world models introduces limitations such as quality degradation. In this paper, we present HERO, a training-free hierarchical acceleration framework tailored for efficient world models. Owing to the multi-modal nature of world models, we identify a feature coupling phenomenon, wherein shallow layers exhibit high temporal variability, while deeper layers yield more stable feature representations. Motivated by this, HERO adopts hierarchical strategies to accelerate inference: (i) In shallow layers, a patch-wise refresh mechanism efficiently selects tokens for recomputation. With patch-wise sampling and frequency-aware tracking, it avoids extra…
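A patch-wise refresh step for shallow layers might look like the sketch below. The details of HERO's patch-wise sampling and frequency-aware tracking are cut off in the abstract above, so the scoring here is an assumption: within each spatial patch, tokens are ranked by how far their input has drifted from the cached step, and a per-token refresh counter (a stand-in for frequency-aware tracking) breaks ties toward tokens recomputed less often. `select_refresh_tokens` and `refresh_layer` are hypothetical names.

```python
import numpy as np


def select_refresh_tokens(curr, cached, refresh_count, patch=2, k=1):
    """Pick k tokens per spatial patch to recompute (hypothetical scheme).

    curr, cached: (H, W, C) shallow-layer inputs at this step and at the cached step.
    refresh_count: (H, W) int array counting past recomputations; updated in place.
    Returns a boolean (H, W) mask of tokens to recompute; the rest reuse the cache.
    """
    H, W, _ = curr.shape
    change = np.linalg.norm(curr - cached, axis=-1)  # per-token drift since the cache
    mask = np.zeros((H, W), dtype=bool)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            bw = min(patch, W - j)  # patch width (smaller at the right edge)
            block_change = change[i:i + patch, j:j + patch].ravel()
            block_count = refresh_count[i:i + patch, j:j + patch].ravel()
            # lexsort uses the last key as primary: largest change first,
            # then fewest past refreshes as the tie-breaker.
            order = np.lexsort((block_count, -block_change))
            for idx in order[:k]:
                mask[i + idx // bw, j + idx % bw] = True
    refresh_count += mask  # frequency tracking: count this step's refreshes
    return mask


def refresh_layer(layer_fn, curr, cached_out, mask):
    """Recompute only the masked tokens; assumes layer_fn acts token-wise on (N, C)."""
    out = cached_out.copy()
    out[mask] = layer_fn(curr[mask])
    return out


# Usage: two tokens drift; each 2x2 patch refreshes exactly one token.
H, W, C = 4, 4, 3
cached = np.zeros((H, W, C))
curr = np.zeros((H, W, C))
curr[0, 1] = 5.0
curr[3, 3] = 2.0
counts = np.zeros((H, W), dtype=int)
mask = select_refresh_tokens(curr, cached, counts, patch=2, k=1)
out = refresh_layer(lambda t: 2.0 * t, curr, np.zeros((H, W, C)), mask)
```

`refresh_layer` assumes a token-wise block such as an MLP; an attention layer mixes tokens, so recomputing only a subset of them would additionally need the cached keys and values of the others.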
