OccSTeP: Benchmarking 4D Occupancy Spatio-Temporal Persistence

Yu Zheng; Jie Hu; Kailun Yang; Jiaming Zhang

arXiv:2512.15621·cs.CV·December 18, 2025

TL;DR

This paper introduces OccSTeP, a benchmark for 4D occupancy spatio-temporal persistence in autonomous driving, and OccSTeP-WM, a world model aimed at robust scene understanding and forecasting despite noisy or missing data.

Contribution

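The paper presents the first OccSTeP benchmark, which includes challenging scenarios such as erroneous semantic labels and dropped frames, and proposes OccSTeP-WM, a tokenizer-free, dense voxel-based world model with a linear-complexity attention backbone and a recurrent state-space module.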

Findings

01

OccSTeP-WM achieves 23.70% semantic mIoU, a 6.56% gain.

02

OccSTeP-WM achieves 35.89% occupancy IoU, a 9.26% gain.

03

Demonstrated robustness in online inference with noisy or missing data.
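
The two metrics reported above, occupancy IoU and semantic mIoU, can be illustrated on a toy voxel grid. The following is a minimal sketch, not the benchmark's evaluation code: the function names, the flattened-list representation of the grid, the choice of label 0 for free space, and the decision to skip classes absent from both grids are all assumptions made for illustration. The benchmark's actual conventions (for example, how scores are accumulated across frames) may differ.

```python
from typing import Sequence

FREE = 0  # assumed label id for empty voxels (hypothetical convention)


def occupancy_iou(pred: Sequence[int], target: Sequence[int],
                  free_label: int = FREE) -> float:
    """Binary IoU: every non-free voxel counts as 'occupied'."""
    inter = union = 0
    for p, t in zip(pred, target):
        p_occ, t_occ = p != free_label, t != free_label
        inter += p_occ and t_occ   # occupied in both grids
        union += p_occ or t_occ    # occupied in at least one grid
    return inter / union if union else 0.0


def semantic_miou(pred: Sequence[int], target: Sequence[int],
                  num_classes: int, free_label: int = FREE) -> float:
    """Mean of per-class IoUs, skipping the free class and any class
    that appears in neither grid (an assumption of this sketch)."""
    ious = []
    for c in range(num_classes):
        if c == free_label:
            continue
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six voxels flattened into a list, labels 0 (free), 1, 2.
pred = [0, 1, 1, 2, 0, 2]
target = [0, 1, 2, 2, 1, 2]
occ = occupancy_iou(pred, target)                 # 4 shared / 5 occupied in either
miou = semantic_miou(pred, target, num_classes=3)  # mean of 1/3 and 2/3
```

In this toy case the occupancy IoU is 4/5 while the semantic mIoU is about 0.5, which shows why the two numbers can diverge: a voxel predicted as occupied but with the wrong class still counts toward occupancy IoU yet lowers the semantic score.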

Abstract

Autonomous driving requires a persistent understanding of 3D scenes that is robust to temporal disturbances and accounts for potential future actions. We introduce a new concept of 4D Occupancy Spatio-Temporal Persistence (OccSTeP), which aims to address two tasks: (1) reactive forecasting: "what will happen next" and (2) proactive forecasting: "what would happen given a specific future action". For the first time, we create a new OccSTeP benchmark with challenging scenarios (e.g., erroneous semantic labels and dropped frames). To address this task, we propose OccSTeP-WM, a tokenizer-free world model that maintains a dense voxel-based scene state and incrementally fuses spatio-temporal context over time. OccSTeP-WM leverages a linear-complexity attention backbone and a recurrent state-space module to capture long-range spatial dependencies while continually updating the scene memory…

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Generative Adversarial Networks and Image Synthesis · Robotics and Sensor-Based Localization