Occupancy Learning with Spatiotemporal Memory

Ziyang Leng; Jiawei Yang; Wenlong Yi; Bolei Zhou

arXiv:2508.04705·cs.CV·August 7, 2025

TL;DR

This paper introduces ST-Occ, a framework for 3D occupancy learning in autonomous driving. It aggregates spatiotemporal information through a scene-level memory and a memory attention mechanism, improving both prediction accuracy and temporal consistency.

Contribution

The paper proposes a scene-level spatiotemporal memory and a memory attention mechanism that make 3D occupancy prediction more temporally consistent and more efficient.
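As a rough illustration of what attending to a memory can look like, the sketch below conditions a current feature vector on a small set of stored memory slots. It uses plain scaled dot-product cross-attention with a residual connection, which is a stand-in chosen here and not the paper's memory attention; in particular, it does not model uncertainty or dynamics. The names `attend_to_memory` and `update_with_memory`, and the representation of features as flat lists of floats, are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_memory(query: Vector, memory: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over memory slots.

    Assumption: memory slots act as both keys and values, with no
    learned projections. A real model would learn those projections.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, slot)) / math.sqrt(dim)
        for slot in memory
    ]
    weights = softmax(scores)
    # Weighted sum of memory slots, one output value per feature channel.
    return [
        sum(w * slot[i] for w, slot in zip(weights, memory))
        for i in range(dim)
    ]


def update_with_memory(current: Vector, memory: List[Vector]) -> Vector:
    """Residual fusion: current feature plus the attended memory context."""
    context = attend_to_memory(current, memory)
    return [c + m for c, m in zip(current, context)]


if __name__ == "__main__":
    current_feature = [1.0, 0.0]
    memory_slots = [[2.0, 3.0], [0.5, -1.0], [1.5, 0.0]]
    print(update_with_memory(current_feature, memory_slots))
```

The residual form keeps the current observation intact when the memory carries little relevant information, which is one common design choice for fusing historical context.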

Findings

01

Outperforms state-of-the-art methods by 3 mIoU points on 3D occupancy prediction.

02

Reduces temporal inconsistency by 29%.

03

Effectively models historical information for better perception.
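For readers unfamiliar with the metric in the first finding, mIoU is the mean over classes of intersection-over-union between predicted and ground-truth labels. The sketch below computes it over flattened voxel labels. The function name `mean_iou` is hypothetical, and skipping classes that appear in neither prediction nor ground truth is an assumption; benchmarks differ in how they handle absent classes and ignored voxels.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union over flattened voxel class labels.

    Assumption: classes absent from both pred and target are skipped
    rather than counted as IoU 0 or 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious: List[float] = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    predicted = [0, 0, 1, 1, 2, 2]
    ground_truth = [0, 1, 1, 1, 2, 0]
    # Per-class IoU: 1/3, 2/3, 1/2, so the mean is 0.5.
    print(mean_iou(predicted, ground_truth, num_classes=3))
```

On this scale, a gain of 3 mIoU points corresponds to a difference of 0.03 in the value returned above.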

Abstract

3D occupancy has become a promising perception representation for autonomous driving because it models the surrounding environment at a fine-grained scale. However, efficiently aggregating 3D occupancy over time across multiple input frames remains challenging, owing to the high processing cost and to the uncertainty and dynamics of voxels. To address this issue, we propose ST-Occ, a scene-level occupancy representation learning framework that effectively learns spatiotemporal features with temporal consistency. ST-Occ consists of two core designs: a spatiotemporal memory that captures comprehensive historical information and stores it efficiently through a scene-level representation, and a memory attention that conditions the current occupancy representation on the spatiotemporal memory while modeling uncertainty and dynamics. Our method significantly enhances the spatiotemporal…
