GTAD: Global Temporal Aggregation Denoising Learning for 3D Semantic Occupancy Prediction

Tianhao Li; Yang Li; Mengtian Li; Yisheng Deng; Weifeng Ge

arXiv:2507.20963 · cs.CV · July 29, 2025

TL;DR

GTAD introduces a global temporal aggregation denoising network that effectively combines local and global temporal features for improved 3D semantic occupancy prediction in dynamic environments.

Contribution

The paper proposes a novel global temporal aggregation framework with a denoising network for holistic 3D scene understanding, surpassing existing local-only methods.

Findings

01. Outperforms existing methods on nuScenes and Occ3D-nuScenes benchmarks.

02. Effectively integrates local and global temporal features for better perception.

03. Demonstrates the importance of global temporal information in 3D occupancy prediction.

Abstract

Accurately perceiving dynamic environments is a fundamental task for autonomous driving and robotic systems. Existing methods inadequately utilize temporal information, relying mainly on local temporal interactions between adjacent frames and failing to leverage global sequence information effectively. To address this limitation, we investigate how to effectively aggregate global temporal features from temporal sequences, aiming to achieve occupancy representations that efficiently utilize global temporal information from historical observations. For this purpose, we propose a global temporal aggregation denoising network named GTAD, introducing a global temporal information aggregation framework as a new paradigm for holistic 3D scene understanding. Our method employs an in-model latent denoising network to aggregate local temporal features from the current moment and global temporal…

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis