GTAD: Global Temporal Aggregation Denoising Learning for 3D Semantic Occupancy Prediction
Tianhao Li, Yang Li, Mengtian Li, Yisheng Deng, Weifeng Ge

TL;DR
GTAD is a global temporal aggregation denoising network that combines local temporal features from the current frame with global temporal features from historical observations for 3D semantic occupancy prediction in dynamic environments.
Contribution
The paper proposes a global temporal aggregation framework with an in-model latent denoising network for holistic 3D scene understanding, in contrast to existing methods that rely mainly on local temporal interactions between adjacent frames.
Findings
Outperforms existing methods on nuScenes and Occ3D-nuScenes benchmarks.
Effectively integrates local and global temporal features for better perception.
Demonstrates the importance of global temporal information in 3D occupancy prediction.
Abstract
Accurately perceiving dynamic environments is a fundamental task for autonomous driving and robotic systems. Existing methods inadequately utilize temporal information, relying mainly on local temporal interactions between adjacent frames and failing to leverage global sequence information effectively. To address this limitation, we investigate how to effectively aggregate global temporal features from temporal sequences, aiming to achieve occupancy representations that efficiently utilize global temporal information from historical observations. For this purpose, we propose a global temporal aggregation denoising network named GTAD, introducing a global temporal information aggregation framework as a new paradigm for holistic 3D scene understanding. Our method employs an in-model latent denoising network to aggregate local temporal features from the current moment and global temporal…
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis
