SWA-SOP: Spatially-aware Window Attention for Semantic Occupancy Prediction in Autonomous Driving

Helin Cao, Rafael Materla, and Sven Behnke

arXiv:2506.18785 · cs.CV · August 14, 2025

TL;DR

This paper introduces Spatially-aware Window Attention (SWA), an attention mechanism that incorporates local spatial context into attention computation for Semantic Occupancy Prediction (SOP) in autonomous driving. It improves scene completion with LiDAR input and carries over to a camera-based pipeline.

Contribution

The paper proposes SWA, a new attention mechanism that explicitly models spatial structure, significantly improving SOP performance in sparse and occluded environments.
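To make the idea of attention that "explicitly models spatial structure" concrete, here is a minimal NumPy sketch of window attention over a voxel grid with a distance-based bias on the attention logits. This is not the paper's SWA formulation, which this summary does not spell out. The function name, the `window` and `bias_scale` parameters, and the choice of a Euclidean distance penalty are all assumptions made for illustration.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1D array.
    e = np.exp(x - x.max())
    return e / e.sum()


def window_attention_with_spatial_bias(feats, center, window=1, bias_scale=1.0):
    """Attend from one query voxel to the voxels in its local cubic window.

    feats:      (X, Y, Z, C) voxel feature grid
    center:     (x, y, z) index of the query voxel
    window:     half-width of the cubic window (hypothetical parameter)
    bias_scale: strength of the distance penalty (hypothetical parameter)
    """
    X, Y, Z, C = feats.shape
    cx, cy, cz = center

    # Enumerate voxel coordinates inside the window, clipped to the grid.
    xs = np.arange(max(cx - window, 0), min(cx + window, X - 1) + 1)
    ys = np.arange(max(cy - window, 0), min(cy + window, Y - 1) + 1)
    zs = np.arange(max(cz - window, 0), min(cz + window, Z - 1) + 1)
    gx, gy, gz = np.meshgrid(xs, ys, zs, indexing="ij")
    coords = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)  # (N, 3)

    keys = feats[coords[:, 0], coords[:, 1], coords[:, 2]]  # (N, C)
    query = feats[cx, cy, cz]                                # (C,)

    # Standard scaled dot-product logits (keys double as values here).
    logits = keys @ query / np.sqrt(C)

    # Assumed spatial term: penalize keys by their distance to the query,
    # so attention depends on geometry as well as on feature similarity.
    dist = np.linalg.norm(coords - np.asarray(center), axis=1)
    logits = logits - bias_scale * dist

    weights = softmax(logits)
    return weights @ keys, weights, coords


rng = np.random.default_rng(0)
grid = rng.normal(size=(4, 4, 4, 8))
out, weights, coords = window_attention_with_spatial_bias(grid, (1, 1, 1))
print(out.shape, weights.shape)
```

With all-zero features the dot-product term vanishes and the weights are determined by distance alone, so the query voxel itself receives the largest weight. That is the sense in which the bias injects spatial context that plain content-based attention lacks.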

Findings

1. Achieves state-of-the-art results on LiDAR SOP benchmarks.
2. Improves scene completion in sparse and occluded regions.
3. Provides consistent gains across LiDAR and camera modalities.

Abstract

Perception systems in autonomous driving rely on sensors such as LiDAR and cameras to perceive the 3D environment. However, due to occlusions and data sparsity, these sensors often fail to capture complete information. Semantic Occupancy Prediction (SOP) addresses this challenge by inferring both occupancy and semantics of unobserved regions. Existing transformer-based SOP methods lack explicit modeling of spatial structure in attention computation, resulting in limited geometric awareness and poor performance in sparse or occluded areas. To this end, we propose Spatially-aware Window Attention (SWA), a novel mechanism that incorporates local spatial context into attention. SWA significantly improves scene completion and achieves state-of-the-art results on LiDAR-based SOP benchmarks. We further validate its generality by integrating SWA into a camera-based SOP pipeline, where it also…
