ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions
Dubing Chen, Jin Fang, Wencheng Han, Xinjing Cheng, Junbo Yin, Chenzhong Xu, Fahad Shahbaz Khan, Jianbing Shen

TL;DR
This paper introduces ALOcc, a novel framework for 3D semantic occupancy and flow prediction that improves robustness, consistency, and efficiency through adaptive lifting, semantic alignment, and BEV-centric cost volumes, achieving state-of-the-art results.
Contribution
The paper presents a new adaptive lifting mechanism, semantic consistency enforcement, and a BEV-centric cost volume for joint 3D semantic occupancy and flow prediction, with real-time capabilities.
Findings
Achieves state-of-the-art accuracy on multiple benchmarks.
Outperforms existing real-time methods in speed and accuracy.
Provides a spectrum of models balancing efficiency and performance.
Abstract
3D semantic occupancy and flow prediction are fundamental to spatiotemporal scene understanding. This paper proposes a vision-based framework with three targeted improvements. First, we introduce an occlusion-aware adaptive lifting mechanism incorporating depth denoising. This enhances the robustness of 2D-to-3D feature transformation while mitigating reliance on depth priors. Second, we enforce 3D-2D semantic consistency via jointly optimized prototypes, using confidence- and category-aware sampling to address the long-tail classes problem. Third, to streamline joint prediction, we devise a BEV-centric cost volume to explicitly correlate semantic and flow features, supervised by a hybrid classification-regression scheme that handles diverse motion scales. Our purely convolutional architecture establishes new SOTA performance on multiple benchmarks for both semantic occupancy and joint…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsComputer Graphics and Visualization Techniques · Traffic control and management · Traffic Prediction and Management Techniques
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
