ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions

Dubing Chen; Jin Fang; Wencheng Han; Xinjing Cheng; Junbo Yin; Chenzhong Xu; Fahad Shahbaz Khan; Jianbing Shen

arXiv:2411.07725 · cs.CV · September 11, 2025

TL;DR

This paper introduces ALOcc, a novel framework for 3D semantic occupancy and flow prediction that improves robustness, consistency, and efficiency through adaptive lifting, semantic alignment, and BEV-centric cost volumes, achieving state-of-the-art results.

Contribution

The paper presents a new adaptive lifting mechanism, semantic consistency enforcement, and a BEV-centric cost volume for joint 3D semantic occupancy and flow prediction, with real-time capabilities.

Findings

01. Achieves state-of-the-art accuracy on multiple benchmarks.
02. Outperforms existing real-time methods in speed and accuracy.
03. Provides a spectrum of models balancing efficiency and performance.

Abstract

3D semantic occupancy and flow prediction are fundamental to spatiotemporal scene understanding. This paper proposes a vision-based framework with three targeted improvements. First, we introduce an occlusion-aware adaptive lifting mechanism incorporating depth denoising. This enhances the robustness of 2D-to-3D feature transformation while mitigating reliance on depth priors. Second, we enforce 3D-2D semantic consistency via jointly optimized prototypes, using confidence- and category-aware sampling to address the long-tail classes problem. Third, to streamline joint prediction, we devise a BEV-centric cost volume to explicitly correlate semantic and flow features, supervised by a hybrid classification-regression scheme that handles diverse motion scales. Our purely convolutional architecture establishes new SOTA performance on multiple benchmarks for both semantic occupancy and joint…
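
To make the third point of the abstract more concrete, the following is a minimal sketch of a cost volume computed on a BEV grid and decoded into a flow field. It is not the authors' implementation: the function names `bev_cost_volume` and `decode_flow`, the `radius` and `temperature` parameters, the dot-product correlation, and the softmax-expectation decoding are all assumptions chosen for illustration. The decoding step is a generic soft-argmax over candidate offsets, standing in for the paper's hybrid classification-regression supervision, whose details are not given here.

```python
import numpy as np


def bev_cost_volume(curr, prev, radius=1):
    """Correlate each BEV cell of `curr` with nearby cells of `prev`.

    Hypothetical helper, not from the paper.
    curr, prev: feature maps of shape (C, H, W).
    Returns (cost, offsets): cost has shape (K, H, W) with
    K = (2 * radius + 1) ** 2, and offsets has shape (K, 2) holding the
    (dy, dx) displacement of each candidate.
    """
    C, H, W = curr.shape
    # Zero-pad the spatial axes so every shifted window stays in bounds.
    padded = np.pad(prev, ((0, 0), (radius, radius), (radius, radius)))
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    cost = np.empty((len(offsets), H, W), dtype=np.float64)
    for k, (dy, dx) in enumerate(offsets):
        # shifted[:, y, x] equals prev[:, y + dy, x + dx] (zero outside the grid).
        shifted = padded[:, radius + dy:radius + dy + H,
                         radius + dx:radius + dx + W]
        # Channel-wise dot product, normalised by the channel count.
        cost[k] = (curr * shifted).sum(axis=0) / C
    return cost, np.array(offsets, dtype=np.float64)


def decode_flow(cost, offsets, temperature=1.0):
    """Turn matching costs into a per-cell displacement of shape (2, H, W).

    A softmax over the K candidates gives a distribution over displacement
    bins (the classification view); its expectation gives a continuous
    displacement (the regression view). This is an assumed, generic decoder.
    """
    logits = cost / temperature
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    # Expected (dy, dx) under the per-cell distribution.
    return np.tensordot(offsets.T, prob, axes=([1], [0]))
```

As a usage sketch, calling `cost, offsets = bev_cost_volume(curr, prev, radius=1)` on two `(C, H, W)` arrays and then `decode_flow(cost, offsets, temperature=0.01)` yields, for each cell, the displacement in grid cells toward its best-matching neighbour in `prev`. A larger `radius` covers larger motions at quadratic cost in the number of candidates, which is one reason a coarse-bin classification combined with a continuous refinement can be attractive when motion scales vary widely.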

Code & Models

Repositories

cdb342/alocc (Official)

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Traffic control and management · Traffic Prediction and Management Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings