FMOcc: TPV-Driven Flow Matching for 3D Occupancy Prediction with Selective State Space Model
Jiangxia Chen, Tongyuan Huang, Ke Song

TL;DR
FMOcc introduces a flow matching-based selective state space model for few-frame 3D occupancy prediction, improving accuracy and efficiency in occluded and distant scenes for autonomous driving.
Contribution
The paper proposes a novel TPV refinement network with flow matching SSM modules and plane selective filtering, reducing computational load and enhancing distant scene prediction.
Findings
Outperforms existing methods on Occ3D-nuScenes and OpenOcc datasets.
Achieves 43.1% RayIoU and 39.8% mIoU with two frames on Occ3D-nuScenes.
Operates with 5.4 G of inference memory and 330 ms inference time.
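For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of how a mean intersection-over-union score can be computed over voxel labels. The function name `mean_iou` and the toy label arrays are illustrative assumptions; the Occ3D-nuScenes benchmark protocol (its class list, visibility masks, and the RayIoU variant) is not reproduced here.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes present in pred or target.

    Hypothetical helper for illustration; not the benchmark's evaluation code.
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both arrays, so it is skipped
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

# Toy example: four voxels, three semantic classes.
pred = np.array([0, 1, 1, 2])
target = np.array([0, 1, 2, 2])
score = mean_iou(pred, target, num_classes=3)
```

In the toy example, class 0 is matched exactly while classes 1 and 2 each overlap on one of two voxels, so the per-class scores are averaged rather than pooled over all voxels.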
Abstract
3D semantic occupancy prediction plays a pivotal role in autonomous driving. However, inherent limitations of few-frame images and redundancy in 3D space compromise prediction accuracy for occluded and distant scenes. Existing methods enhance performance by fusing historical frame data, which requires additional data and significant computational resources. To address these issues, this paper proposes FMOcc, a Tri-perspective View (TPV) refinement occupancy network with a flow matching selective state space model for few-frame 3D occupancy prediction. First, to generate missing features, we design a feature refinement module based on a flow matching model, called the Flow Matching SSM module (FMSSM). Furthermore, by designing the TPV SSM layer and Plane Selective SSM (PS3M), we selectively filter TPV features to reduce the impact of air voxels on non-air voxels, thereby enhancing the…
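As background on the flow matching idea the abstract refers to, here is a generic sketch of flow matching with a straight-line interpolation path: a velocity field is regressed onto the displacement between a source sample and a target sample, then integrated with Euler steps to transport one to the other. This is not the FMSSM module, whose details are not given in this summary. All names (`interpolate`, `fm_loss`, `euler_sample`) and the `oracle` velocity function are assumptions made for illustration, with the oracle standing in for a trained network.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point on the straight path from x0 (t=0) to x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Velocity of the straight path; constant in t."""
    return x1 - x0

def fm_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocity at time t."""
    xt = interpolate(x0, x1, t)
    return float(np.mean((velocity_fn(xt, t) - target_velocity(x0, x1)) ** 2))

def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8))  # stand-in for incomplete features (assumption)
x1 = rng.normal(size=(4, 8))  # stand-in for complete features (assumption)

# Oracle velocity that knows the true displacement; a trained model would
# approximate this from (x_t, t) alone.
oracle = lambda x, t: x1 - x0

loss = fm_loss(oracle, x0, x1, t=0.3)
x_final = euler_sample(oracle, x0, steps=10)
```

With the oracle field the loss is zero and the Euler integration lands on `x1`, which makes the sketch a convenient sanity check before swapping in a learned velocity model.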
