MV-MOS: Multi-View Feature Fusion for 3D Moving Object Segmentation
Jintao Cheng, Xingming Chen, Jinxin Liang, Xiaoyu Tang, Xieyuanli Chen, Dachuan Li

TL;DR
This paper introduces MV-MOS, a multi-view feature fusion model that combines motion and semantic information from different 2D point cloud representations to improve 3D moving object segmentation in autonomous driving.
Contribution
The paper proposes a novel multi-view fusion approach that integrates BEV and range view features with semantic information using a Mamba module, enhancing segmentation accuracy.
Findings
Outperforms state-of-the-art models on the SemanticKITTI benchmark
Effective multi-view fusion improves segmentation accuracy
Semantic features enhance motion segmentation performance
Abstract
Effectively summarizing dense 3D point cloud data and extracting motion information of moving objects (moving object segmentation, MOS) is crucial to autonomous driving and robotics applications. How to effectively utilize motion and semantic features and avoid information loss during 3D-to-2D projection is still a key challenge. In this paper, we propose a novel multi-view MOS model (MV-MOS) by fusing motion-semantic features from different 2D representations of point clouds. To effectively exploit complementary information, the motion branches of the proposed model combine motion features from both bird's eye view (BEV) and range view (RV) representations. In addition, a semantic branch is introduced to provide supplementary semantic features of moving objects. Finally, a Mamba module is utilized to fuse the semantic features with motion features and provide effective guidance for…
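To make the two 2D representations named in the abstract concrete, the following is a minimal NumPy sketch of a range-view (spherical) projection and a BEV grid projection of a LiDAR scan. It is not the authors' implementation: the function names, image size, vertical field of view (+3° to -25°, values commonly used for 64-beam sensors), grid extent, and cell size are all assumptions chosen for illustration.

```python
import numpy as np


def range_view_projection(points, height=64, width=2048,
                          fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project (N, 3) points to a (height, width) range image.

    Hypothetical helper; the FOV and image size are assumed values.
    Empty pixels are set to -1.
    """
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down

    depth = np.linalg.norm(points, axis=1)
    keep = depth > 0  # drop points at the sensor origin
    points, depth = points[keep], depth[keep]

    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / depth)

    # Map angles to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * width
    v = (1.0 - (pitch - fov_down) / fov) * height
    u = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, height - 1).astype(np.int64)

    image = np.full((height, width), -1.0, dtype=np.float32)
    # Write far points first so the nearest point per pixel is kept.
    order = np.argsort(depth)[::-1]
    image[v[order], u[order]] = depth[order]
    return image


def bev_projection(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0),
                   cell=0.5):
    """Count points per cell of a top-down grid.

    Hypothetical helper; the grid extent and cell size are assumed values.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(np.int64)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(np.int64)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)

    grid = np.zeros((nx, ny), dtype=np.float32)
    # np.add.at accumulates correctly when several points share a cell.
    np.add.at(grid, (ix[keep], iy[keep]), 1.0)
    return grid
```

Calling both functions on the same scan yields two complementary images: the range view keeps one depth per beam direction, so points behind a nearer point are dropped, while the BEV grid collapses the height axis. This is the kind of projection loss that motivates combining the two views.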
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
