Towards Deeper and Better Multi-view Feature Fusion for 3D Semantic Segmentation
Chaolong Yang, Yuyao Yan, Weiguang Zhao, Jianan Ye, Xi Yang, Amir Hussain, Kaizhu Huang

TL;DR
This paper proposes a unidirectional method that projects multi-view 2D features into 3D space, enhancing feature fusion and improving 3D semantic segmentation performance.
Contribution
It introduces a flexible unidirectional projection approach for better cross-dimensional feature fusion, overcoming limitations of bidirectional methods.
Findings
Achieves superior performance on the ScanNetv2 benchmark.
Enables deeper and more flexible feature fusion.
Reduces overfitting by focusing on the core 3D segmentation task.
Abstract
3D point clouds are rich in geometric structure information, while 2D images contain important and continuous texture information. Combining 2D information to achieve better 3D semantic segmentation has become mainstream in 3D scene understanding. Despite this success, how to fuse and process the cross-dimensional features from these two distinct spaces remains elusive. Existing state-of-the-art methods usually exploit bidirectional projection to align the cross-dimensional features and perform both 2D and 3D semantic segmentation tasks. However, to enable bidirectional mapping, this framework often requires a symmetrical 2D-3D network structure, which limits the network's flexibility. Meanwhile, such dual-task settings may easily distract the network and lead to over-fitting in the 3D segmentation task. As limited by the network's inflexibility, fused features can only pass…
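To make the idea of one-way (2D to 3D) multi-view projection concrete, here is a minimal sketch and not the authors' implementation. It assumes a pinhole camera model, ignores occlusion, and averages the features from all views in which a point is visible. The names `project_point` and `lift_features`, and the layout of the `views` dictionaries, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def project_point(p: Vec3, fx: float, fy: float,
                  cx: float, cy: float) -> Optional[Tuple[int, int]]:
    """Pinhole projection of a camera-frame point to integer pixel coordinates.

    Returns None when the point lies on or behind the image plane.
    """
    x, y, z = p
    if z <= 0:
        return None
    u = int(round(fx * x / z + cx))
    v = int(round(fy * y / z + cy))
    return u, v


def lift_features(points: Sequence[Vec3], views: Sequence[Dict],
                  feat_dim: int) -> List[List[float]]:
    """Lift 2D features onto 3D points by averaging over the views that see them.

    Each view is a dict (an assumed layout) with:
      "R": 3x3 world-to-camera rotation, "t": 3-vector translation,
      "intrinsics": (fx, fy, cx, cy), "features": H x W x C nested lists.
    Information flows only from 2D to 3D; nothing is projected back to 2D.
    Occlusion is not checked in this sketch.
    """
    fused = []
    for p in points:
        acc = [0.0] * feat_dim
        hits = 0
        for view in views:
            R, t = view["R"], view["t"]
            # Transform the point from world to camera coordinates.
            pc = tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i]
                       for i in range(3))
            uv = project_point(pc, *view["intrinsics"])
            if uv is None:
                continue
            u, v = uv
            fmap = view["features"]
            # Keep only projections that land inside the feature map.
            if 0 <= v < len(fmap) and 0 <= u < len(fmap[0]):
                for c in range(feat_dim):
                    acc[c] += fmap[v][u][c]
                hits += 1
        # Points seen by no view keep an all-zero feature vector.
        fused.append([a / hits for a in acc] if hits else acc)
    return fused
```

The resulting per-point vectors could then be concatenated with 3D backbone features at whatever depth the 3D network allows, which is the flexibility the summary attributes to the unidirectional design. A real pipeline would normally also use depth maps to reject occluded points.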
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Neural Network Applications · Advanced Vision and Imaging
Methods: ALIGN
