OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, Jiaya Jia

TL;DR
This paper introduces OA-CNNs, a family of sparse 3D CNNs with adaptive components that outperform point transformers in semantic segmentation tasks while being more efficient and less resource-intensive.
Contribution
The paper proposes two modules, adaptive receptive fields and adaptive relation, that strengthen sparse CNNs enough for them to surpass transformer models in both accuracy and efficiency.
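As a rough illustration of what an adaptive receptive field can mean, the toy sketch below blends features pooled at several window sizes using per-position softmax weights. It is a 1D, pure-Python simplification and not the OA-CNN module itself. The names `adaptive_aggregate`, `radii`, and `gate_scores` are hypothetical, and in a real network the gate scores would be predicted from the features rather than supplied by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_mean(values, center, radius):
    """Mean of `values` in a window of the given radius, clipped at the borders."""
    lo = max(0, center - radius)
    hi = min(len(values), center + radius + 1)
    window = values[lo:hi]
    return sum(window) / len(window)


def adaptive_aggregate(values, radii, gate_scores):
    """Blend multi-scale pooled features with per-position weights.

    values:      one scalar feature per position (toy stand-in for voxel features)
    radii:       candidate receptive-field radii, one pooled feature per radius
    gate_scores: per-position scores, one per radius (assumed given here;
                 a learned layer would normally predict them)
    """
    out = []
    for i in range(len(values)):
        weights = softmax(gate_scores[i])
        pooled = [window_mean(values, i, r) for r in radii]
        out.append(sum(w * p for w, p in zip(weights, pooled)))
    return out
```

With `values = [0.0, 0.0, 6.0, 0.0, 0.0]`, `radii = [0, 1]`, and equal gate scores at every position, the output at position 2 is 4.0: the average of the radius-0 pool (6.0) and the radius-1 pool (2.0). Raising the score of one radius at a position shifts that position toward the corresponding receptive-field size, which is the sense in which the aggregation is adaptive.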
Findings
OA-CNNs achieve higher accuracy than point transformers on multiple benchmarks.
OA-CNNs run up to 5 times faster than point transformers and use less memory.
The adaptive modules improve sparse CNN performance without self-attention.
Abstract
The boom in 3D recognition in the 2020s began with the introduction of point cloud transformers. They quickly overtook sparse CNNs and became the state-of-the-art models, especially in 3D semantic segmentation. However, sparse CNNs remain valuable networks because of their efficiency and ease of application. In this work, we reexamine the design distinctions and test the limits of what a sparse CNN can achieve. We discover that the key factor behind the performance difference is adaptivity. Specifically, we propose two key components, i.e., adaptive receptive fields (spatially) and adaptive relation, to bridge the gap. This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module to greatly enhance the adaptivity of sparse CNNs at minimal computational cost. Without any self-attention modules, OA-CNNs favorably…
Taxonomy
Topics: Advanced Neural Network Applications · Image Processing and 3D Reconstruction · 3D Shape Modeling and Analysis
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
