Orientation-boosted Voxel Nets for 3D Object Recognition
Nima Sedaghat, Mohammadreza Zolfaghari, Ehsan Amiri, Thomas Brox

TL;DR
This paper introduces a multi-task approach to 3D object recognition in which the network predicts object orientation alongside the class label. This improves classification accuracy across LiDAR, CAD, and RGB-D data, and improves precision and speed over the baseline on 3D detection.
Contribution
The paper proposes a novel orientation-boosted voxel network that jointly predicts object class and pose, enhancing recognition performance in 3D data.
Findings
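The method achieves state-of-the-art classification results on several datasets covering LiDAR data, CAD models, and RGB-D images.
On 3D detection, it significantly improves precision and speed over the baseline.
Training the network to predict orientation as a parallel task significantly improves classification, showing that orientation information matters in 3D recognition.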
Abstract
Recent work has shown good recognition results in 3D object recognition using 3D convolutional networks. In this paper, we show that the object orientation plays an important role in 3D recognition. More specifically, we argue that objects induce different features in the network under rotation. Thus, we approach the category-level classification task as a multi-task problem, in which the network is trained to predict the pose of the object in addition to the class label as a parallel task. We show that this yields significant improvements in the classification results. We test our suggested architecture on several datasets representing various 3D data sources: LiDAR data, CAD models, and RGB-D images. We report state-of-the-art results on classification as well as significant improvements in precision and speed over the baseline on 3D detection.
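The multi-task setup described in the abstract can be illustrated with a small sketch of a joint loss. This is not the authors' implementation: it assumes orientation is discretized into bins and treated as a second classification target, and the names `joint_loss`, `angle_to_bin`, and the weight `gamma` are hypothetical choices made for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target index under softmax(logits)."""
    return -math.log(softmax(logits)[target])


def angle_to_bin(angle_deg, num_bins):
    """Map an angle in degrees to one of num_bins equal-width orientation bins.

    Assumption: orientation is a single rotation angle, discretized uniformly.
    """
    width = 360.0 / num_bins
    return int((angle_deg % 360.0) / width) % num_bins


def joint_loss(class_logits, class_target, orient_logits, orient_target, gamma=0.5):
    """Weighted sum of the class loss and the orientation loss.

    gamma is a hypothetical weight: 0 trains on the class label only,
    1 trains on the orientation label only.
    """
    l_cls = cross_entropy(class_logits, class_target)
    l_ori = cross_entropy(orient_logits, orient_target)
    return (1.0 - gamma) * l_cls + gamma * l_ori


# Example: one sample with 4 object classes and 12 orientation bins.
class_logits = [2.0, 0.5, -1.0, 0.0]
orient_logits = [0.0] * 12
loss = joint_loss(class_logits, 0, orient_logits, angle_to_bin(45.0, 12))
print(loss)
```

In a real network, both sets of logits would come from two output heads sharing the same 3D convolutional features, so gradients from the orientation term also shape the features used for classification.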
Taxonomy
Topics: 3D Shape Modeling and Analysis · Advanced Neural Network Applications · Human Pose and Action Recognition
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
