Orientation-boosted Voxel Nets for 3D Object Recognition

Nima Sedaghat; Mohammadreza Zolfaghari; Ehsan Amiri; Thomas Brox

arXiv:1604.03351 · cs.CV · October 23, 2017 · 40 cites

TL;DR

This paper introduces a multi-task 3D object recognition approach in which the network predicts object orientation alongside the class label. This improves classification accuracy on several 3D datasets and, on 3D detection, improves precision and speed over the baseline.

Contribution

The paper proposes a novel orientation-boosted voxel network that jointly predicts object class and pose, enhancing recognition performance in 3D data.

Findings

01

Achieved state-of-the-art classification results on multiple datasets.

02

Significant improvements in precision and speed over the baseline on 3D detection.

03

Demonstrated the importance of orientation information in 3D recognition.

Abstract

Recent work has shown good recognition results in 3D object recognition using 3D convolutional networks. In this paper, we show that the object orientation plays an important role in 3D recognition. More specifically, we argue that objects induce different features in the network under rotation. Thus, we approach the category-level classification task as a multi-task problem, in which the network is trained to predict the pose of the object in addition to the class label as a parallel task. We show that this yields significant improvements in the classification results. We test our suggested architecture on several datasets representing various 3D data sources: LiDAR data, CAD models, and RGB-D images. We report state-of-the-art results on classification as well as significant improvements in precision and speed over the baseline on 3D detection.
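The multi-task idea in the abstract can be illustrated with a small sketch of a joint training objective. This is not the authors' implementation: it assumes the pose is discretized into orientation bins and treated as a second classification problem, and that the two losses are combined with a weight. The names `multitask_loss` and `orient_weight`, the value 0.5, and the example logits are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target index under softmax(logits)."""
    return -math.log(softmax(logits)[target])


def multitask_loss(class_logits, class_target,
                   orient_logits, orient_target,
                   orient_weight=0.5):
    """Joint loss: class cross-entropy plus weighted orientation cross-entropy.

    Assumption: orientation is a discrete bin index, and orient_weight is a
    hypothetical hyperparameter balancing the two tasks.
    """
    loss_cls = cross_entropy(class_logits, class_target)
    loss_ori = cross_entropy(orient_logits, orient_target)
    return loss_cls + orient_weight * loss_ori


# Hypothetical outputs of two heads that share one feature extractor.
class_logits = [2.0, 0.5, -1.0]         # scores for 3 object classes
orient_logits = [0.1, 1.5, 0.3, -0.2]   # scores for 4 orientation bins

loss = multitask_loss(class_logits, 0, orient_logits, 1)
print(f"joint loss: {loss:.4f}")
```

In this sketch both heads backpropagate through the same shared features during training, so the orientation term acts as an auxiliary signal; at test time only the class head would be needed for classification.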

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Neural Network Applications · Human Pose and Action Recognition

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings