# 3D Object Recognition with Ensemble Learning: A Study of Point Cloud-Based Deep Learning Models

**Authors:** Daniel Koguciuk, Łukasz Chechliński, Tarek El-Gaaly

arXiv: 1904.08159 · 2019-05-24

## TL;DR

This paper analyzes ensemble learning methods for 3D point cloud classification and detection, demonstrating improved accuracy on ModelNet40 and KITTI datasets, and explores the effectiveness of different ensemble configurations and architectures.

## Contribution

The paper studies ensemble learning for 3D point cloud models: ensembles of a single architecture, combinations of different architectures, classic bagging, and an application to 3D object detection.

## Key findings

- On ModelNet40, an ensemble of instances of a single architecture raises classification accuracy from 92.65% to 93.64%.
- Combining different architectures reaches 94.03% with two architectures and 94.15% with five.
- An ensemble of two models with different architectures can match an ensemble of ten models with the same architecture.
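
To make the idea behind these findings concrete, here is a minimal sketch of combining several classifiers. It assumes the ensemble averages per-class probabilities and picks the most likely class; this summary does not state which combination rule the paper uses, so treat that choice as an assumption. The function names and the toy probabilities are hypothetical and are not results from the paper.

```python
from typing import Sequence


def ensemble_average(
    prob_sets: Sequence[Sequence[Sequence[float]]],
) -> list[list[float]]:
    """Average per-class probabilities across models.

    prob_sets[m][i][c] is model m's probability of class c for sample i.
    Averaging is an assumed combination rule, not confirmed by the source.
    """
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    return [
        [
            sum(prob_sets[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        for i in range(n_samples)
    ]


def predict(probs: Sequence[Sequence[float]]) -> list[int]:
    """Return the index of the most probable class for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in probs]


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that equal the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy data (made up): two models, three samples, three classes.
labels = [0, 1, 2]
model_a = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]  # wrong on sample 1
model_b = [[0.4, 0.5, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]  # wrong on sample 0

ensemble_preds = predict(ensemble_average([model_a, model_b]))
print(accuracy(predict(model_a), labels))
print(accuracy(predict(model_b), labels))
print(accuracy(ensemble_preds, labels))
```

In this toy case each model misclassifies a different sample, so the averaged prediction corrects both errors. That is the intuition for why an ensemble can beat its members, and why models that err on different samples (for example, different architectures) may help more than near-identical copies.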

## Abstract

In this study, we present an analysis of model-based ensemble learning for 3D point-cloud object classification and detection. An ensemble of multiple model instances is known to outperform a single model instance, but ensemble learning for 3D point clouds has received little study. First, an ensemble of multiple model instances trained on the same part of the _ModelNet40_ dataset was tested for seven deep learning, point cloud-based classification algorithms: _PointNet_, _PointNet++_, _SO-Net_, _KCNet_, _DeepSets_, _DGCNN_, and _PointCNN_. Second, the ensemble of different architectures was tested. Results of our experiments show that the tested ensemble learning methods improve over state-of-the-art on the _ModelNet40_ dataset, from 92.65% to 93.64% for the ensemble of single architecture instances, 94.03% for two different architectures, and 94.15% for five different architectures. We show that the ensemble of two models with different architectures can be as effective as the ensemble of 10 models with the same architecture. Third, classic bagging (i.e. with different subsets used for training multiple model instances) was tested, and sources of ensemble accuracy growth were investigated for the best-performing architecture, i.e. _SO-Net_. We also investigate ensemble learning with the _Frustum PointNet_ approach in the task of 3D object detection, increasing the average precision of 3D box detection on the _KITTI_ dataset from 63.1% to 66.5% using only three model instances. We measure the inference time of all 3D classification architectures on an _Nvidia Jetson TX2_, a common embedded computer for mobile robots, to indicate how these models could be used in real-life applications.
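
The bagging setup mentioned in the abstract, where each model instance is trained on a different subset of the training data, can be sketched as follows. This assumes textbook bootstrap sampling (drawing indices with replacement, each subset the same size as the original set); the abstract does not say how the paper draws its subsets. The function name `bootstrap_indices` and its parameters are hypothetical.

```python
import random


def bootstrap_indices(n_samples: int, n_models: int, seed: int = 0) -> list[list[int]]:
    """Draw one bootstrap index list per model instance.

    Each list has n_samples entries sampled with replacement, so some
    training examples repeat and others are left out of a given subset.
    Sampling with replacement is an assumption, not the paper's stated recipe.
    """
    rng = random.Random(seed)  # fixed seed keeps the subsets reproducible
    return [
        [rng.randrange(n_samples) for _ in range(n_samples)]
        for _ in range(n_models)
    ]


# Three model instances over a toy training set of 10 examples.
subsets = bootstrap_indices(n_samples=10, n_models=3)
for model_id, indices in enumerate(subsets):
    # Each model would be trained only on the examples at these indices.
    print(model_id, sorted(indices))
```

Each index list would then select the training examples for one model instance, and the trained instances would be combined at inference time.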

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.08159/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1904.08159/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1904.08159/full.md

---
Source: https://tomesphere.com/paper/1904.08159