What is Holding Back Convnets for Detection?

Bojan Pepik; Rodrigo Benenson; Tobias Ritschel; Bernt Schiele

arXiv:1508.02844·cs.CV·August 19, 2015·21 cites

TL;DR

This paper investigates what limits convolutional neural networks in object detection. It shows that current convnets are not invariant to various appearance factors and that using image renderings for data augmentation can improve overall performance.

Contribution

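It provides an empirical analysis of the R-CNN detector using the Pascal3D+ annotations, and its results indicate that the networks' shared weak points cannot be mitigated by simply increasing the training data, so architectural changes are needed.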

Findings

01

ConvNets are not invariant to appearance factors.

02

Increasing training data alone does not mitigate weak points.

03

Data augmentation with renderings improves detection and view-point estimation.

Abstract

Convolutional neural networks have recently shown excellent results in general object detection and many other tasks. Albeit very effective, they involve many user-defined design choices. In this paper we want to better understand these choices by inspecting two key aspects "what did the network learn?", and "what can the network learn?". We exploit new annotations (Pascal3D+), to enable a new empirical analysis of the R-CNN detector. Despite common belief, our results indicate that existing state-of-the-art convnet architectures are not invariant to various appearance factors. In fact, all considered networks have similar weak points which cannot be mitigated by simply increasing the training data (architectural changes are needed). We show that overall performance can improve when using image renderings for data augmentation. We report the best known results on the Pascal3D+ detection…
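One way to picture the kind of per-factor analysis the abstract describes is to group ground-truth objects by an appearance factor and compare detection recall across the groups. The sketch below is illustrative only and is not the authors' evaluation code: the function name `recall_by_azimuth_bin`, the `(azimuth, detected)` input format, the choice of azimuth as the factor, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def recall_by_azimuth_bin(annotations, num_bins=8):
    """Compute detection recall per viewpoint (azimuth) bin.

    annotations: iterable of (azimuth_degrees, detected) pairs, one per
    ground-truth object. This input format is hypothetical; a real pipeline
    would first match detector outputs to annotated objects.
    """
    bin_width = 360.0 / num_bins
    hits = defaultdict(int)
    totals = defaultdict(int)
    for azimuth, detected in annotations:
        # Wrap the angle into [0, 360) and map it to a bin index.
        bin_index = int((azimuth % 360.0) // bin_width)
        # Guard against floating-point wrap-around landing exactly on 360.
        bin_index = min(bin_index, num_bins - 1)
        totals[bin_index] += 1
        hits[bin_index] += int(bool(detected))
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Toy data (made up for illustration): azimuth in degrees, whether detected.
toy = [
    (0, True), (10, True),
    (90, False), (100, True),
    (180, False), (185, False),
    (350, True),
]
recall = recall_by_azimuth_bin(toy, num_bins=4)
for bin_index, value in recall.items():
    print(f"bin {bin_index}: recall {value:.2f}")
```

A detector that was invariant to viewpoint would show roughly equal recall in every bin; large gaps between bins point at the kind of weak spot the paper analyses. The same grouping can be applied to any other annotated factor.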

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Adversarial Robustness in Machine Learning

Methods: Support Vector Machine · Max Pooling · Convolution · R-CNN