Rich feature hierarchies for accurate object detection and semantic segmentation

Ross Girshick; Jeff Donahue; Trevor Darrell; Jitendra Malik

arXiv:1311.2524 · cs.CV · October 23, 2014 · 526 cites

TL;DR

This paper introduces R-CNN, a simple and scalable object detection method that combines bottom-up region proposals with CNN features and relies on supervised pre-training followed by fine-tuning. It improves mAP on VOC 2012 by more than 30% relative to the previous best result.

Contribution

The paper presents R-CNN, a novel approach that applies high-capacity CNNs to region proposals, improving object detection accuracy and demonstrating the benefits of pre-training and fine-tuning.

Findings

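01 R-CNN improves mAP by more than 30% relative to the previous best result on VOC 2012, reaching 53.3%.

02 R-CNN outperforms OverFeat on the ILSVRC2013 detection dataset.

03 Supervised pre-training on an auxiliary task, followed by domain-specific fine-tuning, significantly boosts detection performance when labeled training data is scarce.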
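
The 30% figure in the first finding is a relative improvement, not a gain of 30 percentage points. As a small worked example, the sketch below combines it with the 53.3% mAP reported in the abstract to bound the previous best result. The variable names are illustrative, and the bound is derived only from those two numbers, not quoted from the paper.

```python
reported_map = 53.3        # R-CNN mAP (%) on VOC 2012, as stated in the abstract
min_relative_gain = 0.30   # "more than 30% relative to the previous best result"

# new = old * (1 + gain), so old = new / (1 + gain).
# Because the gain exceeds 30%, this value is an upper bound on the old mAP.
baseline_upper_bound = reported_map / (1.0 + min_relative_gain)
min_absolute_gain = reported_map - baseline_upper_bound

print(f"previous best mAP is below {baseline_upper_bound:.1f}%")
print(f"absolute gain exceeds {min_absolute_gain:.1f} points")
```

In other words, the stated relative gain implies a previous best below roughly 41% mAP, so the absolute improvement is a little over 12 points.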

Abstract

Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with…
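
The first insight in the abstract, applying a CNN to each region proposal, can be illustrated with a toy pipeline. The sketch below is not the authors' implementation. The proposal boxes are hard-coded, the "CNN" is a random linear map with a ReLU, and the per-class linear scorer is an assumption suggested by the Support Vector Machine entry under Methods. The function names `warp_region`, `extract_features`, and `score_regions` are hypothetical, and resizing every region to a fixed size is likewise an assumption made so that all feature vectors have the same length.

```python
import numpy as np

def warp_region(image, box, size=8):
    """Crop box (x0, y0, x1, y1) and resize it to size x size
    using nearest-neighbour sampling (an illustrative choice)."""
    x0, y0, x1, y1 = box
    crop = image[y0:y1, x0:x1]
    rows = np.linspace(0, crop.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, crop.shape[1] - 1, size).round().astype(int)
    return crop[np.ix_(rows, cols)]

def extract_features(patch, weights):
    """Stand-in for the CNN: one linear layer followed by ReLU."""
    return np.maximum(weights @ patch.ravel(), 0.0)

def score_regions(image, boxes, cnn_weights, clf_w, clf_b):
    """Return linear class scores with shape (num_boxes, num_classes)."""
    feats = np.stack([
        extract_features(warp_region(image, box), cnn_weights)
        for box in boxes
    ])
    return feats @ clf_w.T + clf_b

rng = np.random.default_rng(0)
image = rng.random((32, 48))                      # toy grayscale image (H, W)
boxes = [(0, 0, 16, 16), (10, 5, 40, 30), (20, 8, 48, 32)]  # made-up proposals
cnn_weights = rng.standard_normal((16, 8 * 8))    # 64 pixels -> 16 features
clf_w = rng.standard_normal((3, 16))              # 3 hypothetical classes
clf_b = np.zeros(3)

scores = score_regions(image, boxes, cnn_weights, clf_w, clf_b)
print(scores.shape)  # one row of class scores per proposal
```

A real system would replace the random weights with a pre-trained and fine-tuned network, generate proposals with a bottom-up method, and post-process the scored boxes. None of those steps is shown here.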

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications

Methods: Support Vector Machine · Max Pooling · Convolution · R-CNN