R-FCN: Object Detection via Region-based Fully Convolutional Networks

Jifeng Dai; Yi Li; Kaiming He; Jian Sun

arXiv:1605.06409 · cs.CV · December 12, 2023 · 3.4k cites

TL;DR

R-FCN is a fully convolutional, region-based object detector that shares almost all computation across the entire image, reaching competitive accuracy while running 2.5-20x faster than Faster R-CNN.

Contribution

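The paper proposes position-sensitive score maps, which reconcile the translation invariance wanted for image classification with the translation variance needed for object detection. This lets fully convolutional classifier backbones such as ResNets be used directly for detection, with almost no per-region computation.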
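To make the idea of position-sensitive score maps concrete, here is a minimal pure-Python sketch of position-sensitive RoI pooling followed by average voting. It assumes the score maps have k*k*(C+1) channels (C classes plus background) and that bin (i, j) of a region reads only from its own channel group. The function name `ps_roi_pool`, the bin-major channel order, and the floor/ceil bin boundaries are illustrative assumptions, not the authors' implementation.

```python
import math


def ps_roi_pool(score_maps, roi, k, num_classes):
    """Pool a region over a k x k grid, then vote by averaging the bins.

    score_maps: nested list indexed [channel][y][x], with
                k * k * (num_classes + 1) channels (assumed bin-major order).
    roi:        (x0, y0, x1, y1) in feature-map cells, end-exclusive.
    Returns one score per class, background included.
    """
    x0, y0, x1, y1 = roi
    c1 = num_classes + 1  # +1 for the background class
    bin_h = (y1 - y0) / k
    bin_w = (x1 - x0) / k
    scores = [0.0] * c1
    for i in range(k):  # bin row
        ys = y0 + math.floor(i * bin_h)
        ye = y0 + math.ceil((i + 1) * bin_h)
        for j in range(k):  # bin column
            xs = x0 + math.floor(j * bin_w)
            xe = x0 + math.ceil((j + 1) * bin_w)
            for c in range(c1):
                # Bin (i, j) reads only from its dedicated channel for class c.
                plane = score_maps[(i * k + j) * c1 + c]
                cells = [plane[y][x] for y in range(ys, ye) for x in range(xs, xe)]
                scores[c] += sum(cells) / len(cells)
    # Voting: average the k * k position-sensitive responses per class.
    return [s / (k * k) for s in scores]
```

Because the pooling and voting steps have no learnable weights in this sketch, the per-region cost is just a handful of averages, which is consistent with the claim that almost all computation is shared across the image.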

Findings

01. Achieves 83.6% mAP on PASCAL VOC 2007 with ResNet-101.

02. Runs at 170 ms per image at test time, 2.5-20x faster than Faster R-CNN.

03. Reaches competitive accuracy on the PASCAL VOC datasets while sharing almost all computation across the whole image.
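As a quick reading aid for the speed figures, the sketch below converts 170 ms per image into throughput and works out the per-image times implied for the slower baseline. It assumes "k times faster" means the baseline takes k times as long per image; the function names are hypothetical.

```python
RFCN_MS_PER_IMAGE = 170.0      # test-time figure quoted above
SPEEDUP_RANGE = (2.5, 20.0)    # quoted speedup range


def images_per_second(ms_per_image):
    """Convert per-image latency in milliseconds to throughput."""
    return 1000.0 / ms_per_image


def implied_baseline_ms(ms_per_image, speedup):
    """Baseline latency, assuming 'k x faster' means k x the time per image."""
    return ms_per_image * speedup


if __name__ == "__main__":
    print(f"R-FCN throughput: {images_per_second(RFCN_MS_PER_IMAGE):.2f} images/s")
    for s in SPEEDUP_RANGE:
        print(f"{s}x speedup implies a baseline of {implied_baseline_ms(RFCN_MS_PER_IMAGE, s):.0f} ms/image")
```

Under that assumption, the quoted range corresponds to a baseline of roughly 0.4 to 3.4 seconds per image.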

Abstract

We present region-based, fully convolutional networks for accurate and efficient object detection. In contrast to previous region-based detectors such as Fast/Faster R-CNN that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional with almost all computation shared on the entire image. To achieve this goal, we propose position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection. Our method can thus naturally adopt fully convolutional image classifier backbones, such as the latest Residual Networks (ResNets), for object detection. We show competitive results on the PASCAL VOC datasets (e.g., 83.6% mAP on the 2007 set) with the 101-layer ResNet. Meanwhile, our result is achieved at a test-time speed of 170ms per image, 2.5-20x faster than the Faster…

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Industrial Vision Systems and Defect Detection

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Average Pooling · Global Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Max Pooling · Kaiming Initialization · Residual Connection