Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
Peize Sun, Rufeng Zhang, Yi Jiang, Tao Kong, Chenfeng Xu, Wei Zhan, Masayoshi Tomizuka, Lei Li, Zehuan Yuan, Changhu Wang, Ping Luo

TL;DR
Sparse R-CNN is an end-to-end object detector that feeds a small, fixed set of learnable proposals to its recognition head. This removes dense candidate generation and non-maximum suppression post-processing while keeping accuracy and speed on par with established detectors.
Contribution
Proposes a fully sparse object detection framework built on learnable proposals, which removes hand-designed object candidates and many-to-one label assignment from the design.
Findings
Achieves 45.0 AP on COCO with ResNet-50 FPN under the standard 3× training schedule.
Operates at 22 fps, comparable to established detectors.
Eliminates non-maximum suppression from the detection pipeline.
Abstract
We present Sparse R-CNN, a purely sparse method for object detection in images. Existing works on object detection heavily rely on dense object candidates, such as k anchor boxes pre-defined on all grids of image feature map of size H × W. In our method, however, a fixed sparse set of learned object proposals, total length of N, are provided to object recognition head to perform classification and location. By eliminating HWk (up to hundreds of thousands) hand-designed object candidates to N (e.g. 100) learnable proposals, Sparse R-CNN completely avoids all efforts related to object candidates design and many-to-one label assignment. More importantly, final predictions are directly output without non-maximum suppression post-procedure. Sparse R-CNN demonstrates accuracy, run-time and training convergence performance on par with the well-established detector baselines on…
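To make the abstract's contrast between dense candidates and a fixed proposal set concrete, the following is a minimal sketch of the arithmetic only. The input size (800 × 1344), the feature-map strides, and the 9 anchors per cell are illustrative assumptions, not values taken from the paper; the function name `dense_candidate_count` is hypothetical. Only the proposal count of 100 comes from the abstract's own example.

```python
import math


def dense_candidate_count(height, width, strides, anchors_per_cell):
    """Count anchor boxes tiled over every cell of every feature map.

    Each stride yields a feature map of ceil(height / stride) by
    ceil(width / stride) cells, and each cell carries `anchors_per_cell`
    pre-defined boxes.
    """
    total = 0
    for stride in strides:
        feat_h = math.ceil(height / stride)
        feat_w = math.ceil(width / stride)
        total += feat_h * feat_w * anchors_per_cell
    return total


# Assumed, illustrative settings (not from the paper): a padded 800 x 1344
# input, five pyramid levels, and 9 anchors per feature-map cell.
dense = dense_candidate_count(800, 1344, strides=(8, 16, 32, 64, 128),
                              anchors_per_cell=9)

# The abstract gives 100 as an example size for the learnable proposal set.
sparse = 100

print(f"dense candidates:  {dense}")
print(f"sparse proposals:  {sparse}")
print(f"reduction factor:  {dense // sparse}x")
```

Under these assumed settings the dense count comes to roughly 200,000 boxes, which matches the abstract's "up to hundreds of thousands" order of magnitude, against 100 proposals whose number does not grow with image size.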
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
Methods: Sparse R-CNN · 1x1 Convolution · Convolution · Feature Pyramid Network
