Simple Training Strategies and Model Scaling for Object Detection
Xianzhi Du, Barret Zoph, Wei-Chih Hung, Tsung-Yi Lin

TL;DR
This paper evaluates training and scaling strategies for object detection. It shows that they improve vanilla detectors by 7.7% in accuracy while making them 30% faster, and it proposes simple model scaling methods to optimize the speed-accuracy trade-off.
Contribution
It systematically benchmarks training techniques, introduces simple scaling strategies for detection models, and compares backbone architectures, highlighting effective methods for improving object detection performance.
Findings
Vanilla detectors improved by 7.7% in accuracy while running 30% faster.
Introduced RetinaNet-RS and Cascade RCNN-RS scaling strategies.
ResNet with minor modifications outperforms EfficientNet as a backbone.
Abstract
The speed-accuracy Pareto curve of object detection systems has advanced through a combination of better model architectures, training methods, and inference methods. In this paper, we methodically evaluate a variety of these techniques to understand where most of the improvements in modern detection systems come from. We benchmark these improvements on the vanilla ResNet-FPN backbone with RetinaNet and RCNN detectors. The vanilla detectors are improved by 7.7% in accuracy while being 30% faster. We further provide simple scaling strategies to generate a family of models that form two Pareto curves, named RetinaNet-RS and Cascade RCNN-RS. These simple rescaled detectors explore the speed-accuracy trade-off between the one-stage RetinaNet detectors and the two-stage RCNN detectors. Our largest Cascade RCNN-RS models achieve 52.9% AP with a ResNet152-FPN backbone and 53.6% with a SpineNet143L…
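A model family "forms a Pareto curve" when no member is both slower and less accurate than another member. The sketch below shows how such a curve can be extracted from a set of candidate detectors. It is illustrative only and is not the authors' code. It assumes speed is measured as latency (lower is better) and accuracy as AP (higher is better). The `Model` type, the `pareto_front` function, and all numbers in the usage example are hypothetical toy values, not results from the paper.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    """Hypothetical record for one detector configuration."""
    name: str
    latency_ms: float  # assumed speed metric: lower is better
    ap: float          # accuracy metric: higher is better


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the models not dominated by any other, ordered by latency.

    A model is dominated if another model is at least as fast and at
    least as accurate, and strictly better in one of the two.
    """
    # Sort by latency ascending; for equal latency, put the higher AP
    # first so the lower-AP duplicate is rejected as dominated.
    ordered = sorted(models, key=lambda m: (m.latency_ms, -m.ap))
    front: List[Model] = []
    best_ap = float("-inf")
    for m in ordered:
        # Keep a model only if it is more accurate than every model that
        # is at least as fast, i.e. every model already scanned.
        if m.ap > best_ap:
            front.append(m)
            best_ap = m.ap
    return front


# Toy candidates with made-up numbers (NOT taken from the paper).
candidates = [
    Model("A", 20.0, 40.0),
    Model("B", 35.0, 44.0),
    Model("C", 40.0, 43.0),  # slower and less accurate than B: dominated
    Model("D", 60.0, 47.5),
]

if __name__ == "__main__":
    print([m.name for m in pareto_front(candidates)])
```

Sorting once by latency makes the scan linear after the sort. Each time a candidate beats the best AP seen so far, it becomes the next point on the curve. Running the script prints `['A', 'B', 'D']`.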
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Feature Pyramid Network · Xavier Initialization · ResNet-D · Sigmoid Linear Unit · RetinaNet-RS · Depthwise Convolution · Residual Connection · Global Average Pooling · Sigmoid Activation
