Simple Training Strategies and Model Scaling for Object Detection

Xianzhi Du; Barret Zoph; Wei-Chih Hung; Tsung-Yi Lin

arXiv:2107.00057 · cs.CV · July 2, 2021 · 27 cites

TL;DR

This paper evaluates various training and scaling strategies for object detection, demonstrating significant accuracy improvements and proposing simple model scaling methods to optimize the speed-accuracy trade-off.

Contribution

It systematically benchmarks training techniques, introduces simple scaling strategies for detection models, and compares backbone architectures, highlighting effective methods for improving object detection performance.

Findings

01. Vanilla detectors gain 7.7% in accuracy while running 30% faster.

02. Simple scaling strategies yield two model families, RetinaNet-RS and Cascade RCNN-RS.

03. ResNet with minor modifications outperforms EfficientNet as a backbone.

Abstract

The speed-accuracy Pareto curve of object detection systems has advanced through a combination of better model architectures and better training and inference methods. In this paper, we methodically evaluate a variety of these techniques to understand where most of the improvements in modern detection systems come from. We benchmark these improvements on the vanilla ResNet-FPN backbone with RetinaNet and RCNN detectors. The vanilla detectors improve by 7.7% in accuracy while running 30% faster. We further provide simple scaling strategies to generate a family of models that form two Pareto curves, named RetinaNet-RS and Cascade RCNN-RS. These simple rescaled detectors explore the speed-accuracy trade-off between the one-stage RetinaNet detectors and two-stage RCNN detectors. Our largest Cascade RCNN-RS models achieve 52.9% AP with a ResNet152-FPN backbone and 53.6% with a SpineNet143L…

Code & Models

Repositories

tensorflow/tpu (TensorFlow, official)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Feature Pyramid Network · Xavier Initialization · ResNet-D · Sigmoid Linear Unit · RetinaNet-RS · Depthwise Convolution · Residual Connection · Global Average Pooling · Sigmoid Activation