SNIPER: Efficient Multi-Scale Training

Bharat Singh; Mahyar Najibi; Larry S. Davis

arXiv:1805.09300·cs.CV·December 14, 2018·55 cites

TL;DR

SNIPER introduces an efficient multi-scale training algorithm for instance-level visual recognition that processes context regions around ground-truth instances, enabling high-resolution training with reduced computational cost and improved batch normalization.

Contribution

The paper presents SNIPER, a multi-scale training method that processes only the relevant image regions rather than every pixel of an image pyramid. This allows training on high-resolution scales with larger batch sizes and better batch normalization, and it challenges the assumption that instance-level recognition requires processing full high-resolution images.

Findings

01

Achieves 47.6% mAP on COCO with ResNet-101.

02

Processes 5 images per second during inference on a single GPU.

03

Processes only 30% more pixels than single-scale training at 800x1333 pixels on COCO.
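To make the pixel-budget finding concrete, the short calculation below works through the figures quoted in the abstract (an 800x1333 single-scale baseline, 512x512 chips, 30% more pixels, and a 1400x2000 top pyramid scale). The "equivalent chips per image" value is only an inference from those numbers, not a figure stated on this page, and the variable names are illustrative.

```python
# Back-of-the-envelope pixel budget, using only numbers quoted in the abstract.
BASELINE_H, BASELINE_W = 800, 1333   # single-scale training resolution
CHIP_SIDE = 512                      # side length of a resampled chip
EXTRA_FRACTION = 0.30                # "30% more pixels" than the baseline
LARGEST_H, LARGEST_W = 1400, 2000    # extreme pyramid scale mentioned

baseline_pixels = BASELINE_H * BASELINE_W
sniper_pixels = baseline_pixels * (1 + EXTRA_FRACTION)
chip_pixels = CHIP_SIDE * CHIP_SIDE

# Inferred, not quoted: how many 512x512 chips fit in SNIPER's pixel budget.
equivalent_chips = sniper_pixels / chip_pixels

# For contrast: cost of processing the largest pyramid level in full.
largest_scale_ratio = (LARGEST_H * LARGEST_W) / baseline_pixels

print(f"baseline pixels per image : {baseline_pixels:,}")
print(f"SNIPER pixels per image   : {sniper_pixels:,.0f}")
print(f"equivalent 512x512 chips  : {equivalent_chips:.2f}")
print(f"1400x2000 vs. baseline    : {largest_scale_ratio:.2f}x")
```

Under these assumptions the budget corresponds to roughly five chips per image, whereas processing the 1400x2000 level alone in full would cost more than twice the single-scale baseline.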

Abstract

We present SNIPER, an algorithm for performing efficient multi-scale training in instance-level visual recognition tasks. Instead of processing every pixel in an image pyramid, SNIPER processes context regions around ground-truth instances (referred to as chips) at the appropriate scale. For background sampling, these context regions are generated using proposals extracted from a region proposal network trained with a short learning schedule. Hence, the number of chips generated per image during training adaptively changes based on the scene complexity. SNIPER processes only 30% more pixels than the commonly used single-scale training at 800x1333 pixels on the COCO dataset. Yet it also observes samples from extreme resolutions of the image pyramid, such as 1400x2000 pixels. As SNIPER operates on resampled low-resolution chips (512x512 pixels), it can have a batch size as large as…
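The abstract describes chips as 512x512 context regions placed around ground-truth instances at a given scale, with the chip count varying per image. This page does not spell out how chips are selected, so the sketch below uses a simple greedy cover as one plausible illustration: it is an assumption, not the authors' procedure. The function names, the grid `stride`, and the "box must fit inside a chip" rule are all hypothetical, and background sampling from region proposals is not covered.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Corner = Tuple[int, int]                 # top-left corner of a chip


def _positions(extent: int, chip: int, stride: int) -> List[int]:
    """Grid offsets along one axis, plus an edge-aligned final offset."""
    last = max(extent - chip, 0)
    return sorted(set(range(0, last + 1, stride)) | {last})


def covers(corner: Corner, box: Box, chip: int = 512) -> bool:
    """True if the box lies entirely inside the chip at this corner."""
    cx, cy = corner
    x1, y1, x2, y2 = box
    return cx <= x1 and cy <= y1 and x2 <= cx + chip and y2 <= cy + chip


def select_chips(boxes: List[Box], width: int, height: int, scale: float,
                 chip: int = 512, stride: int = 32) -> List[Corner]:
    """Greedily pick chips until every box that fits in a chip is covered."""
    scaled = [tuple(v * scale for v in b) for b in boxes]
    # Hypothetical rule: boxes too large for a chip are left to coarser scales.
    remaining = [b for b in scaled
                 if b[2] - b[0] <= chip and b[3] - b[1] <= chip]
    corners = [(x, y)
               for y in _positions(int(height * scale), chip, stride)
               for x in _positions(int(width * scale), chip, stride)]
    chosen: List[Corner] = []
    while remaining:
        # Take the chip that covers the most still-uncovered boxes.
        best = max(corners,
                   key=lambda c: sum(covers(c, b, chip) for b in remaining))
        covered = [b for b in remaining if covers(best, b, chip)]
        if not covered:
            break  # no grid position fully contains any remaining box
        chosen.append(best)
        remaining = [b for b in remaining if b not in covered]
    return chosen


# Example: two nearby objects and one distant object in an 800x1333 image.
example_boxes = [(10, 10, 60, 60), (100, 100, 200, 180), (900, 600, 1000, 700)]
example_chips = select_chips(example_boxes, width=1333, height=800, scale=1.0)
print(example_chips)
```

Because chips are chosen only where objects are, an image with a few clustered instances yields few chips while a cluttered scene yields more, which matches the abstract's point that the chip count adapts to scene complexity.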

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Average Pooling · SNIPER · Weight Decay · Softmax · RoIPool · Faster R-CNN · Region Proposal Network · Residual Connection · 1x1 Convolution