TL;DR
This paper introduces SNIPER and AutoFocus, which combine a scale-normalized image pyramid with coarse-to-fine processing for object detection, reducing computational cost during training (SNIPER) and inference (AutoFocus).
Contribution
The paper proposes SNIPER, an efficient sampling scheme for training, and AutoFocus, an active region prediction method for inference, to accelerate object detection while maintaining accuracy.
Findings
Up to 3x speed-up in training with SNIPER.
2.5-5x inference speed-up using AutoFocus.
Improved object detection efficiency with minimal accuracy loss.
Abstract
We present an efficient foveal framework to perform object detection. A scale normalized image pyramid (SNIP) is generated that, like human vision, only attends to objects within a fixed size range at different scales. Such a restriction of objects' size during training affords better learning of object-sensitive filters, and therefore, results in better accuracy. However, the use of an image pyramid increases the computational cost. Hence, we propose an efficient spatial sub-sampling scheme which only operates on fixed-size sub-regions likely to contain objects (as object locations are known during training). The resulting approach, referred to as Scale Normalized Image Pyramid with Efficient Resampling or SNIPER, yields up to 3 times speed-up during training. Unfortunately, as object locations are unknown during inference, the entire image pyramid still needs processing. To this end,…
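The training-time idea in the abstract (keep only objects within a fixed size range at each pyramid scale, then process fixed-size sub-regions that contain them) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 512-pixel chip size, the 32-pixel stride, the size ranges, and the greedy covering strategy are all assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def valid_boxes_at_scale(boxes: List[Box], scale: float,
                         size_range: Tuple[float, float]) -> List[Box]:
    """Rescale boxes and keep those whose sqrt(area) lies in size_range.

    size_range is a hypothetical (min, max) side length for this scale.
    """
    lo, hi = size_range
    kept = []
    for x1, y1, x2, y2 in boxes:
        sx1, sy1, sx2, sy2 = x1 * scale, y1 * scale, x2 * scale, y2 * scale
        side = math.sqrt((sx2 - sx1) * (sy2 - sy1))
        if lo <= side <= hi:
            kept.append((sx1, sy1, sx2, sy2))
    return kept


def select_chips(boxes: List[Box], img_w: float, img_h: float,
                 chip: int = 512, stride: int = 32) -> List[Tuple[int, int]]:
    """Greedily pick fixed-size chips until every box that fits is covered.

    Returns the top-left corners of the chosen chips. Chip size and stride
    are illustrative assumptions, not values taken from the text above.
    """
    def inside(b: Box, cx: int, cy: int) -> bool:
        return (b[0] >= cx and b[1] >= cy
                and b[2] <= cx + chip and b[3] <= cy + chip)

    # Candidate chip corners on a regular grid; a single chip at (0, 0)
    # is used when the image is smaller than the chip.
    xs = range(0, max(int(img_w) - chip, 0) + 1, stride)
    ys = range(0, max(int(img_h) - chip, 0) + 1, stride)
    candidates = [(cx, cy) for cy in ys for cx in xs]

    remaining = set(range(len(boxes)))
    chips = []
    while remaining:
        best, best_cov = None, set()
        for cx, cy in candidates:
            cov = {i for i in remaining if inside(boxes[i], cx, cy)}
            if len(cov) > len(best_cov):
                best, best_cov = (cx, cy), cov
        if best is None:
            break  # leftover boxes do not fit inside any candidate chip
        chips.append(best)
        remaining -= best_cov
    return chips


# Usage: one small and one large object, examined at an upsampled scale.
gt = [(10, 10, 50, 50), (100, 100, 400, 400)]
valid = valid_boxes_at_scale(gt, scale=2.0, size_range=(32, 160))
chips = select_chips(valid, img_w=2000, img_h=1600)
```

In this sketch only the small object survives the size filter at the upsampled scale, so a single chip is processed instead of the full upsampled image. That is the kind of saving the abstract attributes to sub-sampling during training, where object locations are known.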
Taxonomy
Methods: SNIP · SNIPER
