Spatially Adaptive Computation Time for Residual Networks
Michael Figurnov, Maxwell D. Collins, Yukun Zhu, Li Zhang, Jonathan Huang, Dmitry Vetrov, Ruslan Salakhutdinov

TL;DR
This paper introduces a Residual Network architecture that adapts the amount of computation spent on each image region, improving computational efficiency on ImageNet classification and COCO object detection. Its computation time maps also correlate with human eye fixation positions.
Contribution
It presents a novel spatially adaptive computation method for Residual Networks that is end-to-end trainable and applicable to multiple computer vision tasks.
Findings
Improves the computational efficiency of Residual Networks on ImageNet classification and COCO object detection.
Computation time maps correlate with human eye fixation positions on the CAT2000 saliency dataset.
Applicable to diverse vision problems without modifications.
Abstract
This paper proposes a deep learning architecture based on Residual Network that dynamically adjusts the number of executed layers for the regions of the image. This architecture is end-to-end trainable, deterministic and problem-agnostic. It is therefore applicable without any modifications to a wide range of computer vision problems such as image classification, object detection and image segmentation. We present experimental results showing that this model improves the computational efficiency of Residual Networks on the challenging ImageNet classification and COCO object detection datasets. Additionally, we evaluate the computation time maps on the visual saliency dataset CAT2000 and find that they correlate surprisingly well with human eye fixation positions.
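The abstract does not spell out how the number of executed layers is chosen per region. The following is a minimal NumPy sketch of one way to do it, assuming a halting rule in the style of Adaptive Computation Time: each spatial position accumulates a halting score and stops once the sum reaches 1 - eps. The names `sact_block`, `units`, `halting_fns` and `eps` are hypothetical, and the toy residual and halting functions are placeholders, not the paper's learned layers.

```python
import numpy as np

def sact_block(x, units, halting_fns, eps=0.01):
    """Simulate per-position adaptive depth over a block of residual units.

    x           : (H, W, C) feature map
    units       : list of residual functions F_l, each (H, W, C) -> (H, W, C)
    halting_fns : list of functions, each (H, W, C) -> (H, W) scores in [0, 1]
    Returns the weighted output map and the number of units run per position.
    """
    H, W, _ = x.shape
    cum = np.zeros((H, W))                 # cumulative halting score
    active = np.ones((H, W), dtype=bool)   # positions still being computed
    out = np.zeros_like(x, dtype=float)
    layers_used = np.zeros((H, W), dtype=int)
    last = len(units) - 1

    for l, (f, h_fn) in enumerate(zip(units, halting_fns)):
        if not active.any():
            break
        # Residual update at active positions; halted positions keep their value.
        # (Assumption: f is evaluated densely here for clarity, so this sketch
        # simulates the behaviour but does not itself save computation.)
        x = np.where(active[..., None], x + f(x), x)
        # The last unit always halts, so every position gets a full weight of 1.
        h = h_fn(x) if l < last else np.ones((H, W))
        layers_used += active
        halts_now = active & (cum + h >= 1.0 - eps)
        # Continuing positions contribute h; halting ones get the remainder.
        w = np.where(halts_now, 1.0 - cum, h) * active
        out += w[..., None] * x
        cum += h * active
        active &= ~halts_now
    return out, layers_used

# Toy demo: four units that each add 1, with fixed per-position halting scores.
scores = np.array([[0.6, 0.3], [1.0, 0.1]])
units = [lambda feat: np.ones_like(feat)] * 4
halting_fns = [lambda feat: scores] * 4
out, layers_used = sact_block(np.zeros((2, 2, 1)), units, halting_fns)
print(layers_used)  # [[2 4] [1 4]]
```

In this sketch `layers_used` plays the role of a computation time map: positions with high halting scores stop after one or two units, while low-score positions run the whole block.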
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
