Glance and Focus Networks for Dynamic Visual Recognition

Gao Huang; Yulin Wang; Kangchen Lv; Haojun Jiang; Wenhui Huang; Pengfei Qi; Shiji Song

arXiv:2201.03014·cs.CV·August 5, 2022·6 cites

Open Access · 1 Repo

TL;DR

The paper introduces GFNet, a sequential coarse-to-fine visual recognition model that adaptively attends to salient regions, cutting redundant computation without sacrificing accuracy.

Contribution

GFNet formulates the localization of salient regions as a reinforcement learning problem. Its sequential design supports adaptive inference at test time, and the framework is compatible with various backbone models.
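
To make the reinforcement learning framing concrete, here is a minimal sketch of a policy-gradient (REINFORCE) update for a policy that picks one of several candidate crop locations. It is an illustration only, not the paper's training procedure: the discrete location set, the `reward_fn` callback, the running-mean baseline, and all hyperparameters are assumptions introduced for this example.

```python
import math
import random
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_location_policy(
    reward_fn: Callable[[int], float],  # hypothetical: location index -> reward
    num_locations: int,
    steps: int = 500,
    lr: float = 0.5,
    seed: int = 0,
) -> List[float]:
    """REINFORCE on a categorical policy over candidate crop locations."""
    rng = random.Random(seed)
    logits = [0.0] * num_locations
    baseline = 0.0  # running mean of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(num_locations), weights=probs)[0]
        reward = reward_fn(action)
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. the logits is one_hot(action) - probs.
        for i in range(num_locations):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)
```

In a recognition setting the reward would come from the classifier, for example how much a crop helps the prediction; that choice is also an assumption here, since this page does not specify it.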

Findings

01 Reduces MobileNet-V3 latency by 1.3x on iPhone XS Max

02 Achieves comparable accuracy with less computation

03 Demonstrates effectiveness on image and video recognition tasks

Abstract

Spatial redundancy widely exists in visual recognition tasks, i.e., discriminative features in an image or video frame usually correspond to only a subset of pixels, while the remaining regions are irrelevant to the task at hand. Therefore, static models which process all the pixels with an equal amount of computation result in considerable redundancy in terms of time and space consumption. In this paper, we formulate the image recognition problem as a sequential coarse-to-fine feature learning process, mimicking the human visual system. Specifically, the proposed Glance and Focus Network (GFNet) first extracts a quick global representation of the input image at a low resolution scale, and then strategically attends to a series of salient (small) regions to learn finer features. The sequential process naturally facilitates adaptive inference at test time, as it can be terminated once…
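
The glance-then-focus loop described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the `classify` and `propose` callbacks are hypothetical stand-ins for the backbone and the region-selection policy, the image is assumed square with a side divisible by `patch_size`, and the confidence-threshold stopping rule is an assumption because the abstract is truncated at that point.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def downsample(image: Image, factor: int) -> Image:
    """Average-pool by `factor` to produce the cheap low-resolution glance."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - factor + 1, factor):
        pooled_row = []
        for c in range(0, cols - factor + 1, factor):
            block = [image[r + i][c + j] for i in range(factor) for j in range(factor)]
            pooled_row.append(sum(block) / len(block))
        pooled.append(pooled_row)
    return pooled


def crop(image: Image, top: int, left: int, size: int) -> Image:
    """Cut a size x size patch out of the full-resolution image."""
    return [row[left:left + size] for row in image[top:top + size]]


def glance_and_focus(
    image: Image,
    classify: Callable[[List[Image]], List[float]],    # hypothetical: views -> class probabilities
    propose: Callable[[List[Image]], Tuple[int, int]],  # hypothetical: views -> (top, left)
    patch_size: int,
    max_steps: int,
    threshold: float,  # assumed confidence-based stopping rule
) -> Tuple[int, int]:
    """Return (predicted class, number of views processed)."""
    # Glance: shrink the whole image down to the patch size.
    views = [downsample(image, factor=len(image) // patch_size)]
    probs = classify(views)
    # Focus: add full-resolution crops until confident or out of budget.
    while max(probs) < threshold and len(views) < max_steps:
        top, left = propose(views)
        views.append(crop(image, top, left, patch_size))
        probs = classify(views)
    return probs.index(max(probs)), len(views)
```

Because every view has the same small size, each step costs roughly the same, and an easy input that is classified confidently from the glance alone never pays for the focus steps.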

Code & Models

Repositories

blackfeather-wang/GFNet-Pytorch
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques