Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection
Hongyang Li, Yu Liu, Wanli Ouyang, Xiaogang Wang

TL;DR
This paper introduces a novel zoom-out-and-in network with a map attention decision unit to improve object proposal generation and detection by effectively leveraging multi-scale features and adaptive feature map activation.
Contribution
It proposes a new network architecture with a map attention decision unit that adaptively activates feature channels based on input context, enhancing object proposal and detection performance.
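As a rough illustration of what "adaptively activates feature channels based on input context" could look like, the sketch below computes one gate per channel from a context vector and scales each channel by its gate. It is not the paper's MAD unit: the global-average context, the single linear layer with a sigmoid, and the names `global_average`, `channel_gates`, and `apply_gates` are all assumptions made for this example.

```python
import math

def global_average(feature_maps):
    """Summarize each channel of a [C][H][W] nested list by its mean (assumed context)."""
    return [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]

def channel_gates(context, weights, bias):
    """Map a context vector to one gate in (0, 1) per channel.

    Hypothetical parameterization: one linear layer followed by a sigmoid.
    weights is [C_out][C_in], bias is [C_out].
    """
    gates = []
    for row, b in zip(weights, bias):
        z = sum(w * c for w, c in zip(row, context)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))
    return gates

def apply_gates(feature_maps, gates):
    """Scale every value in channel k by gates[k]."""
    return [
        [[g * v for v in row] for row in channel]
        for channel, g in zip(feature_maps, gates)
    ]

# Toy input: two 2x2 channels, identity weights, zero bias.
features = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
context = global_average(features)
gates = channel_gates(context, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
gated = apply_gates(features, gates)
```

Because the gates depend on the input, the same channel can be emphasized for one image and suppressed for another, which is the behavior the contribution statement describes.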
Findings
Outperforms state-of-the-art methods on the PASCAL VOC 2007, ImageNet DET, and MS COCO datasets.
Achieves higher average recall (AR) for region proposals.
Improves average precision (AP) for object detection.
Abstract
In this paper, we propose a zoom-out-and-in network for generating object proposals. A key observation is that it is difficult to classify anchors of different sizes with the same set of features. Anchors of different sizes should be placed accordingly at different depths within a network: smaller boxes on high-resolution layers with a smaller stride, and larger boxes on low-resolution counterparts with a larger stride. Inspired by the conv/deconv structure, we fully leverage the low-level local details and high-level regional semantics from two feature map streams, which are complementary to each other, to identify the objectness in an image. A map attention decision (MAD) unit is further proposed to aggressively search for neuron activations among the two streams and attend to the ones that contribute most to the feature learning of the final loss. The unit serves as a decision maker to…
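The placement rule in the abstract (smaller anchors on small-stride layers, larger anchors on large-stride layers) can be sketched as a simple assignment. The anchor sizes, the strides, and the even split across levels are hypothetical choices for illustration and are not taken from the paper.

```python
def assign_anchors_to_levels(anchor_sizes, level_strides):
    """Group anchor sizes by feature level, smallest sizes to the smallest stride.

    Sizes are sorted and split into contiguous, roughly equal groups,
    one group per stride. The even split is an assumption of this sketch.
    """
    sizes = sorted(anchor_sizes)
    strides = sorted(level_strides)
    n_levels = len(strides)
    assignment = {s: [] for s in strides}
    for i, size in enumerate(sizes):
        level = min(i * n_levels // len(sizes), n_levels - 1)
        assignment[strides[level]].append(size)
    return assignment

# Hypothetical configuration: six anchor sizes over three strides.
levels = assign_anchors_to_levels([16, 32, 64, 128, 256, 512], [8, 16, 32])
for stride, sizes in levels.items():
    print(f"stride {stride}: anchor sizes {sizes}")
```

With this configuration the two smallest sizes land on the stride-8 level and the two largest on the stride-32 level, mirroring the high-resolution versus low-resolution split the abstract argues for.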
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
