Action-Driven Object Detection with Top-Down Visual Attentions
Donggeun Yoo, Sunggyun Park, Kyunghyun Paeng, Joon-Young Lee, In So Kweon

TL;DR
This paper introduces an action-driven, top-down visual attention model called AttentionNet for object detection, which localizes objects through sequential actions without relying on proposals or bounding-box regression, achieving state-of-the-art results.
Contribution
The paper presents a novel top-down detection approach using AttentionNet that localizes objects via sequential actions, eliminating the need for proposal modules or bounding-box regression.
Findings
Achieves state-of-the-art performance on PASCAL VOC and ILSVRC datasets.
Outperforms Faster R-CNN by 7.1% at higher IoU thresholds.
Demonstrates effective holistic scene analysis through sequential action localization.
Abstract
A dominant paradigm for deep-learning-based object detection relies on a "bottom-up" approach using "passive" scoring of class-agnostic proposals. These approaches are efficient but lack holistic analysis of scene-level context. In this paper, we present an "action-driven" detection mechanism using our "top-down" visual attention model. We localize an object by taking the sequential actions that the attention model provides. The attention model, conditioned on an image region, provides the actions required to get closer to a target object. An action at each time step is weak by itself, but an ensemble of the sequential actions makes a bounding box converge accurately to the target object boundary. This attention model, which we call AttentionNet, is composed of a convolutional neural network. During our whole detection procedure, we only utilize the actions from a single AttentionNet without any…
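The sequential-action idea in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Box` and `Action` encodings, the fixed `step` size, and especially `toy_policy`, a hypothetical stand-in that reads the target box directly, whereas the real model would predict actions from the image region. The sketch only shows how many weak, fixed-size moves can accumulate until a box settles on a target.

```python
from typing import Callable, Tuple

Box = Tuple[int, int, int, int]     # (x1, y1, x2, y2) in pixels (assumed encoding)
Action = Tuple[int, int, int, int]  # -1, 0 or +1 per coordinate; all zeros means "stop"


def toy_policy(target: Box, tol: int = 2) -> Callable[[Box], Action]:
    """Hypothetical stand-in for the attention model.

    It peeks at the target box, which a real detector cannot do; a trained
    network would have to infer the same direction from the image region.
    """
    def policy(box: Box) -> Action:
        # Move each coordinate one unit toward the target unless it is
        # already within `tol` pixels, in which case that coordinate stops.
        return tuple(
            0 if abs(t - b) <= tol else (1 if t > b else -1)
            for b, t in zip(box, target)
        )
    return policy


def localize(box: Box, policy: Callable[[Box], Action],
             step: int = 4, max_steps: int = 100) -> Tuple[Box, int]:
    """Apply weak, fixed-size actions until the policy signals "stop"."""
    for n in range(max_steps):
        action = policy(box)
        if not any(action):  # every coordinate stopped: box has converged
            return box, n
        box = tuple(b + step * a for b, a in zip(box, action))
    return box, max_steps


if __name__ == "__main__":
    start = (0, 0, 200, 200)
    target = (40, 60, 120, 160)
    final_box, n_steps = localize(start, toy_policy(target))
    print(final_box, n_steps)
```

With `tol` set to half of `step`, each coordinate stops as soon as one more move would not bring it closer, so the loop cannot oscillate around the target.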
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection
Methods: Region Proposal Network · Softmax · Convolution · RoIPool · Faster R-CNN
