HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection
Tao Kong, Anbang Yao, Yurong Chen, Fuchun Sun

TL;DR
HyperNet is a deep hierarchical network that combines multi-level features for region proposal generation and object detection, reaching state-of-the-art accuracy on PASCAL VOC 2007 and 2012 while running at 5 fps on a GPU.
Contribution
The paper proposes HyperNet, a deep hierarchical network that aggregates hierarchical feature maps into a shared Hyper Feature and uses it for both region proposal generation and object detection, reporting state-of-the-art detection accuracy on PASCAL VOC 2007 and 2012.
Findings
Achieves state-of-the-art detection accuracy on PASCAL VOC 2007 and 2012.
Achieves high proposal recall using only 100 proposals per image.
Operates at 5 fps on a GPU, indicating potential for real-time applications.
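The recall figure above depends on how many proposals are kept and on the IoU threshold used to count a match. The following is a minimal sketch of that measurement, not the paper's evaluation code: the function names `iou` and `recall_at`, the `(x1, y1, x2, y2)` box format, and the toy boxes are all assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at(gt_boxes, proposals, top_n, iou_threshold):
    """Fraction of ground-truth boxes matched by any of the top_n proposals.

    proposals is a list of (score, box) pairs in any order.
    """
    if not gt_boxes:
        return 0.0
    # Keep only the top_n highest-scoring proposals.
    kept = sorted(proposals, key=lambda p: p[0], reverse=True)[:top_n]
    hits = sum(
        1 for gt in gt_boxes
        if any(iou(gt, box) >= iou_threshold for _, box in kept)
    )
    return hits / len(gt_boxes)


# Hypothetical toy data: two ground-truth objects, three scored proposals.
gt = [(0, 0, 10, 10), (20, 20, 40, 40)]
props = [
    (0.9, (0, 0, 10, 8)),      # overlaps the first object
    (0.8, (22, 22, 40, 40)),   # overlaps the second object
    (0.1, (50, 50, 60, 60)),   # overlaps nothing
]

print(recall_at(gt, props, top_n=2, iou_threshold=0.5))
print(recall_at(gt, props, top_n=1, iou_threshold=0.5))
```

Raising `iou_threshold` or lowering `top_n` can only keep recall the same or reduce it, which is why high recall at a small proposal budget is the harder result to achieve.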
Abstract
Almost all of the current top-performing object detection networks employ region proposals to guide the search for object instances. State-of-the-art region proposal methods usually need several thousand proposals to get high recall, thus hurting the detection efficiency. Although the latest Region Proposal Network method gets promising detection accuracy with several hundred proposals, it still struggles in small-size object detection and precise localization (e.g., large IoU thresholds), mainly due to the coarseness of its feature maps. In this paper, we present a deep hierarchical network, namely HyperNet, for handling region proposal generation and object detection jointly. Our HyperNet is primarily based on an elaborately designed Hyper Feature which aggregates hierarchical feature maps first and then compresses them into a uniform space. The Hyper Features well incorporate deep…
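The abstract describes the Hyper Feature only as aggregating hierarchical feature maps and then compressing them into a uniform space. The sketch below illustrates one way such an aggregation could work on tiny nested-list feature maps. The specific choices here are assumptions for illustration, not details taken from the text above: max pooling to shrink the shallow map, nearest-neighbour repetition to enlarge the deep map, and a 1x1 convolution with fixed weights as the compression step. All function names are hypothetical.

```python
def max_pool2x(fmap):
    """Downsample a [C][H][W] feature map by 2 using 2x2 max pooling."""
    return [
        [
            [max(ch[2 * i][2 * j], ch[2 * i][2 * j + 1],
                 ch[2 * i + 1][2 * j], ch[2 * i + 1][2 * j + 1])
             for j in range(len(ch[0]) // 2)]
            for i in range(len(ch) // 2)
        ]
        for ch in fmap
    ]


def upsample2x(fmap):
    """Upsample a [C][H][W] feature map by 2 with nearest-neighbour repetition.

    Assumption: a stand-in for whatever learned upsampling a real network uses.
    """
    return [
        [[ch[i // 2][j // 2] for j in range(2 * len(ch[0]))]
         for i in range(2 * len(ch))]
        for ch in fmap
    ]


def compress(fmap, weights):
    """1x1 convolution: each output channel is a weighted sum of input channels."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(wk * fmap[k][i][j] for k, wk in enumerate(row))
          for j in range(w)]
         for i in range(h)]
        for row in weights
    ]


def hyper_feature(shallow, middle, deep, weights):
    """Align three levels to the middle resolution, concatenate, compress."""
    # List concatenation along the first axis is channel concatenation.
    aligned = max_pool2x(shallow) + middle + upsample2x(deep)
    return compress(aligned, weights)


# Hypothetical single-channel maps at three resolutions: 4x4, 2x2, 1x1.
shallow = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]
middle = [[[1, 1], [1, 1]]]
deep = [[[2]]]
# Two output channels over three concatenated input channels.
weights = [[1, 0, 0], [0, 1, 1]]

print(hyper_feature(shallow, middle, deep, weights))
```

The point of the alignment step is that every level ends up at one spatial resolution, so a single compressed map can carry both fine detail from shallow layers and semantics from deep layers.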
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
