TL;DR
This paper proposes an independent hierarchy pyramid (IHP) neck and supporting techniques for multi-scale object detection with multi-head detectors. They target the feature misalignment caused by fusing pyramid levels point-to-point, and the paper reports state-of-the-art results on Pascal VOC and MS COCO.
Contribution
It introduces an independent hierarchy pyramid (IHP) architecture, soft nearest neighbor interpolation (SNI), and an adaptive feature selection method for downsampling in extended spatial windows (ESD) to enhance multi-head detectors.
Findings
Reports state-of-the-art detection performance on Pascal VOC and MS COCO.
Improves feature alignment with the secondary features alignment solution.
Enhances a lightweight convolution technique (GSConvE) for real-time detection.
Abstract
Multi-head detectors typically employ a features-fused-pyramid-neck for multi-scale detection and are widely adopted in industry. However, this approach suffers from feature misalignment when representations from different hierarchical levels of the feature pyramid are forcibly fused point-to-point. To address this issue, we design an independent hierarchy pyramid (IHP) architecture to evaluate the effectiveness of the features-unfused-pyramid-neck for multi-head detectors. We then introduce soft nearest neighbor interpolation (SNI) with a weight downscaling factor to mitigate the impact of feature fusion at different hierarchies while preserving key textures. Furthermore, we present an adaptive feature selection method for downsampling in extended spatial windows (ESD) to retain spatial features, and we enhance lightweight convolutional techniques (GSConvE). These advancements…
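To make the SNI idea in the abstract concrete, here is a minimal sketch in plain Python. The abstract does not give the formula, so everything below is an assumption for illustration: the function name `soft_nearest_upsample` is hypothetical, the "weight downscaling factor" is modeled as a single constant `alpha` multiplied onto a nearest-neighbor upsampled map, and the default `alpha = 1 / scale**2` is chosen here only because it keeps the total activation of the map unchanged after upsampling. The paper's actual definition may differ.

```python
def soft_nearest_upsample(fmap, scale=2, alpha=None):
    """Nearest-neighbor upsample a 2D feature map, then damp it by alpha.

    fmap  : list of rows (list of floats), one channel of a feature map.
    scale : integer upsampling factor along both axes.
    alpha : weight downscaling factor. ASSUMPTION: defaults to 1 / scale**2,
            which is not taken from the paper's text.
    """
    if alpha is None:
        alpha = 1.0 / (scale * scale)

    out = []
    for row in fmap:
        # Repeat every value `scale` times along the width, scaled by alpha.
        wide = [alpha * v for v in row for _ in range(scale)]
        # Repeat the widened row `scale` times along the height.
        out.extend([list(wide) for _ in range(scale)])
    return out


if __name__ == "__main__":
    coarse = [[1.0, 2.0],
              [3.0, 4.0]]
    fine = soft_nearest_upsample(coarse, scale=2)
    for row in fine:
        print(row)
    # With alpha = 1 / scale**2 the summed activation is preserved.
    print(sum(map(sum, coarse)), sum(map(sum, fine)))
```

With `alpha=1.0` the function reduces to ordinary nearest-neighbor upsampling. A smaller `alpha` lowers the weight of the upsampled coarse-level features relative to the finer level they would be combined with, which is one plausible reading of "mitigate the impact of feature fusion at different hierarchies".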
