StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection

Sanghyun Woo; Soonmin Hwang; In So Kweon

arXiv:1709.05788 · cs.CV · September 19, 2017

TL;DR

StairNet extends the one-stage SSD detector with top-down semantic aggregation, significantly improving small object detection while keeping inference fast, and outperforms other one-stage detectors on PASCAL VOC 2007 and PASCAL VOC 2012.

Contribution

Introduces a feature combining module for top-down semantic spreading, unifying multi-scale representations to improve small object detection in SSD-based models.

Findings

01. StairNet outperforms other one-stage detectors on the PASCAL VOC 2007 and PASCAL VOC 2012 datasets.

02. Small object detection accuracy improves significantly over SSD.

03. Inference remains fast while accuracy improves.

Abstract

One-stage object detectors such as SSD and YOLO have already shown promising accuracy with a small memory footprint and fast speed. However, it is widely recognized that one-stage detectors have difficulty detecting small objects, even though they are competitive with two-stage methods on large objects. In this paper, we investigate how to alleviate this problem, starting from the SSD framework. Because of SSD's pyramidal design, the lower layers responsible for small objects lack strong semantics (e.g., contextual information). We address this problem by introducing a feature combining module that spreads strong semantics in a top-down manner. Our final model, the StairNet detector, effectively unifies multi-scale representations and semantic distribution. Experiments on the PASCAL VOC 2007 and PASCAL VOC 2012 datasets demonstrate that StairNet significantly mitigates this weakness of SSD and…
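The abstract does not spell out the internals of the feature combining module, so the following is only a minimal sketch of the general idea of top-down semantic spreading, not the authors' implementation. It assumes that each coarser map is projected with a 1x1 convolution, upsampled by nearest-neighbor repetition, and added element-wise to the next finer map. The function names (`conv1x1`, `upsample_nearest`, `combine`, `top_down_aggregate`), the weights, and the toy pyramid are all hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][column]


def conv1x1(x: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1x1 convolution: a per-pixel linear map across channels.

    weights is indexed as [out_channel][in_channel].
    """
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w * x[c][i][j] for c, w in enumerate(row)) for j in range(width)]
            for i in range(height)
        ]
        for row in weights
    ]


def upsample_nearest(x: FeatureMap, factor: int = 2) -> FeatureMap:
    """Enlarge each channel by repeating every pixel `factor` times per axis."""
    return [
        [
            [channel[i // factor][j // factor] for j in range(len(channel[0]) * factor)]
            for i in range(len(channel) * factor)
        ]
        for channel in x
    ]


def combine(lower: FeatureMap, upper: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Assumed combining step: project `upper`, upsample it, add it to `lower`."""
    projected = conv1x1(upper, weights)
    factor = len(lower[0]) // len(projected[0])
    up = upsample_nearest(projected, factor)
    same_shape = (
        len(up) == len(lower)
        and len(up[0]) == len(lower[0])
        and len(up[0][0]) == len(lower[0][0])
    )
    if not same_shape:
        raise ValueError("upper map cannot be aligned with lower map")
    return [
        [[a + b for a, b in zip(row_l, row_u)] for row_l, row_u in zip(ch_l, ch_u)]
        for ch_l, ch_u in zip(lower, up)
    ]


def top_down_aggregate(
    pyramid: List[FeatureMap], projections: List[List[List[float]]]
) -> List[FeatureMap]:
    """Spread semantics from the coarsest map down to the finest one.

    pyramid is ordered fine to coarse; projections[k] maps the channels of
    level k + 1 onto the channels of level k.
    """
    out = [pyramid[-1]]  # the coarsest level is kept unchanged
    for level, weights in zip(reversed(pyramid[:-1]), reversed(projections)):
        out.append(combine(level, out[-1], weights))
    return out[::-1]  # back to fine-to-coarse order


# Toy pyramid: 1 channel at 4x4, 2 channels at 2x2, 3 channels at 1x1.
pyramid = [
    [[[0.0] * 4 for _ in range(4)]],
    [[[1.0] * 2 for _ in range(2)] for _ in range(2)],
    [[[1.0]] for _ in range(3)],
]
projections = [
    [[0.5, 0.5]],                          # level 1 (2 ch) -> level 0 (1 ch)
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],    # level 2 (3 ch) -> level 1 (2 ch)
]
result = top_down_aggregate(pyramid, projections)
```

In this toy run, the finest map starts as all zeros and ends up carrying values that originate in the coarsest map, which illustrates how information from upper layers can reach the layer responsible for small objects. Real detectors would use learned weights and a deep learning framework rather than nested lists.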

Taxonomy

Methods: Convolution · Non Maximum Suppression · 1x1 Convolution · SSD