Kill Two Birds With One Stone: Boosting Both Object Detection Accuracy and Speed With adaptive Patch-of-Interest Composition

Shihao Zhang; Weiyao Lin; Ping Lu; Weihua Li; Shuo Deng

arXiv:1708.03795 · cs.CV · December 18, 2017 · 2 cites

TL;DR

This paper introduces an adaptive patch-of-interest composition method for video object detection. It extracts the patches of a frame that are likely to contain objects and composes them into a small number of sub-frames, which keeps the original resolution while reducing the number of inputs the detector must process.

Contribution

The paper proposes an adaptive patch composition technique that improves both detection accuracy and detection speed: candidate patches are extracted from each video frame and composed into as few sub-frames as possible before detection.

Findings

01. Improves detection accuracy by maintaining the resolution of the original frame.

02. Reduces detection time by minimizing the number of inputs (composed sub-frames) passed to the detector.

03. Reported to be effective across multiple datasets.

Abstract

Object detection is an important yet challenging task in video understanding and analysis, where one major challenge lies in the proper balance between two contradictory factors: detection accuracy and detection speed. In this paper, we propose a new adaptive patch-of-interest composition approach for boosting both the accuracy and the speed of object detection. The proposed approach first extracts patches in a video frame that have the potential to include objects-of-interest. Then, an adaptive composition process is introduced to compose the extracted patches into an optimal number of sub-frames for object detection. With this process, we are able to maintain the resolution of the original frame during object detection (for guaranteeing the accuracy), while minimizing the number of inputs in detection (for boosting the speed). Experimental results on various datasets demonstrate the…
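The composition step described in the abstract can be pictured as a rectangle-packing problem: fit patches of varying size into as few fixed-size sub-frames as possible without resizing them. The sketch below is not the authors' algorithm, which the abstract does not specify. It is a minimal illustration using a greedy shelf-packing heuristic, and the names `Placement`, `compose_patches`, and `_try_place`, along with the sub-frame dimensions, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Placement:
    patch_index: int  # index into the input patch list
    subframe: int     # which sub-frame the patch was placed in
    x: int            # left edge inside the sub-frame
    y: int            # top edge inside the sub-frame


def _try_place(shelves: List[List[int]], w: int, h: int,
               frame_w: int, frame_h: int) -> Optional[Tuple[int, int]]:
    """Try to fit a w x h patch into one sub-frame.

    Each shelf is [y, height, used_width]. Returns (x, y) or None.
    """
    # First look for room on an existing shelf.
    for shelf in shelves:
        if h <= shelf[1] and shelf[2] + w <= frame_w:
            x = shelf[2]
            shelf[2] += w
            return x, shelf[0]
    # Otherwise open a new shelf below the last one, if it fits vertically.
    top = shelves[-1][0] + shelves[-1][1]
    if top + h <= frame_h:
        shelves.append([top, h, w])
        return 0, top
    return None


def compose_patches(patches: List[Tuple[int, int]], frame_w: int,
                    frame_h: int) -> Tuple[int, List[Placement]]:
    """Greedily pack (width, height) patches into sub-frames at native size.

    Illustrative heuristic only (assumption): tallest patches first,
    first sub-frame that fits. Returns (number of sub-frames, placements).
    """
    subframes: List[List[List[int]]] = []
    placements: List[Placement] = []
    order = sorted(range(len(patches)), key=lambda i: patches[i][1],
                   reverse=True)
    for i in order:
        w, h = patches[i]
        if w > frame_w or h > frame_h:
            raise ValueError(f"patch {i} is larger than a sub-frame")
        for s, shelves in enumerate(subframes):
            pos = _try_place(shelves, w, h, frame_w, frame_h)
            if pos is not None:
                placements.append(Placement(i, s, pos[0], pos[1]))
                break
        else:
            # No existing sub-frame has room: start a new one.
            subframes.append([[0, h, w]])
            placements.append(Placement(i, len(subframes) - 1, 0, 0))
    return len(subframes), placements


n, placed = compose_patches([(60, 40), (40, 40), (50, 30), (50, 30)], 100, 70)
print(n)  # 1
```

In this toy case four patches fit into a single 100 x 70 sub-frame, so a detector would run once instead of four times, and no patch is downscaled. A greedy heuristic like this does not guarantee the minimum number of sub-frames; it only illustrates the trade-off the abstract describes.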

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings