DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection

Wanli Ouyang; Xiaogang Wang; Xingyu Zeng; Shi Qiu; Ping Luo; Yonglong Tian; Hongsheng Li; Shuo Yang; Zhe Wang; Chen-Change Loy; Xiaoou Tang

arXiv:1412.5661·cs.CV·June 3, 2015·79 cites

TL;DR

This paper introduces deformable deep convolutional neural networks with a new def-pooling layer and pre-training strategy, significantly improving object detection accuracy over previous methods like RCNN and GoogLeNet.

Contribution

It presents a novel deformable deep learning architecture with def-pooling and a new pre-training approach, enhancing object detection performance and model diversity.

Findings

01

Improved mean average precision over RCNN, the previous state of the art, from 31% to 50.3% on ILSVRC2014.

02

Outperformed GoogLeNet by 6.1% in mean average precision for detection.

03

Provided detailed analysis of components for better understanding of the detection pipeline.

Abstract

In this paper, we propose deformable deep convolutional neural networks for generic object detection. This new deep learning object detection framework has innovations in multiple aspects. In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty. A new pre-training strategy is proposed to learn feature representations more suitable for the object detection task and with good generalization capability. By changing the net structures and training strategies, and by adding and removing some key components in the detection pipeline, a set of models with large diversity is obtained, which significantly improves the effectiveness of model averaging. The proposed approach improves the mean average precision obtained by RCNN (Girshick et al., 2014), which was the state-of-the-art, from…
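
The def-pooling idea described in the abstract (pooling over part placements while penalizing deformation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `def_pool`, the parameters `max_shift` and `penalty_weight`, and the quadratic form of the penalty are all assumptions made for illustration. It shows only the general pattern of taking the maximum of (part score minus displacement penalty) around an anchor location.

```python
import numpy as np


def def_pool(part_scores, center, max_shift=1, penalty_weight=0.5):
    """Illustrative deformation-constrained pooling at one anchor location.

    part_scores    : 2-D array of part-detector responses (hypothetical input).
    center         : (row, col) anchor position of the part.
    max_shift      : how far the part may move from the anchor (assumed).
    penalty_weight : cost per unit of squared displacement (assumed quadratic).

    Returns the best penalized score and the displacement that achieved it.
    """
    h, w = part_scores.shape
    cy, cx = center
    best, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            y, x = cy + dy, cx + dx
            if not (0 <= y < h and 0 <= x < w):
                continue  # skip placements that fall outside the map
            # Response at the shifted placement minus the deformation penalty.
            value = part_scores[y, x] - penalty_weight * (dy * dy + dx * dx)
            if value > best:
                best, best_shift = value, (dy, dx)
    return best, best_shift


scores = np.array([[0.1, 0.2, 0.0],
                   [0.3, 1.0, 2.0],
                   [0.0, 0.4, 0.1]])
best, shift = def_pool(scores, center=(1, 1))
```

With `penalty_weight=0` the sketch reduces to ordinary max pooling over the window; raising the penalty makes the layer prefer placements close to the anchor even when a stronger response lies farther away.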

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition

Methods: 1x1 Convolution · Convolution · Average Pooling · Local Response Normalization · Auxiliary Classifier · Inception Module · Dropout · Dense Connections · Max Pooling