DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection
Wanli Ouyang, Xiaogang Wang, Xingyu Zeng, Shi Qiu, Ping Luo, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Chen-Change Loy, Xiaoou Tang

TL;DR
This paper introduces deformable deep convolutional neural networks with a new def-pooling layer and pre-training strategy, significantly improving object detection accuracy over previous methods like RCNN and GoogLeNet.
Contribution
It presents a novel deformable deep learning architecture with def-pooling and a new pre-training approach, enhancing object detection performance and model diversity.
Findings
Raised the mean average precision (mAP) of RCNN, the previous state of the art, from 31% to 50.3% on ILSVRC2014.
Outperformed GoogLeNet by 6.1% in detection accuracy.
Provided detailed analysis of components for better understanding of the detection pipeline.
Abstract
In this paper, we propose deformable deep convolutional neural networks for generic object detection. This new deep learning object detection framework has innovations in multiple aspects. In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty. A new pre-training strategy is proposed to learn feature representations more suitable for the object detection task and with good generalization capability. By changing the net structures, training strategies, adding and removing some key components in the detection pipeline, a set of models with large diversity are obtained, which significantly improves the effectiveness of model averaging. The proposed approach improves the mean averaged precision obtained by RCNN \cite{girshick2014rich}, which was the state-of-the-art, from…
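The def-pooling idea described in the abstract (pooling a part's response while penalizing how far the part moves from its expected position) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `def_pool`, the `radius` search window, and the quadratic penalty with weights `c` are all assumptions chosen for illustration, and in the paper the deformation parameters are learned rather than fixed.

```python
import numpy as np

def def_pool(score_map, anchor, radius=2, c=(0.1, 0.1)):
    """Illustrative deformation-constrained pooling at one anchor position.

    score_map : 2-D array of part-detection scores.
    anchor    : (row, col) where the part is expected to appear.
    radius    : hypothetical search range for displacements.
    c         : hypothetical quadratic penalty weights for (dy, dx).

    Returns the best penalized score and the displacement that achieved it.
    """
    h, w = score_map.shape
    y0, x0 = anchor
    best, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = y0 + dy, x0 + dx
            if not (0 <= y < h and 0 <= x < w):
                continue  # skip displacements that leave the map
            # Assumed penalty: grows with the squared displacement.
            penalty = c[0] * dy ** 2 + c[1] * dx ** 2
            value = score_map[y, x] - penalty
            if value > best:
                best, best_shift = value, (dy, dx)
    return best, best_shift

# A part that responds moderately at the anchor and more strongly two columns away.
scores = np.zeros((5, 5))
scores[2, 2] = 1.0
scores[2, 4] = 1.5

loose, loose_shift = def_pool(scores, (2, 2), c=(0.1, 0.1))
strict, strict_shift = def_pool(scores, (2, 2), c=(0.2, 0.2))
plain, plain_shift = def_pool(scores, (2, 2), c=(0.0, 0.0))
```

With a small penalty the stronger but displaced response wins, with a larger penalty the response at the anchor wins, and with zero penalty the function reduces to ordinary max pooling over the window.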
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition
Methods: 1x1 Convolution · Convolution · Average Pooling · Local Response Normalization · Auxiliary Classifier · Inception Module · Dropout · Dense Connections · Max Pooling
