Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

TL;DR
This paper introduces PReLU, a parametric rectifier activation function, and a robust initialization method, enabling the training of very deep neural networks that surpass human-level performance on ImageNet classification.
Contribution
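The paper makes two contributions: PReLU, a rectifier whose negative-side slope is learned rather than fixed, and an initialization method derived specifically for rectifier nonlinearities. Together they allow extremely deep rectified networks to be trained from scratch and yield 4.94% top-5 test error on the ImageNet 2012 classification dataset, surpassing reported human-level performance.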
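As a rough illustration of what "parametric rectifier" means: in the full paper, PReLU is defined as f(y) = max(0, y) + a * min(0, y), where the coefficient a on the negative part is learned (a = 0 recovers the ordinary ReLU). The sketch below is a minimal scalar version in plain Python. The function names are illustrative, and the default a = 0.25 is the initial value reported in the paper, not something stated in this summary.

```python
def prelu(y, a=0.25):
    """PReLU on a scalar: identity for y > 0, slope `a` for y <= 0."""
    return max(0.0, y) + a * min(0.0, y)


def prelu_grad_a(y):
    """Derivative of prelu(y, a) with respect to the slope `a`.

    It is 0 on the positive side and y on the negative side, which is
    what lets `a` be trained by backpropagation with the other weights.
    """
    return 0.0 if y > 0 else y


if __name__ == "__main__":
    for y in (-2.0, 0.0, 3.0):
        print(y, prelu(y), prelu_grad_a(y))
```

Because only one extra scalar per channel (or per layer, in the channel-shared variant) is learned, the added parameter count is negligible next to the weights, which is consistent with the "nearly zero extra computational cost" claim.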
Findings
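PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk.
With the proposed initialization, extremely deep rectified networks can be trained directly from scratch.
PReLU-nets achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset, surpassing reported human-level performance.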
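To put the 4.94% figure in context, the abstract compares it with the 6.66% top-5 error of GoogLeNet, the ILSVRC 2014 winner, and quotes a 26% relative improvement. The short check below reproduces that arithmetic; the helper name is illustrative.

```python
def relative_improvement(baseline_err, new_err):
    """Fraction by which the error rate drops relative to the baseline."""
    return (baseline_err - new_err) / baseline_err


# Top-5 error rates (in percent) quoted in the abstract.
rel = relative_improvement(6.66, 4.94)
print(f"{rel:.1%}")  # prints 25.8%, which rounds to the quoted 26%
```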
Abstract
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to…
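The abstract does not spell out the initialization rule. In the full paper it amounts to drawing weights from a zero-mean Gaussian with standard deviation sqrt(2 / n), where n = k * k * c is the fan-in of a k-by-k convolution with c input channels. For PReLU with slope a, the variance is additionally divided by (1 + a^2). The sketch below is a minimal plain-Python version under those assumptions. The function names, the flat list-of-lists weight layout, and the fixed seed are illustrative choices, not the authors' code.

```python
import math
import random


def he_std(fan_in, a=0.0):
    """Standard deviation sqrt(2 / ((1 + a^2) * fan_in)).

    a = 0 corresponds to ReLU; a > 0 is the PReLU negative slope.
    """
    return math.sqrt(2.0 / ((1.0 + a * a) * fan_in))


def he_init_conv(k, c_in, c_out, a=0.0, seed=0):
    """Sample weights for a k x k convolution as c_out rows of fan_in values."""
    rng = random.Random(seed)
    fan_in = k * k * c_in
    std = he_std(fan_in, a)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(c_out)]


if __name__ == "__main__":
    weights = he_init_conv(k=3, c_in=64, c_out=8)
    print(len(weights), len(weights[0]), he_std(3 * 3 * 64))
```

The factor of 2 compensates for a rectifier zeroing roughly half of its inputs, so the variance of the activations stays approximately constant from layer to layer instead of shrinking or growing with depth.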
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection
Methods: Convolution · Spatial Pyramid Pooling · Dropout · Dense Connections · Max Pooling · Softmax · Step Decay · SGD with Momentum · Weight Decay · Color Jitter
