Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He; Xiangyu Zhang; Shaoqing Ren; Jian Sun

arXiv:1502.01852 · cs.CV · February 9, 2015 · 1.0k cites

TL;DR

This paper introduces PReLU, a parametric rectifier activation function, and a robust initialization method, enabling the training of very deep neural networks that surpass human-level performance on ImageNet classification.

Contribution

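The paper presents the Parametric Rectified Linear Unit (PReLU) and an initialization method derived specifically for rectifier nonlinearities. Together they allow deeper networks to be trained from scratch and yield 4.94% top-5 test error on ImageNet 2012, a result reported as surpassing human-level performance.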

Findings

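01. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk.

02. Extremely deep rectified networks can be trained directly from scratch using the derived initialization.

03. PReLU-nets achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset, surpassing human-level performance.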
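
To make the first finding concrete, here is a minimal NumPy sketch of the PReLU activation. This page does not state the formula; the sketch assumes the definition given in the full paper, f(y) = max(0, y) + a·min(0, y), where a is a learned coefficient (the paper initializes it to 0.25). The function names `prelu` and `prelu_grad_a` are illustrative and are not taken from any released code.

```python
import numpy as np

def prelu(y, a):
    """PReLU forward pass: identity for y > 0, slope `a` for y <= 0.

    Assumed form: f(y) = max(0, y) + a * min(0, y).
    `a` may be a scalar (shared) or an array broadcastable to `y`
    (for example one coefficient per channel).
    """
    return np.maximum(0.0, y) + a * np.minimum(0.0, y)

def prelu_grad_a(y):
    """Derivative of f with respect to `a`: y where y <= 0, else 0.

    This is the per-element term that lets `a` be trained by backpropagation.
    """
    return np.minimum(0.0, y)

y = np.array([-2.0, -0.5, 0.0, 1.5])
print(prelu(y, a=0.25))   # learned-slope rectifier
print(prelu(y, a=0.0))    # a = 0 reduces to the ordinary ReLU
```

Because the only extra parameters are the slopes `a`, the added cost is small relative to the weights of a convolutional layer, which is consistent with the "nearly zero extra computational cost" claim.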

Abstract

Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to…
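
The initialization method mentioned in the abstract can be illustrated with a small experiment. The abstract does not give the rule; the sketch below assumes the form derived in the full paper, a zero-mean Gaussian with standard deviation sqrt(2 / ((1 + a²) · n)), where n is the fan-in and a is the rectifier's negative slope (a = 0 for ReLU). The helper names `he_std` and `forward_mean_square`, the layer width, and the depth are hypothetical choices for illustration only, and the stack uses fully connected layers rather than the paper's convolutional ones.

```python
import numpy as np

def he_std(fan_in, a=0.0):
    """Std of the assumed rectifier-aware Gaussian init: sqrt(2 / ((1 + a^2) * fan_in))."""
    return np.sqrt(2.0 / ((1.0 + a ** 2) * fan_in))

def forward_mean_square(std_fn, depth=30, width=512, batch=256, seed=0):
    """Push random inputs through `depth` ReLU layers and return the
    mean squared activation of the last layer.

    `std_fn(fan_in)` returns the weight std used for every layer.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(batch, width))
    for _ in range(depth):
        w = rng.normal(0.0, std_fn(width), size=(width, width))
        x = np.maximum(0.0, x @ w)  # ReLU layer, no bias
    return float(np.mean(x ** 2))

# Rectifier-aware scaling keeps the signal magnitude roughly stable with depth,
# while a small fixed std (0.01 here, an arbitrary baseline) makes it vanish.
print("rectifier-aware init:", forward_mean_square(he_std))
print("fixed std 0.01:      ", forward_mean_square(lambda fan_in: 0.01))
```

The comparison shows why the scaling matters for depth: with a fixed small standard deviation, each layer shrinks the activations by a constant factor, so the signal decays exponentially in the number of layers.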

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection

Methods: Convolution · Spatial Pyramid Pooling · Dropout · Dense Connections · Max Pooling · Softmax · Step Decay · SGD with Momentum · Weight Decay · Color Jitter