# Why ReLU Units Sometimes Die: Analysis of Single-Unit Error Backpropagation in Neural Networks

**Authors:** Scott C. Douglas, Jiutian Yu

arXiv: 1812.05981 · 2018-12-17

## TL;DR

This paper investigates why ReLU units sometimes become inactive during training. Simulations on CIFAR-10 show that activation probabilities tend to decrease from the input layer to the output layer, and a statistical analysis of a simplified single-unit model describes the convergence behavior of dying ReLUs.

## Contribution

The paper analyzes dying ReLU units in two ways: empirical simulations that measure how activation probability varies across layers, and a statistical convergence analysis of a single ReLU unit trained by a variant of error backpropagation.

## Key findings

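- The output activation probability Pr[y>0] is generally less than 0.5 at convergence for layers that do not employ skip connections.
- Dying ReLU units can converge more slowly, and this can occur regardless of how the weights are initialized.
- Activation probability tends to decrease from the input layer to the output layer.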
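
To make the quantity Pr[y>0] concrete, here is a minimal sketch of how it could be estimated for one ReLU unit by counting the fraction of inputs that yield a strictly positive output. This is not the paper's experimental setup: the Gaussian inputs, the random weights, and the helper name `activation_probability` are assumptions chosen only for illustration.

```python
import random


def relu(x):
    """Rectified linear unit: max(0, x)."""
    return x if x > 0.0 else 0.0


def activation_probability(weights, bias, inputs):
    """Empirical Pr[y>0]: fraction of inputs with strictly positive ReLU output."""
    active = 0
    for x in inputs:
        pre = sum(w * xi for w, xi in zip(weights, x)) + bias
        if relu(pre) > 0.0:
            active += 1
    return active / len(inputs)


# Assumed toy data: zero-mean Gaussian inputs and weights (not from the paper).
rng = random.Random(0)
inputs = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5000)]
weights = [rng.gauss(0.0, 1.0) for _ in range(8)]

# With zero bias the pre-activation is symmetric about 0, so Pr[y>0] is near 0.5.
p_zero_bias = activation_probability(weights, 0.0, inputs)
# A negative bias shifts the pre-activation down and lowers Pr[y>0].
p_neg_bias = activation_probability(weights, -3.0, inputs)

print(p_zero_bias, p_neg_bias)
```

In this toy setting a unit whose estimate falls well below 0.5 is inactive for most inputs, which is the regime the findings above describe.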

## Abstract

Recently, neural networks in machine learning use rectified linear units (ReLUs) in early processing layers for better performance. Training these structures sometimes results in "dying ReLU units" with near-zero outputs. We first explore this condition via simulation using the CIFAR-10 dataset and variants of two popular convolutive neural network architectures. Our explorations show that the output activation probability Pr[y>0] is generally less than 0.5 at system convergence for layers that do not employ skip connections, and this activation probability tends to decrease as one progresses from input layer to output layer. Employing a simplified model of a single ReLU unit trained by a variant of error backpropagation, we then perform a statistical convergence analysis to explore the model's evolutionary behavior. Our analysis describes the potentially-slower convergence speeds of dying ReLU units, and this issue can occur regardless of how the weights are initialized.
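
The single-unit setting mentioned in the abstract can be illustrated with a rough sketch. The code below is not the paper's model or its variant of error backpropagation; it uses plain stochastic gradient descent on squared error, and the teacher function, learning rate, input range, and the name `train_single_relu` are all assumptions. It only demonstrates the standard mechanism behind a dead unit: when the pre-activation is non-positive, the ReLU derivative is zero and no error reaches the weights.

```python
import random


def train_single_relu(weights, bias, data, lr=0.01, epochs=5):
    """Plain SGD on squared error for y = max(0, w.x + b).

    Returns (weights, bias, updates), where updates counts the samples
    that produced a nonzero gradient.
    """
    w = list(weights)
    b = bias
    updates = 0
    for _ in range(epochs):
        for x, target in data:
            pre = sum(wi * xi for wi, xi in zip(w, x)) + b
            if pre <= 0.0:
                # ReLU derivative is taken as 0 here: no error is backpropagated.
                continue
            err = pre - target  # y equals pre while the unit is active
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
            updates += 1
    return w, b, updates


# Assumed toy data: inputs in [-1, 1]^4 and a hypothetical ReLU teacher.
rng = random.Random(1)
data = []
for _ in range(200):
    x = [rng.uniform(-1.0, 1.0) for _ in range(4)]
    target = max(0.0, 0.5 * x[0] - 0.25 * x[1] + 0.1)
    data.append((x, target))

w0 = [0.3, -0.2, 0.1, 0.4]  # sum of |w| is 1.0

# Bias -2.0 keeps the pre-activation at or below -1.0 for every input: the unit is dead.
dead_w, dead_b, dead_updates = train_single_relu(w0, -2.0, data)
# Bias 0.5 leaves the unit active on many inputs, so it keeps learning.
live_w, live_b, live_updates = train_single_relu(w0, 0.5, data)

print(dead_updates, live_updates)
```

The dead unit's parameters never change because every sample is skipped, while the unit with the positive bias receives updates. The paper's analysis concerns the statistical convergence behavior of such units, which this sketch does not reproduce.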

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.05981/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1812.05981/full.md

## References

12 references — full list in the complete paper: https://tomesphere.com/paper/1812.05981/full.md

---
Source: https://tomesphere.com/paper/1812.05981