Deep Learning using Rectified Linear Units (ReLU)

Abien Fred Agarap

arXiv:1803.08375·cs.NE·April 15, 2026·2.5k cites

TL;DR

This paper clarifies the historical origin of ReLU, compares its performance with Tanh and Sigmoid on image classification, text classification, and image reconstruction, and reports that the non-saturating ReLU outperforms both on the classification tasks, while Tanh performs best in image reconstruction.

Contribution

It corrects the historical record of ReLU's origin and provides a comprehensive empirical comparison of activation functions in deep learning tasks.

Findings

01

ReLU outperforms Sigmoid and Tanh in deep vision and text tasks.

02

Sigmoid fails to converge due to vanishing gradients.

03

Tanh performs best in image reconstruction.
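
The vanishing-gradient effect named in finding 02 can be illustrated with a minimal sketch. It is not taken from the paper or its repository: the helper names and the fixed pre-activation value are assumptions made for illustration, and weight matrices are ignored so that only the activation derivatives contribute to the backpropagated product.

```python
import math


def relu_grad(x: float) -> float:
    """Derivative of ReLU, max(0, x): 1 for positive inputs, else 0."""
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x: float) -> float:
    """Derivative of the logistic function: s(x) * (1 - s(x))."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def tanh_grad(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2."""
    t = math.tanh(x)
    return 1.0 - t * t


def chained_gradient(grad_fn, pre_activation: float, depth: int) -> float:
    """Product of `depth` identical activation derivatives.

    Hypothetical simplification: every layer sees the same pre-activation
    and weights are treated as 1, so this isolates the activation's effect.
    """
    return grad_fn(pre_activation) ** depth


if __name__ == "__main__":
    for name, fn in [("relu", relu_grad),
                     ("sigmoid", sigmoid_grad),
                     ("tanh", tanh_grad)]:
        # Assumed pre-activation of 1.0 and a 10-layer stack.
        print(name, chained_gradient(fn, 1.0, 10))
```

The sigmoid derivative peaks at 0.25 (at x = 0), so under these assumptions a 10-layer product is at most 0.25^10, roughly 9.5e-7, whereas the ReLU derivative stays at 1 for any positive input.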

Abstract

The Rectified Linear Unit (ReLU) is a foundational activation function in artificial neural networks. Recent literature frequently misattributes its origin to the 2018 (initial) version of this paper, which exclusively investigated ReLU at the classification layer. This paper formally corrects the citation record by tracing the mathematical lineage of piecewise linear functions from early biological models to their definitive integration into deep learning by Nair & Hinton (2010). Alongside this historical rectification, we present a comprehensive empirical comparison of the ReLU, Hyperbolic Tangent (Tanh), and Logistic (Sigmoid) activation functions across image classification, text classification, and image reconstruction tasks. To ensure statistical robustness, we evaluated these functions using 10 independent randomized trials and assessed significance using the non-parametric…

Code & Models

Repositories

afagarap/relu-classifier (GitHub)