TL;DR
This paper clarifies the historical origin of ReLU, compares its performance with Tanh and Sigmoid across various tasks, and confirms the superiority of non-saturating functions like ReLU in deep learning.
Contribution
It corrects the historical record of ReLU's origin and provides a comprehensive empirical comparison of activation functions in deep learning tasks.
Findings
ReLU outperforms Sigmoid and Tanh in deep vision and text tasks.
Sigmoid fails to converge due to vanishing gradients.
Tanh performs best in image reconstruction.
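The vanishing-gradient finding above can be illustrated with a minimal sketch. It is not the paper's experimental setup: it ignores weight matrices and assumes every layer sees the same pre-activation value, so it shows only how the activation derivative alone scales a backpropagated gradient. The helper name `chained_gradient` and the chosen depth are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the logistic function; its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x: float) -> float:
    """Derivative of tanh; its maximum is 1 at x = 0, and it decays for large |x|."""
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def chained_gradient(grad_fn, pre_activation: float, depth: int) -> float:
    """Hypothetical helper: product of `depth` identical activation derivatives.

    Simplifying assumption: weights are ignored and each layer receives
    the same pre-activation, so only the activation's effect is isolated.
    """
    return grad_fn(pre_activation) ** depth


if __name__ == "__main__":
    depth = 10  # illustrative depth, not taken from the paper
    print("sigmoid:", chained_gradient(sigmoid_grad, 0.0, depth))
    print("tanh:   ", chained_gradient(tanh_grad, 1.0, depth))
    print("relu:   ", chained_gradient(relu_grad, 1.0, depth))
```

Even in the sigmoid's best case (pre-activation 0, derivative 0.25), the product shrinks geometrically with depth, whereas the ReLU derivative stays at 1 for any active unit.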
Abstract
The Rectified Linear Unit (ReLU) is a foundational activation function in artificial neural networks. Recent literature frequently misattributes its origin to the 2018 (initial) version of this paper, which exclusively investigated ReLU at the classification layer. This paper formally corrects the citation record by tracing the mathematical lineage of piecewise linear functions from early biological models to their definitive integration into deep learning by Nair & Hinton (2010). Alongside this historical rectification, we present a comprehensive empirical comparison of the ReLU, Hyperbolic Tangent (Tanh), and Logistic (Sigmoid) activation functions across image classification, text classification, and image reconstruction tasks. To ensure statistical robustness, we evaluated these functions using 10 independent randomized trials and assessed significance using the non-parametric…
