PLU: The Piecewise Linear Unit Activation Function
Andrei Nicolae

TL;DR
The paper introduces the Piecewise Linear Unit (PLU), a new activation function that combines properties of tanh and ReLU. It is reported to outperform ReLU on a variety of tasks while avoiding vanishing gradients in deep neural networks.
Contribution
The paper proposes the PLU activation function, a hybrid of tanh and ReLU, demonstrating improved performance and gradient behavior over ReLU.
Findings
PLU outperforms ReLU on various tasks.
PLU avoids vanishing gradient issues.
PLU maintains piecewise linearity for efficient training.
Abstract
Successive linear transforms followed by nonlinear "activation" functions can approximate nonlinear functions to arbitrary precision given sufficient layers. The number of layers needed depends, in part, on the nature of the activation function. The hyperbolic tangent (tanh) was a favored choice of activation until networks grew deeper and vanishing gradients became a hindrance during training. For this reason, the Rectified Linear Unit (ReLU), defined by max(0, x), has become the prevailing activation function in deep neural networks. Unlike the tanh function, which is smooth, the ReLU yields networks that are piecewise linear functions with a limited number of facets. This paper presents a new activation function, the Piecewise Linear Unit (PLU), that is a hybrid of tanh and ReLU and is shown to outperform the ReLU on a variety of tasks while avoiding the vanishing…
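The contrast drawn in the abstract can be made concrete with a small sketch. The abstract as shown here does not state the PLU formula, so the form below, max(alpha*(x + c) - c, min(alpha*(x - c) + c, x)) with alpha = 0.1 and c = 1, is an assumption based on how PLU is commonly stated, and the function names are illustrative only. The sketch compares the slope of each activation far from the origin, where tanh saturates.

```python
import math


def relu(x: float) -> float:
    """ReLU as defined in the abstract: max(0, x)."""
    return max(0.0, x)


def plu(x: float, alpha: float = 0.1, c: float = 1.0) -> float:
    """Assumed PLU form (not given in the abstract above).

    Identity on [-c, c]; slope alpha outside that interval, so the
    function is a three-piece linear approximation of tanh's shape.
    """
    return max(alpha * (x + c) - c, min(alpha * (x - c) + c, x))


def tanh_grad(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2, which tends to 0 as |x| grows."""
    return 1.0 - math.tanh(x) ** 2


def plu_grad(x: float, alpha: float = 0.1, c: float = 1.0) -> float:
    """Slope of the assumed PLU: 1 inside (-c, c), alpha elsewhere."""
    return 1.0 if abs(x) < c else alpha


if __name__ == "__main__":
    # Print value and slope of each activation at a few sample inputs.
    for x in (-5.0, -0.5, 0.5, 5.0):
        print(
            f"x={x:+.1f}  relu={relu(x):.3f}  plu={plu(x):.3f}  "
            f"tanh'={tanh_grad(x):.5f}  plu'={plu_grad(x):.1f}"
        )
```

Under these assumed parameters, the PLU slope never falls below alpha, whereas the tanh derivative shrinks toward zero for large |x| and the ReLU slope is exactly zero for negative inputs.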
Taxonomy
Topics: Neural Networks and Applications · Fault Detection and Control Systems · Control Systems and Identification
