Empirical Loss Landscape Analysis of Neural Network Activation Functions

Anna Sergeevna Bosman; Andries Engelbrecht; Marde Helbig

arXiv:2306.16090 · cs.LG · June 29, 2023

TL;DR

This paper empirically analyzes how different activation functions affect the loss landscape of neural networks, revealing their influence on convexity, flatness, and generalization performance.

Contribution

It provides a comparative empirical study of loss landscapes for hyperbolic tangent, ReLU, and ELU activation functions, highlighting their distinct properties.
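For reference, the three activation functions compared in the study can be written in a few lines. The sketch below uses plain Python floats and the standard ELU definition with an assumed default `alpha = 1.0` (this page does not state the value used in the paper). It also computes each derivative, which makes saturation visible: the tanh gradient and the negative-side ELU gradient both approach zero for large-magnitude inputs, while the ReLU gradient is exactly zero for every negative input.

```python
import math


def tanh(x: float) -> float:
    return math.tanh(x)


def tanh_grad(x: float) -> float:
    # d/dx tanh(x) = 1 - tanh(x)^2; approaches 0 as |x| grows (saturation on both sides).
    t = math.tanh(x)
    return 1.0 - t * t


def relu(x: float) -> float:
    return max(0.0, x)


def relu_grad(x: float) -> float:
    # Subgradient convention: the derivative at x == 0 is taken to be 0.
    return 1.0 if x > 0 else 0.0


def elu(x: float, alpha: float = 1.0) -> float:
    # alpha = 1.0 is an assumed default, not a value taken from the paper.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def elu_grad(x: float, alpha: float = 1.0) -> float:
    # For x <= 0 the derivative is alpha * exp(x), which approaches 0 as x -> -inf.
    return 1.0 if x > 0 else alpha * math.exp(x)


if __name__ == "__main__":
    for x in (-10.0, -1.0, 0.5, 10.0):
        print(
            f"x={x:6.1f}  tanh'={tanh_grad(x):.2e}  "
            f"relu'={relu_grad(x):.1f}  elu'={elu_grad(x):.2e}"
        )
```

Unlike tanh, ELU saturates only for negative inputs and is unbounded for positive ones, while ReLU has no smooth saturation at all; its zero-gradient region is a hard cutoff.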

Findings

01. ReLU yields the most convex loss landscape.

02. ELU yields the least flat loss landscape and the best generalization performance of the three.

03. Wide and narrow valleys are present in the loss landscapes of all three activation functions, and the narrow valleys are linked to saturation.

Abstract

Activation functions play a significant role in neural network design by enabling non-linearity. The choice of activation function was previously shown to influence the properties of the resulting loss landscape. Understanding the relationship between activation functions and loss landscape properties is important for neural architecture and training algorithm design. This study empirically investigates neural network loss landscapes associated with hyperbolic tangent, rectified linear unit, and exponential linear unit activation functions. Rectified linear unit is shown to yield the most convex loss landscape, and exponential linear unit is shown to yield the least flat loss landscape, and to exhibit superior generalisation performance. The presence of wide and narrow valleys in the loss landscape is established for all activation functions, and the narrow valleys are shown to…
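The convexity comparison above can be made concrete with a simple empirical probe. The sketch below is a generic midpoint-convexity test, not necessarily the metric used in the paper: it samples random pairs of parameter vectors and reports the fraction for which the loss at the midpoint does not exceed the average of the endpoint losses. A convex loss scores 1.0 on every sample, and lower values indicate non-convex regions. The function name `midpoint_convexity_ratio`, the sampling radius, and the toy single-neuron loss `tanh_neuron_mse` are hypothetical, chosen only for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def midpoint_convexity_ratio(
    loss: Callable[[Sequence[float]], float],
    dim: int,
    n_pairs: int = 1000,
    radius: float = 1.0,
    seed: int = 0,
    tol: float = 1e-12,
) -> float:
    """Fraction of random pairs (a, b) with loss((a+b)/2) <= (loss(a)+loss(b))/2.

    Points are drawn uniformly from the hypercube [-radius, radius]^dim.
    `tol` absorbs floating-point round-off in the comparison.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        a = [rng.uniform(-radius, radius) for _ in range(dim)]
        b = [rng.uniform(-radius, radius) for _ in range(dim)]
        mid = [(ai + bi) / 2.0 for ai, bi in zip(a, b)]
        if loss(mid) <= (loss(a) + loss(b)) / 2.0 + tol:
            hits += 1
    return hits / n_pairs


# Hypothetical toy data (x, y) for a single neuron y_hat = tanh(w0 * x + w1).
TOY_DATA: List[Tuple[float, float]] = [(-1.0, -0.5), (0.0, 0.1), (1.0, 0.7)]


def tanh_neuron_mse(w: Sequence[float]) -> float:
    """Mean squared error of the toy tanh neuron with weight w[0] and bias w[1]."""
    return sum((math.tanh(w[0] * x + w[1]) - y) ** 2 for x, y in TOY_DATA) / len(TOY_DATA)


if __name__ == "__main__":
    print("quadratic:", midpoint_convexity_ratio(lambda w: sum(v * v for v in w), dim=2))
    print("tanh neuron:", midpoint_convexity_ratio(tanh_neuron_mse, dim=2, radius=3.0))
```

Replacing `math.tanh` in `tanh_neuron_mse` with a ReLU or ELU gives a rough, low-dimensional way to compare activations. A random-pair probe only checks the segments it samples, so it is a coarse indicator rather than a proof of convexity or non-convexity.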

Code & Models

Repositories

annabosman/fla-in-tf
TensorFlow · Official