DL101 Neural Network Outputs and Loss Functions
Fernando Berzal

TL;DR
This paper explores the relationship between neural network output layer activation functions and their corresponding loss functions, providing statistical justifications and practical considerations for selecting appropriate combinations in deep learning models.
Contribution
It offers a detailed analysis connecting common activation and loss functions to statistical principles like MLE and GLMs, clarifying their appropriate use cases.
Findings
Activation functions are linked to specific probability distributions.
Loss functions correspond to maximum likelihood estimation principles.
Practical scenarios like constrained outputs are discussed.
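The link between loss functions and maximum likelihood can be made concrete with a short derivation. This is a sketch under stated assumptions rather than the paper's own notation: $f_\theta(x)$ denotes the network output before any output activation, and the scale parameters $\sigma$ and $b$ are assumed fixed rather than learned.

```latex
\begin{align*}
\hat{\theta}_{\mathrm{MLE}}
  &= \arg\min_{\theta} \; -\sum_{i=1}^{N} \log p\bigl(y_i \mid x_i; \theta\bigr) \\[4pt]
\text{Gaussian, fixed } \sigma: \quad
  -\log p(y \mid x; \theta)
  &= \frac{\bigl(y - f_\theta(x)\bigr)^2}{2\sigma^2} + \tfrac{1}{2}\log\bigl(2\pi\sigma^2\bigr) \\[4pt]
\text{Laplace, fixed } b: \quad
  -\log p(y \mid x; \theta)
  &= \frac{\bigl|y - f_\theta(x)\bigr|}{b} + \log(2b) \\[4pt]
\text{Bernoulli, } \hat{p} = \operatorname{sigmoid}\bigl(f_\theta(x)\bigr): \quad
  -\log p(y \mid x; \theta)
  &= -y \log \hat{p} - (1 - y)\log\bigl(1 - \hat{p}\bigr)
\end{align*}
```

In each case the terms that do not depend on $\theta$ drop out of the minimization, leaving squared error, absolute error, and binary cross-entropy respectively.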
Abstract
The loss function used to train a neural network is strongly connected to its output layer from a statistical point of view. This technical report analyzes common activation functions for a neural network output layer, like linear, sigmoid, ReLU, and softmax, detailing their mathematical properties and their appropriate use cases. A strong statistical justification exists for selecting a suitable loss function when training a deep learning model. This report connects common loss functions such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and various Cross-Entropy losses to the statistical principle of Maximum Likelihood Estimation (MLE). Choosing a specific loss function is equivalent to assuming a specific probability distribution for the model output, highlighting the link between these functions and the Generalized Linear Models (GLMs) that underlie network output…
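The claim that choosing a loss is equivalent to assuming an output distribution can be checked numerically. The following is a minimal NumPy sketch, not code from the report: all function names are illustrative, the data is random, and the scale parameters (`sigma`, `b`) are assumed fixed at 1. Each negative log-likelihood is computed directly from the probability density or mass function and then compared with the familiar loss.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def gaussian_pdf(y, mu, sigma=1.0):
    return np.exp(-(y - mu) ** 2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

def laplace_pdf(y, mu, b=1.0):
    return np.exp(-np.abs(y - mu) / b) / (2 * b)

def bernoulli_pmf(y, p):
    return p**y * (1 - p) ** (1 - y)

def mean_nll(likelihoods):
    # Average negative log-likelihood over the samples.
    return -np.mean(np.log(likelihoods))

rng = np.random.default_rng(0)
n, k = 200, 4

# Regression: linear output, real-valued targets.
y = rng.normal(size=n)
mu = rng.normal(size=n)  # stands in for the network's linear output
mse = np.mean((y - mu) ** 2)
mae = np.mean(np.abs(y - mu))

# Gaussian NLL (sigma = 1) is 0.5 * MSE plus a constant.
gap_gaussian = mean_nll(gaussian_pdf(y, mu)) - (0.5 * mse + 0.5 * np.log(2 * np.pi))
# Laplace NLL (b = 1) is MAE plus a constant.
gap_laplace = mean_nll(laplace_pdf(y, mu)) - (mae + np.log(2.0))

# Binary classification: sigmoid output, Bernoulli likelihood.
t = rng.integers(0, 2, size=n)
p = sigmoid(rng.normal(size=n))
bce = -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))
gap_bernoulli = mean_nll(bernoulli_pmf(t, p)) - bce

# Multiclass classification: softmax output, categorical likelihood.
labels = rng.integers(0, k, size=n)
probs = softmax(rng.normal(size=(n, k)))
onehot = np.eye(k)[labels]
cross_entropy = -np.mean(np.sum(onehot * np.log(probs), axis=1))
gap_categorical = mean_nll(probs[np.arange(n), labels]) - cross_entropy

for name, gap in [("gaussian", gap_gaussian), ("laplace", gap_laplace),
                  ("bernoulli", gap_bernoulli), ("categorical", gap_categorical)]:
    print(f"{name}: |NLL - loss| = {abs(gap):.2e}")
```

Each printed gap should be at the level of floating-point rounding error, which illustrates that the loss and the negative log-likelihood differ only by terms independent of the model parameters.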
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Machine Learning and ELM
