Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning
Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, Geoff Pleiss, John P. Cunningham

TL;DR
This paper critically examines the widespread use of cross-entropy loss in deep learning, especially when data is not strictly categorical, proposing probabilistically sound alternatives and demonstrating their advantages through experiments.
Contribution
It introduces probabilistically-inspired alternatives to cross-entropy loss for non-categorical data, grounded in the continuous-categorical distribution, and evaluates their effectiveness.
Findings
Probabilistically-inspired models can outperform traditional cross-entropy approaches.
Proper probabilistic treatment reduces failure modes in deep learning models.
Experimental results highlight the importance of theoretical rigor in loss function design.
Abstract
Modern deep learning is primarily an experimental science, in which empirical advances occasionally come at the expense of probabilistic rigor. Here we focus on one such example, namely the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex. This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others. Drawing on the recently discovered continuous-categorical distribution, we propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing. Through careful experimentation, including an ablation study, we identify the potential for outperformance in these models, thereby highlighting the importance of a proper probabilistic treatment, as well as illustrating some of…
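To make the contrast in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' code. It compares the categorical cross-entropy applied to a label-smoothed (simplex-valued) target with the negative log-density of a continuous-categorical model that uses the logits as natural parameters. All function names are hypothetical. The closed-form normalizer is an assumption taken from the continuous-categorical literature: it requires pairwise-distinct logits and becomes numerically unstable when logits are close together.

```python
import math

def smooth_labels(label, num_classes, eps=0.1):
    """Label smoothing: mix a one-hot target with the uniform distribution."""
    return [(1.0 - eps) * (1.0 if k == label else 0.0) + eps / num_classes
            for k in range(num_classes)]

def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def soft_cross_entropy(target, logits):
    """Categorical cross-entropy evaluated on a simplex-valued target."""
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))

def cc_log_normalizer(eta):
    """log Z(eta), where Z(eta) is the integral of exp(eta . x) over the simplex.

    Assumed closed form: Z = sum_k exp(eta_k) / prod_{i != k} (eta_k - eta_i).
    Only valid when all entries of eta are distinct.
    """
    total = 0.0
    for k, ek in enumerate(eta):
        denom = 1.0
        for i, ei in enumerate(eta):
            if i != k:
                denom *= (ek - ei)
        total += math.exp(ek) / denom
    return math.log(total)

def cc_neg_log_likelihood(target, logits):
    """Negative log-density of a continuous categorical with eta = logits."""
    return -sum(t * e for t, e in zip(target, logits)) + cc_log_normalizer(logits)

if __name__ == "__main__":
    target = smooth_labels(label=0, num_classes=3, eps=0.1)
    logits = [2.0, 1.0, 0.0]
    print("cross-entropy:", soft_cross_entropy(target, logits))
    print("continuous-categorical NLL:", cc_neg_log_likelihood(target, logits))
```

Written out, both losses share the term -Σ_k x_k η_k and differ only in the normalizer: the cross-entropy adds log Σ_k exp(η_k), whereas the continuous-categorical negative log-likelihood adds log Z(η). Under the assumptions above, the cross-entropy on simplex-valued targets can therefore be read as a likelihood with the normalizing term of a different distribution.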
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Neural Networks and Applications
Methods: Label Smoothing
