Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning

Elliott Gordon-Rodriguez; Gabriel Loaiza-Ganem; Geoff Pleiss; John P. Cunningham

arXiv:2011.05231 · stat.ML · November 11, 2020 · 55 cites

TL;DR

This paper critically examines the widespread use of cross-entropy loss in deep learning, especially when data is not strictly categorical, proposing probabilistically sound alternatives and demonstrating their advantages through experiments.

Contribution

It introduces probabilistically-inspired alternatives to cross-entropy loss for non-categorical data, grounded in the continuous-categorical distribution, and evaluates their effectiveness.

Findings

01 Probabilistically-inspired models can outperform traditional cross-entropy approaches.

02 Proper probabilistic treatment reduces failure modes in deep learning models.

03 Experimental results highlight the importance of theoretical rigor in loss function design.

Abstract

Modern deep learning is primarily an experimental science, in which empirical advances occasionally come at the expense of probabilistic rigor. Here we focus on one such example, namely the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex. This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others. Drawing on the recently discovered continuous-categorical distribution, we propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing. Through careful experimentation, including an ablation study, we identify the potential for outperformance in these models, thereby highlighting the importance of a proper probabilistic treatment, as well as illustrating some of…
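To make the practice described in the abstract concrete, here is a minimal NumPy sketch of label smoothing combined with the categorical cross-entropy loss. It is an illustration only, not the authors' code: the uniform smoothing rule, the function names, the `epsilon` value, and the toy logits are all assumptions chosen for the example. It shows how the smoothed targets stop being one-hot vectors and instead lie in the interior of the simplex, which is the setting the paper examines.

```python
import numpy as np

def smooth_labels(labels, num_classes, epsilon=0.1):
    """Uniform label smoothing (assumed variant): mix one-hot targets with a uniform vector."""
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - epsilon) * one_hot + epsilon / num_classes

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def cross_entropy(targets, logits):
    """Categorical cross-entropy, averaged over the batch; targets may be soft."""
    return -(targets * log_softmax(logits)).sum(axis=-1).mean()

# Hypothetical toy batch: two examples, three classes.
labels = np.array([0, 2])
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])

# Smoothed targets are simplex-valued rather than strictly categorical.
targets = smooth_labels(labels, num_classes=3, epsilon=0.1)
loss = cross_entropy(targets, logits)
print(targets)
print(loss)
```

Each row of `targets` still sums to one, but no entry is exactly 0 or 1, so the loss is being applied to data that a categorical distribution cannot generate. With `epsilon=0.0` the same function reduces to the usual cross-entropy on hard labels.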

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Neural Networks and Applications

Methods: Label Smoothing