Cross-Entropy Loss Functions: Theoretical Analysis and Applications
Anqi Mao, Mehryar Mohri, Yutao Zhong

TL;DR
This paper provides a comprehensive theoretical analysis of cross-entropy and related loss functions, introducing new bounds and loss variants, and demonstrates their effectiveness in adversarial robustness and accuracy through empirical evaluation.
Contribution
It introduces the first H-consistency bounds for comp-sum losses, proposes smooth adversarial comp-sum losses, and develops new adversarial robustness algorithms with empirical validation.
Findings
H-consistency bounds are tight and non-asymptotic.
Smooth adversarial comp-sum losses improve adversarial robustness.
The proposed algorithms outperform the state of the art in the paper's experiments.
Abstract
Cross-entropy is a widely used loss function in applications. It coincides with the logistic loss applied to the outputs of a neural network, when the softmax is used. But what guarantees can we rely on when using cross-entropy as a surrogate loss? We present a theoretical analysis of a broad family of loss functions, comp-sum losses, that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error and other cross-entropy-like loss functions. We give the first H-consistency bounds for these loss functions. These are non-asymptotic guarantees that upper bound the zero-one loss estimation error in terms of the estimation error of a surrogate loss, for the specific hypothesis set used. We further show that our bounds are tight. These bounds depend on quantities called minimizability gaps. To make them more explicit, we give a specific analysis of…
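The abstract's remark that cross-entropy coincides with the logistic loss under softmax can be checked numerically. The sketch below is illustrative only and is not taken from the paper: the function names, the sample scores, and the specific forms used for generalized cross-entropy, (1 - p^q)/q, and mean absolute error, 1 - p, are assumptions based on common conventions, and constant factors may differ from the paper's definitions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of real-valued scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, y):
    """Cross-entropy: negative log of the softmax probability of label y."""
    return -math.log(softmax(scores)[y])

def logistic_form(scores, y):
    """Same loss written as log(1 + sum over y' != y of exp(h(x, y') - h(x, y)))."""
    u = sum(math.exp(s - scores[y]) for i, s in enumerate(scores) if i != y)
    return math.log1p(u)

def generalized_cross_entropy(scores, y, q=0.5):
    """Assumed form (1 - p_y^q) / q with q in (0, 1]."""
    p = softmax(scores)[y]
    return (1.0 - p ** q) / q

def mean_absolute_error(scores, y):
    """Assumed form 1 - p_y, which is the q = 1 case of the function above."""
    return 1.0 - softmax(scores)[y]

# Hypothetical scores h(x, .) for a 3-class problem, with true label 0.
scores = [2.0, -1.0, 0.5]
y = 0
print("cross-entropy:", cross_entropy(scores, y))
print("logistic form:", logistic_form(scores, y))
print("generalized CE (q=0.5):", generalized_cross_entropy(scores, y))
print("mean absolute error:", mean_absolute_error(scores, y))
```

Under these assumed forms, the generalized cross-entropy interpolates between the two other losses: it approaches cross-entropy as q tends to 0 and equals the mean absolute error form at q = 1.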
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Integrated Circuits and Semiconductor Failure Analysis
Methods: Softmax
