Cut your Losses with Squentropy

Like Hui; Mikhail Belkin; Stephen Wright

arXiv:2302.03952 · cs.LG · February 9, 2023

TL;DR

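This paper introduces the squentropy loss for neural classification, which adds the average square loss over the incorrect classes to the usual cross-entropy loss. Compared with both cross-entropy and the rescaled square loss, it improves accuracy and calibration and reduces variance across random initializations, without additional tuning.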

Contribution

The paper proposes the squentropy loss, defined as the sum of the cross-entropy loss and the average square loss over the incorrect classes. It improves classification accuracy and calibration without extra parameter tuning.

Findings

01 Squentropy outperforms both cross-entropy and the rescaled square loss in classification accuracy.

02 It yields better-calibrated models than either of those losses.

03 Its results vary less across random initializations.
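
The calibration finding can be made concrete with a small sketch. This summary does not say which calibration metric the paper reports, so the example below assumes expected calibration error (ECE) with equal-width confidence bins, one common choice. The function name and the default of 10 bins are illustrative, not taken from the paper.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """ECE with equal-width bins (an assumed metric, not confirmed by the summary).

    confidences: predicted probability of the chosen class, one per example.
    correct:     1 if the prediction was right, 0 otherwise, one per example.
    """
    total = len(confidences)
    # Per bin: [example count, sum of confidences, sum of correct flags].
    bins = [[0, 0.0, 0.0] for _ in range(num_bins)]
    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += hit

    ece = 0.0
    for count, conf_sum, hit_sum in bins:
        if count == 0:
            continue
        mean_conf = conf_sum / count
        accuracy = hit_sum / count
        # Gap between confidence and accuracy, weighted by bin size.
        ece += (count / total) * abs(accuracy - mean_conf)
    return ece
```

A lower value means predicted confidences track observed accuracy more closely. A model that predicts with confidence 0.9 but is right only 75% of the time has a gap of 0.15 in that bin.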

Abstract

Nearly all practical neural models for classification are trained using cross-entropy loss. Yet this ubiquitous choice is supported by little theoretical or empirical evidence. Recent work (Hui & Belkin, 2020) suggests that training using the (rescaled) square loss is often superior in terms of the classification accuracy. In this paper we propose the "squentropy" loss, which is the sum of two terms: the cross-entropy loss and the average square loss over the incorrect classes. We provide an extensive set of experiments on multi-class classification problems showing that the squentropy loss outperforms both the pure cross entropy and rescaled square losses in terms of the classification accuracy. We also demonstrate that it provides significantly better model calibration than either of these alternative losses and, furthermore, has less variance with respect to the random…
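
The definition in the abstract (cross-entropy plus the average square loss over the incorrect classes) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the square term is applied to the raw logits of the incorrect classes with a target of zero and averaged over the C - 1 incorrect classes; the abstract does not spell out those details. The function names are hypothetical.

```python
import math

def squentropy(logits, label):
    """Squentropy for one example (sketch under the assumptions stated above).

    logits: list of raw class scores, length C >= 2.
    label:  index of the correct class.
    """
    num_classes = len(logits)

    # Cross-entropy term: -log softmax(logits)[label], via a stable log-sum-exp.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    cross_entropy = log_sum_exp - logits[label]

    # Square term: mean of squared logits over the incorrect classes only
    # (assumed target of zero for each incorrect class).
    square_term = sum(
        z * z for j, z in enumerate(logits) if j != label
    ) / (num_classes - 1)

    return cross_entropy + square_term

def mean_squentropy(batch_logits, labels):
    """Average squentropy over a batch of examples."""
    return sum(squentropy(z, y) for z, y in zip(batch_logits, labels)) / len(labels)
```

With all-zero logits the square term vanishes and the loss reduces to the cross-entropy value log C. Nothing in this sketch introduces a weighting coefficient between the two terms, which is consistent with the claim that the loss needs no extra tuning.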

Videos

Cut your Losses with Squentropy · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications