Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks

Like Hui; Mikhail Belkin

arXiv:2006.07322·cs.LG·October 26, 2021·53 cites

TL;DR

This study challenges the common belief that cross-entropy loss outperforms square loss in neural network classification, showing that square loss often yields comparable or better results across various tasks and architectures.

Contribution

The paper provides empirical evidence that square loss can be as effective as, or better than, cross-entropy for training neural classifiers across multiple domains.

Findings

01. Square loss performs comparably or better in NLP and ASR tasks.

02. Cross-entropy has a slight advantage in computer vision tasks.

03. Square loss training is less sensitive to initialization randomness.

Abstract

Modern neural architectures for classification tasks are trained using the cross-entropy loss, which is widely believed to be empirically superior to the square loss. In this work we provide evidence indicating that this belief may not be well-founded. We explore several major neural architectures and a range of standard benchmark datasets for NLP, automatic speech recognition (ASR) and computer vision tasks to show that these architectures, with the same hyper-parameter settings as reported in the literature, perform comparably or better when trained with the square loss, even after equalizing computational resources. Indeed, we observe that the square loss produces better results in the dominant majority of NLP and ASR experiments. Cross-entropy appears to have a slight edge on computer vision tasks. We argue that there is little compelling empirical or theoretical evidence…
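
To make the two objectives in the abstract concrete, here is a minimal sketch of both losses for a single example. It assumes the square loss is computed between the raw network outputs and a one-hot encoding of the label, averaged over classes, and that cross-entropy is applied after a softmax. These choices, and all function names, are assumptions made for illustration; they are not taken from the paper's code.

```python
import math


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector (illustrative helper)."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


def softmax(logits):
    """Numerically stable softmax over a list of raw outputs."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_loss(logits, label):
    """Negative log-probability of the true class under softmax(logits)."""
    return -math.log(softmax(logits)[label])


def square_loss(outputs, label):
    """Mean squared error between raw outputs and the one-hot target.

    Assumption of this sketch: no softmax is applied, and the error is
    averaged over classes.
    """
    target = one_hot(label, len(outputs))
    return sum((o - t) ** 2 for o, t in zip(outputs, target)) / len(outputs)


def predict(outputs):
    """Under either loss, the predicted class is the index of the largest output."""
    return max(range(len(outputs)), key=lambda k: outputs[k])


if __name__ == "__main__":
    outputs = [0.1, 0.7, 0.2]  # made-up network outputs for a 3-class problem
    label = 1
    print("cross-entropy:", cross_entropy_loss(outputs, label))
    print("square loss:  ", square_loss(outputs, label))
    print("prediction:   ", predict(outputs))
```

Both losses share the same prediction rule, so swapping one for the other changes only the training objective, not how a trained classifier is evaluated.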

Code & Models

Repositories

EigenPro/EigenPro
pytorch

Videos

Evaluation of Neural Architectures Trained With Square Loss vs Cross-Entropy in Classification Tasks · slideslive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Machine Learning and Data Classification