The Z-loss: a shift and scale invariant classification loss belonging to the Spherical Family

Alexandre de Brébisson; Pascal Vincent

arXiv:1604.08859·cs.LG·May 30, 2016·5 cites

TL;DR

The paper introduces the Z-loss, a new classification loss function that is computationally efficient, scale-invariant, and better aligned with task-specific metrics, enabling faster training of large neural networks.

Contribution

It proposes the Z-loss, belonging to the spherical loss family, which addresses scalability and metric alignment issues of the log-softmax in neural network training.

Findings

01. Z-loss outperforms previous spherical loss functions.

02. On the One Billion Word dataset, Z-loss trains 40 times faster than log-softmax.

03. Z-loss achieves better ranking scores, such as top-k, than log-softmax.

Abstract

Despite being the standard loss function to train multi-class neural networks, the log-softmax has two potential limitations. First, it involves computations that scale linearly with the number of output classes, which can restrict the size of problems we are able to tackle with current hardware. Second, it remains unclear how closely it matches the task loss, such as the top-k error rate or other non-differentiable evaluation metrics, which we ultimately aim to optimize. In this paper, we introduce an alternative classification loss function, the Z-loss, which is designed to address these two issues. Unlike the log-softmax, it has the desirable property of belonging to the spherical loss family (Vincent et al., 2015), a class of loss functions for which training can be performed very efficiently with a complexity independent of the number of output classes. We show experimentally that it…
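The shift and scale invariance named in the title can be illustrated with a small sketch. This page does not state the formula, so the form used here is an assumption recalled from the full paper: the output vector is standardized (mean subtracted, divided by its standard deviation), and a softplus is applied to the standardized score of the target class. The function names, the use of the population standard deviation, and the default values of the hyperparameters `a` and `b` are illustrative choices, not the authors' implementation.

```python
import math
from statistics import fmean, pstdev


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def z_loss(outputs, target, a=1.0, b=1.0):
    """Assumed Z-loss form: softplus(a * (b - z_c)) / a.

    z_c is the target-class output after standardizing the whole
    output vector. a and b are hyperparameters; the defaults here
    are placeholders, not values from the paper.
    """
    mu = fmean(outputs)
    sigma = pstdev(outputs)  # population std; assumes outputs are not all equal
    z_c = (outputs[target] - mu) / sigma
    return softplus(a * (b - z_c)) / a


o = [2.0, -1.0, 0.5, 0.0]
base = z_loss(o, target=0)
shifted = z_loss([x + 10.0 for x in o], target=0)  # add a constant to every output
scaled = z_loss([3.0 * x for x in o], target=0)    # multiply every output by a positive constant

# Standardization removes both the shift and the positive scale,
# so all three losses agree up to floating-point error.
print(base, shifted, scaled)
```

Under these assumptions the loss depends only on where the target output sits relative to the mean and spread of all outputs. The invariance holds for positive scale factors only, since a negative factor flips the sign of the standardized score.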

Code & Models

Repositories

pascal20100/factored_output_layer (Official)

Taxonomy

Topics: Advanced Neural Network Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning