Tackling Early Sparse Gradients in Softmax Activation Using Leaky Squared Euclidean Distance

Wei Shen; Rujie Liu

arXiv:1811.10779·cs.CV·November 28, 2018·1 cites

Open Access

TL;DR

This paper addresses the problem of early sparse gradients caused by squared Euclidean distance in softmax-based models, proposing leaky squared Euclidean distance to improve gradient flow and classification accuracy.

Contribution

The paper introduces leaky squared Euclidean distance to mitigate early sparse gradients in softmax models, enhancing one-shot learning performance.

Findings

01. Leaky squared Euclidean distance reduces early sparse gradients.

02. Improves one-shot classification accuracy on Omniglot and miniImageNet.

03. Enhances gradient magnitudes during early training stages.

Abstract

Softmax activation is commonly used to output a probability distribution over categories based on a certain distance metric. In scenarios such as one-shot learning, the distance metric is often chosen to be the squared Euclidean distance between the query sample and the category prototype. This practice works well most of the time. However, we find that choosing squared Euclidean distance may cause distance explosion, making gradients extremely sparse in the early stage of back propagation. We term this phenomenon the early sparse gradients problem. Though it doesn't deteriorate the convergence of the model, it may set up a barrier to further model improvement. To tackle this problem, we propose to use leaky squared Euclidean distance to impose a restriction on distances. In this way, we can avoid distance explosion and increase the magnitude of gradients. Extensive experiments are…
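
The sparsity effect described in the abstract can be reproduced with a small numerical sketch. The code below is an illustration only, not the authors' implementation: the query, the prototypes, the scale factors, and the 1e-6 threshold are made-up values, and the function names are hypothetical. It assumes a cross-entropy loss over a softmax of negative squared distances, which gives a gradient of (y_k - p_k) with respect to each distance d_k. The abstract does not give the exact form of the leaky distance, so the sketch covers only the problem, not the proposed fix.

```python
import math

# Hypothetical toy data: one query embedding and four class prototypes.
QUERY = [0.5, -0.2, 0.1]
PROTOTYPES = [
    [0.4, -0.1, 0.0],
    [-0.3, 0.6, 0.2],
    [0.1, 0.1, -0.5],
    [0.9, 0.3, -0.4],
]
TRUE_INDEX = 1  # deliberately not the nearest prototype, as in an untrained model


def squared_euclidean(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def softmax_of_negative_distances(distances):
    """Softmax over logits z_k = -d_k, shifted by the max logit for stability."""
    logits = [-d for d in distances]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distance_gradients(probs, true_index):
    """Gradient of cross-entropy loss w.r.t. each distance: dL/dd_k = y_k - p_k."""
    return [(1.0 if k == true_index else 0.0) - p for k, p in enumerate(probs)]


def count_vanishing_gradients(scale, threshold=1e-6):
    """Scale all embeddings by `scale` and count near-zero gradient entries."""
    query = [scale * x for x in QUERY]
    prototypes = [[scale * x for x in p] for p in PROTOTYPES]
    distances = [squared_euclidean(query, p) for p in prototypes]
    probs = softmax_of_negative_distances(distances)
    grads = distance_gradients(probs, TRUE_INDEX)
    return sum(1 for g in grads if abs(g) < threshold)


if __name__ == "__main__":
    for scale in (1.0, 30.0):
        print(scale, count_vanishing_gradients(scale))
```

With small embeddings (scale 1.0), every prototype receives a usable gradient. With embeddings 30 times larger, the squared distances grow 900-fold, the softmax saturates, and only the nearest prototype and the true class keep non-negligible gradients.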

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Image and Signal Denoising Methods · Advanced Image Processing Techniques