Noisy Softmax: Improving the Generalization Ability of DCNN via Postponing the Early Softmax Saturation

Binghui Chen; Weihong Deng; Junping Du

arXiv:1708.03769 · cs.CV · August 15, 2017 · 30 cites

TL;DR

This paper introduces Noisy Softmax, a method that injects annealed noise into the softmax function during training to delay saturation, enhance exploration, and improve CNN generalization.

Contribution

It proposes a novel noise injection technique in softmax to mitigate early saturation, promoting better exploration and generalization in CNN training.

Findings

01 Improves CNN generalization on benchmark datasets.

02 Achieves state-of-the-art or competitive results.

03 Enhances exploration during training by delaying softmax saturation.

Abstract

Over the past few years, softmax and SGD have become, respectively, a commonly used component and the default training strategy in CNN frameworks. However, when CNNs are optimized with SGD, the saturation behavior of softmax always gives us the illusion that training is going well, and so it is overlooked. In this paper, we first emphasize that the early saturation behavior of softmax impedes the exploration of SGD, which is sometimes a reason a model converges to a bad local minimum. We then propose Noisy Softmax, which mitigates this early saturation by injecting annealed noise into softmax during each iteration. This noise injection aims to postpone the early saturation and keep gradients propagating continuously, which significantly encourages the SGD solver to be more exploratory and helps it find a better local minimum. This paper empirically verifies the superiority…
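The idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not give the noise formula, so the code assumes non-negative Gaussian noise subtracted from the true-class logit and a linearly decaying noise scale. The names `noisy_softmax` and `annealed_scale` are hypothetical. The point it shows is general: the cross-entropy gradient with respect to the true-class logit is p_y - 1, which vanishes once p_y saturates near 1, and lowering that logit during training keeps the gradient away from zero.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def noisy_softmax(logits, labels, noise_scale, rng):
    # Hypothetical helper (assumption, not the paper's formula):
    # subtract non-negative noise from each sample's true-class logit,
    # so the true-class probability cannot exceed its clean value.
    noise = noise_scale * np.abs(rng.standard_normal(len(labels)))
    noisy = logits.astype(float).copy()
    noisy[np.arange(len(labels)), labels] -= noise
    return softmax(noisy)

def annealed_scale(step, total_steps, initial_scale=1.0):
    # Assumed annealing schedule: linear decay from initial_scale to zero.
    return initial_scale * max(0.0, 1.0 - step / total_steps)

# Usage: compare the true-class gradient (p_y - 1) with and without noise.
rng = np.random.default_rng(0)
logits = np.array([[8.0, 1.0, 0.0], [0.5, 6.0, 0.2]])
labels = np.array([0, 1])
rows = np.arange(len(labels))

clean_grad = softmax(logits)[rows, labels] - 1.0
scale = annealed_scale(step=0, total_steps=100, initial_scale=2.0)
noisy_grad = noisy_softmax(logits, labels, scale, rng)[rows, labels] - 1.0
print("clean gradient on true-class logit:", clean_grad)
print("noisy gradient on true-class logit:", noisy_grad)
```

Because the noise is only ever subtracted from the true-class logit, the noisy gradient magnitude is at least as large as the clean one. As the assumed schedule decays to zero, the function reduces to ordinary softmax.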

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition

Methods: Softmax · Stochastic Gradient Descent