# Mitigating the Vanishing Gradient Problem Using a Pseudo-Normalizing Method

**Authors:** Yun Bu, Wenbo Jiang, Gang Lu, Qiang Zhang

PMC · DOI: 10.3390/e28010057 · Entropy · 2025-12-31

## TL;DR

This paper introduces a pseudo-normalizing method that counters vanishing gradients in deep neural networks with hyperbolic tangent activation, demonstrated on image classification.

## Contribution

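Pseudo-normalization enlarges selected gradients by dividing them by their root mean square. Applied every few layers, it keeps gradient amplitudes above one, which avoids vanishing gradients while also preventing gradient explosion.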
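
To make the mechanism concrete, the following is a hedged sketch in standard backpropagation notation. The symbols $h_l$, $W_l$, $g$, and $\tilde{g}$ are assumptions introduced here for illustration; the paper's own formulation may differ. For a layer $h_l = \tanh(W_l h_{l-1})$, the backward pass multiplies the incoming gradient elementwise by the tanh derivative, which never exceeds one:

$$
\frac{\partial \mathcal{L}}{\partial h_{l-1}} = W_l^{\top}\left(\frac{\partial \mathcal{L}}{\partial h_l} \odot \left(1 - h_l^{2}\right)\right), \qquad 0 < 1 - h_l^{2} \le 1
$$

Dividing a gradient vector $g \in \mathbb{R}^{n}$ by its root mean square resets its scale to one:

$$
\operatorname{RMS}(g) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} g_i^{2}}, \qquad \tilde{g} = \frac{g}{\operatorname{RMS}(g)} \;\Rightarrow\; \operatorname{RMS}(\tilde{g}) = 1
$$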

## Key findings

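- The method was successfully applied to several deep networks with hyperbolic tangent activation for image classification.
- Interpretability analysis indicates that these networks rely primarily on image contour information, in contrast to popular networks that learn picture characteristics.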
- The approach can complement existing deep learning algorithms.

## Abstract

When training a neural network, the choice of activation function can greatly impact its performance. A function with a larger derivative may cause the coefficients of the latter layers to deviate further from the calculated direction, making deep learning more difficult to train. However, an activation function with a derivative amplitude of less than one can result in the problem of a vanishing gradient. To overcome this drawback, we propose the application of pseudo-normalization to enlarge some gradients by dividing them by the root mean square. This amplification is performed every few layers to ensure that the amplitudes are larger than one, thus avoiding the condition of vanishing gradient and preventing gradient explosion. We successfully applied this approach to several deep learning networks with hyperbolic tangent activation for image classifications. To gain a deeper understanding of the algorithm, we employed interpretability techniques to examine the network’s prediction outcomes. We discovered that, in contrast to popular networks that learn picture characteristics, the networks primarily employ the contour information of images for categorization. This suggests that our technique can be utilized in addition to other widely used algorithms.
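
The rescaling described in the abstract can be illustrated with a small toy simulation. This is a minimal sketch, not the authors' implementation: it ignores weight matrices, draws pre-activations at random, and tracks only how the tanh derivative shrinks a backpropagated gradient. The function names `rms`, `pseudo_normalize`, and `backprop_rms_trace`, and the `every` parameter, are hypothetical and chosen for illustration.

```python
import math
import random


def rms(values):
    """Root mean square of a list of floats."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def pseudo_normalize(grad, eps=1e-12):
    """Divide a gradient vector by its RMS so the result has RMS close to 1."""
    scale = rms(grad) + eps  # eps guards against division by zero
    return [g / scale for g in grad]


def backprop_rms_trace(depth, width, every=None, seed=0):
    """Track the gradient RMS while passing backward through tanh layers.

    Assumption: each layer only multiplies the gradient elementwise by
    tanh'(z) = 1 - tanh(z)^2, with z drawn from a standard normal.
    If `every` is set, the gradient is rescaled every `every` layers.
    """
    rng = random.Random(seed)
    grad = [1.0] * width
    trace = []
    for layer in range(1, depth + 1):
        pre_acts = [rng.gauss(0.0, 1.0) for _ in range(width)]
        grad = [g * (1.0 - math.tanh(z) ** 2) for g, z in zip(grad, pre_acts)]
        if every is not None and layer % every == 0:
            grad = pseudo_normalize(grad)
        trace.append(rms(grad))
    return trace


plain = backprop_rms_trace(depth=30, width=64)
rescaled = backprop_rms_trace(depth=30, width=64, every=5)
print("final RMS without rescaling:", plain[-1])
print("final RMS with rescaling every 5 layers:", rescaled[-1])
```

In this toy setting the unrescaled gradient shrinks at every layer, while the periodically rescaled one returns to an RMS of about one at each rescaling step. Which gradients the paper rescales and how often is not specified in this summary, so the interval of five layers is only a placeholder.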

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12839799/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12839799/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12839799/full.md

---
Source: https://tomesphere.com/paper/PMC12839799