Vanishing Nodes: Another Phenomenon That Makes Training Deep Neural Networks Difficult
Wen-Yu Chang, Tsung-Nan Lin

TL;DR
This paper introduces the concept of vanishing nodes, a phenomenon where hidden nodes in deep neural networks become highly correlated, increasing training difficulty, and proposes a metric to measure this effect.
Contribution
The paper defines vanishing nodes, develops the vanishing node indicator (VNI) to quantify them, and demonstrates that they make deep networks harder to train, as a phenomenon distinct from vanishing gradients.
Findings
The degree of vanishing nodes, as measured by the VNI, grows in proportion to network depth and in inverse proportion to network width.
When the VNI reaches 1, the effective number of nodes in the network reduces to one, hindering learning.
Deeper networks are more prone to training failure due to vanishing nodes.
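The depth and width trends above can be explored numerically with a rough sketch. The summary does not give the VNI formula, so the code below uses a stand-in proxy: the mean squared pairwise correlation between hidden nodes, measured over a batch of inputs. The proxy, the tanh activation, the bias-free Gaussian initialization scaled by 1/sqrt(width), the random Gaussian inputs, and the function names are all assumptions made for illustration, not the authors' definitions or experimental setup.

```python
import numpy as np


def mean_squared_correlation(h):
    """Mean of squared pairwise correlation coefficients between nodes.

    h has shape (samples, nodes). This is an assumed proxy for node
    redundancy, not the paper's VNI definition. It is about 1/nodes when
    nodes are uncorrelated and 1 when all nodes are perfectly correlated.
    """
    corr = np.corrcoef(h, rowvar=False)  # (nodes, nodes) correlation matrix
    return float(np.mean(corr ** 2))


def correlation_by_depth(width, depth, n_samples=512, seed=0):
    """Propagate random inputs through a randomly initialized tanh network
    and record the proxy score after each hidden layer.

    Assumptions (hypothetical setup): square weight matrices, no biases,
    weights drawn from N(0, 1/width), inputs drawn from N(0, 1).
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((n_samples, width))
    scores = []
    for _ in range(depth):
        w = rng.standard_normal((width, width)) / np.sqrt(width)
        h = np.tanh(h @ w)
        scores.append(mean_squared_correlation(h))
    return scores


if __name__ == "__main__":
    for width in (32, 256):
        scores = correlation_by_depth(width=width, depth=20)
        print(f"width={width}: layer 1 -> {scores[0]:.3f}, "
              f"layer 20 -> {scores[-1]:.3f}")
```

Under these assumptions, the score should rise with depth and be lower for the wider network at the same depth, which matches the direction of the reported finding. It illustrates correlated hidden nodes at initialization only and says nothing about behavior during training.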
Abstract
It is well known that the problem of vanishing/exploding gradients is a challenge when training deep networks. In this paper, we describe another phenomenon, called vanishing nodes, that also increases the difficulty of training deep neural networks. As the depth of a neural network increases, the network's hidden nodes have more highly correlated behavior. This results in great similarities between these nodes. The redundancy of hidden nodes thus increases as the network becomes deeper. We call this problem vanishing nodes, and we propose the metric vanishing node indicator (VNI) for quantitatively measuring the degree of vanishing nodes. The VNI can be characterized by the network parameters, which is shown analytically to be proportional to the depth of the network and inversely proportional to the network width. The theoretical results show that the effective number of nodes…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
