Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK
Hongru Yang, Ziyu Jiang, Ruizhe Zhang, Yingbin Liang, Zhangyang Wang

TL;DR
This paper analyzes one-hidden-layer ReLU networks with large bias initialization in the NTK regime, revealing sparse activation, a new bias-generalized NTK, and improved convergence and generalization bounds.
Contribution
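It introduces the bias-generalized NTK, characterizes its properties, and shows that networks with large biases converge as fast as dense networks (rather than more slowly, as earlier sparse-network analyses suggested), with generalization bounds that depend on the activation sparsity.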
Findings
Networks with large bias have sparse activation during training.
The bias-generalized NTK differs from the standard NTK and has favorable properties.
Convergence speed is comparable to dense networks, with improved width requirements.
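The first finding can be checked numerically at initialization with a short sketch. It assumes (our assumption for illustration, not taken from the text above) that each hidden unit computes ReLU(⟨w_r, x⟩ − B) with w_r ~ N(0, I_d), a shared bias B, and a unit-norm input x, so a unit fires iff ⟨w_r, x⟩ ≥ B. Under these assumptions ⟨w_r, x⟩ ~ N(0, 1), so the expected fraction of active units is the Gaussian tail P(g ≥ B), which is at most ½·exp(−B²/2). The helper names `gaussian_tail`, `random_unit_vector`, and `active_fraction` are hypothetical.

```python
import math
import random


def gaussian_tail(b: float) -> float:
    """P(g >= b) for a standard normal g."""
    return 0.5 * math.erfc(b / math.sqrt(2.0))


def random_unit_vector(d: int, rng: random.Random) -> list:
    """Uniform direction on the unit sphere in R^d (assumed input distribution)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def active_fraction(m: int, d: int, bias: float, seed: int = 0) -> float:
    """Fraction of m hidden units with <w_r, x> >= bias at initialization.

    Assumes w_r ~ N(0, I_d), a unit-norm input x, and the activation
    ReLU(<w_r, x> - bias); these are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = random_unit_vector(d, rng)
    active = 0
    for _ in range(m):
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if sum(wi * xi for wi, xi in zip(w, x)) >= bias:
            active += 1
    return active / m


if __name__ == "__main__":
    for b in (0.0, 1.0, 2.0, 3.0):
        print(
            f"B={b}: empirical {active_fraction(20000, 10, b):.4f}, "
            f"Gaussian tail {gaussian_tail(b):.4f}, "
            f"bound 0.5*exp(-B^2/2) {0.5 * math.exp(-b * b / 2):.4f}"
        )
```

This only illustrates sparsity at initialization; the claim that activation stays sparse throughout training is not reproduced by this sketch.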
Abstract
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime, where the networks' biases are initialized to some constant rather than zero. We prove that under such initialization, the neural network will have sparse activation throughout the entire training process, which enables fast training procedures via some sophisticated computational methods. With such initialization, we show that the neural networks possess a different limiting kernel which we call bias-generalized NTK, and we study various properties of the neural networks with this new kernel. We first characterize the gradient descent dynamics. In particular, we show that the network in this case can achieve as fast convergence as the dense network, as opposed to the previous work suggesting that the sparse networks converge slower. In addition, our result improves the previous required…
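To make the idea of a bias-dependent limiting kernel concrete, here is a Monte Carlo sketch. It assumes the network f(x) = (1/√m) Σ_r a_r ReLU(⟨w_r, x⟩ − b_r) with fixed second-layer signs a_r ∈ {±1}, w_r ~ N(0, I_d), and b_r initialized to B. Under that setup the infinite-width kernel of gradient descent on (w_r, b_r) is E_w[(⟨x, y⟩ + c) 1{⟨w, x⟩ ≥ B} 1{⟨w, y⟩ ≥ B}], with c = 1 when the biases are trained and c = 0 when they are frozen. This is our reconstruction for illustration only; the paper's exact definition of the bias-generalized NTK may differ. The functions `bias_ntk_gram`, `unit`, `dot`, and `sample_weights` and their parameters are hypothetical.

```python
import math
import random


def unit(v: list) -> list:
    """Normalize v to unit Euclidean norm."""
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def dot(u: list, v: list) -> float:
    return sum(a * b for a, b in zip(u, v))


def sample_weights(m: int, d: int, rng: random.Random) -> list:
    """m first-layer weight vectors drawn i.i.d. from N(0, I_d) (assumed init)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def bias_ntk_gram(points: list, bias: float, m: int = 20000,
                  train_bias: bool = True, seed: int = 0) -> list:
    """Monte Carlo estimate of
    H[i][j] = E_w[(<x_i, x_j> + c) * 1{<w, x_i> >= B} * 1{<w, x_j> >= B}],
    with c = 1 if biases are trained and c = 0 otherwise (assumed form).
    """
    rng = random.Random(seed)
    d = len(points[0])
    ws = sample_weights(m, d, rng)
    # active[r][i] is True iff hidden unit r fires on input i
    active = [[dot(w, x) >= bias for x in points] for w in ws]
    c = 1.0 if train_bias else 0.0
    n = len(points)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # fraction of units active on both x_i and x_j
            co_active = sum(1 for row in active if row[i] and row[j]) / m
            gram[i][j] = gram[j][i] = (dot(points[i], points[j]) + c) * co_active
    return gram


if __name__ == "__main__":
    pts = [unit([1.0, 0.0, 0.0]), unit([1.0, 1.0, 0.0]), unit([0.0, 1.0, 2.0])]
    for b in (0.0, 1.0, 2.0):
        print(f"B={b}:")
        for row in bias_ntk_gram(pts, b):
            print("  " + "  ".join(f"{v:.4f}" for v in row))
```

For unit-norm inputs the diagonal entries equal (1 + c)·P(g ≥ B) with g ~ N(0, 1), so the entries shrink as B grows; only units that fire on both inputs contribute, which is where activation sparsity enters the kernel.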
Taxonomy
Topics: Machine Learning and ELM · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
Methods: Neural Tangent Kernel
