Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK

Hongru Yang; Ziyu Jiang; Ruizhe Zhang; Yingbin Liang; Zhangyang Wang

arXiv:2301.00327·cs.LG·October 31, 2024

Open Access

TL;DR

This paper analyzes one-hidden-layer ReLU networks with large bias initialization in the NTK regime, revealing sparse activation, a new bias-generalized NTK, and improved convergence and generalization bounds.

Contribution

It introduces the bias-generalized NTK, characterizes its properties, and demonstrates faster convergence and sparsity-dependent generalization bounds for networks with large biases.

Findings

1. Networks with large bias have sparse activation during training.

2. The bias-generalized NTK differs from the standard NTK and has favorable properties.

3. Convergence speed is comparable to dense networks, with improved width requirements.
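
The first finding can be illustrated at initialization with a small NumPy sketch. It is not the paper's experiment: the unit-norm inputs, the standard Gaussian weights, the convention that the bias is subtracted from the pre-activation, and the helper names `active_fraction` and `gaussian_tail` are all assumptions made for illustration. It only measures sparsity at initialization; the claim that sparsity persists throughout training is the paper's result and is not reproduced here.

```python
import math
import numpy as np

def active_fraction(bias, width=4096, dim=32, n_samples=64, seed=0):
    """Fraction of (sample, neuron) pairs with a nonzero ReLU output at initialization."""
    rng = np.random.default_rng(seed)
    # Assumption: unit-norm inputs, so each pre-activation <w_r, x> is standard normal.
    x = rng.standard_normal((n_samples, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    # Assumption: hidden weights drawn i.i.d. from N(0, 1).
    w = rng.standard_normal((width, dim))
    # Assumed convention: neuron r fires on x when <w_r, x> - bias > 0.
    pre = x @ w.T - bias
    return float((pre > 0).mean())

def gaussian_tail(bias):
    """P(g > bias) for g ~ N(0, 1), the expected active fraction under the assumptions above."""
    return 0.5 * math.erfc(bias / math.sqrt(2.0))

for b in (0.0, 1.0, 2.0, 3.0):
    print(f"bias={b:.1f}  empirical={active_fraction(b):.4f}  expected={gaussian_tail(b):.4f}")
```

Under these assumptions, about half of the neurons are active at zero bias, and the active fraction falls off like the Gaussian tail as the constant bias grows.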

Abstract

We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime, where the networks' biases are initialized to some constant rather than zero. We prove that under such initialization, the neural network will have sparse activation throughout the entire training process, which enables fast training procedures via some sophisticated computational methods. With such initialization, we show that the neural networks possess a different limiting kernel which we call bias-generalized NTK, and we study various properties of the neural networks with this new kernel. We first characterize the gradient descent dynamics. In particular, we show that the network in this case can achieve as fast convergence as the dense network, as opposed to the previous work suggesting that the sparse networks converge slower. In addition, our result improves the previous required…
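
To make the idea of a bias-dependent limiting kernel concrete, here is a hedged Monte Carlo sketch. It assumes a network of the form f(x) = (1/sqrt(m)) * sum_r a_r * relu(<w_r, x> - b) with fixed signs a_r in {+1, -1}, Gaussian weights w_r, and a shared constant bias b, where both weights and biases are trained. The paper's exact parametrization may differ, and `empirical_ntk` is a hypothetical helper, not the authors' definition. Under these assumptions, the gradient inner product for two inputs averages to (<x1, x2> + 1) times the probability that a neuron is active on both inputs; the "+ 1" comes from the bias gradients.

```python
import numpy as np

def empirical_ntk(x1, x2, bias, width=200_000, train_bias=True, seed=0):
    """Monte Carlo estimate of one kernel entry for a wide one-hidden-layer ReLU net."""
    rng = np.random.default_rng(seed)
    # Assumption: hidden weights drawn i.i.d. from N(0, 1).
    w = rng.standard_normal((width, x1.shape[0]))
    # Fraction of neurons active on both inputs (assumed convention: <w, x> > bias).
    both_active = ((w @ x1 > bias) & (w @ x2 > bias)).mean()
    weight_term = float(x1 @ x2)          # contribution of the weight gradients
    bias_term = 1.0 if train_bias else 0.0  # contribution of the bias gradients
    return (weight_term + bias_term) * float(both_active)

# Two unit vectors at an angle of pi/3.
theta = np.pi / 3
x1 = np.array([1.0, 0.0])
x2 = np.array([np.cos(theta), np.sin(theta)])

# Zero bias, weights only: the familiar closed form <x1, x2> * (pi - theta) / (2 * pi).
standard = empirical_ntk(x1, x2, bias=0.0, train_bias=False)
closed_form = float(x1 @ x2) * (np.pi - theta) / (2 * np.pi)
# Constant bias of 2, weights and biases trained (the assumed bias-dependent variant).
with_bias = empirical_ntk(x1, x2, bias=2.0, train_bias=True)
print(standard, closed_form, with_bias)
```

With zero bias and untrained biases, the estimate should match the standard closed form up to sampling error. A large bias shrinks the entry because few neurons are active on both inputs.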

Taxonomy

Topics: Machine Learning and ELM · Neural Networks and Applications · Stochastic Gradient Optimization Techniques

Methods: Neural Tangent Kernel