TL;DR
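This paper introduces Kronecker neural networks (KNNs), a general framework for neural networks with adaptive activation functions. KNNs use the Kronecker product to build very wide networks with few parameters, induce faster loss decay than feed-forward networks under suitable conditions, and admit a global convergence guarantee for gradient descent under certain technical assumptions. The novel Rowdy activation, a specific case of the framework, can enhance a variety of architectures.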
Contribution
The paper presents KNNs as a general framework for neural networks with adaptive activation functions, proposes the Rowdy activation as a specific case, and supports their advantages with theoretical and empirical analysis.
Findings
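Under suitable conditions, KNNs induce faster loss decay than traditional feed-forward networks; this is verified empirically as well.
The Rowdy activation function eliminates saturation regions by injecting sinusoidal fluctuations with trainable parameters, which improves performance.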
KNNs with Rowdy activation perform well across diverse neural network architectures.
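The saturation-removing mechanism behind Rowdy (sinusoidal fluctuations with trainable amplitudes added to a base activation) can be illustrated with a minimal NumPy sketch. The exact form used here, tanh(x) + Σ a_k·n·sin(k·n·x) with a fixed scaling factor n, is an assumption for illustration only; the paper's parameterization may differ. The function names `rowdy` and `rowdy_grad` are hypothetical. The sketch compares the gradient of plain tanh with that of the augmented activation at a point deep in tanh's saturation region.

```python
import numpy as np


def rowdy(x, a, n=10.0):
    """Hypothetical Rowdy-style activation (assumed form, for illustration).

    tanh(x) plus one sinusoidal term a[k-1] * n * sin(k * n * x) per entry
    of `a`; in a network the amplitudes `a` would be trainable parameters,
    and the fixed scaling factor `n` is an assumed hyperparameter.
    """
    x = np.asarray(x, dtype=float)
    out = np.tanh(x)
    for k, a_k in enumerate(a, start=1):
        out = out + a_k * n * np.sin(k * n * x)
    return out


def rowdy_grad(x, a, n=10.0):
    """Analytic derivative of `rowdy` with respect to x."""
    x = np.asarray(x, dtype=float)
    grad = 1.0 - np.tanh(x) ** 2  # tanh'(x), which vanishes for large |x|
    for k, a_k in enumerate(a, start=1):
        grad = grad + a_k * n * (k * n) * np.cos(k * n * x)
    return grad


x = 5.0  # deep in the saturation region of tanh
print(rowdy_grad(x, a=[0.0]))  # ~1.8e-04: plain tanh, almost no gradient
print(rowdy_grad(x, a=[0.1]))  # ~9.65: the sinusoidal term restores a usable gradient
```

With all amplitudes at zero the sketch reduces to tanh, so the sinusoidal terms act as a learnable perturbation rather than a replacement for the base activation.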
Abstract
We propose a new type of neural network, Kronecker neural networks (KNNs), that forms a general framework for neural networks with adaptive activation functions. KNNs employ the Kronecker product, which provides an efficient way of constructing a very wide network while keeping the number of parameters low. Our theoretical analysis reveals that, under suitable conditions, KNNs induce a faster decay of the loss than feed-forward networks do. This is also verified empirically through a set of computational examples. Furthermore, under certain technical assumptions, we establish global convergence of gradient descent for KNNs. As a specific case, we propose the Rowdy activation function, which is designed to eliminate saturation regions by injecting sinusoidal fluctuations that include trainable parameters. The proposed Rowdy activation function can be employed in any neural…
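Why a Kronecker product keeps the parameter count low can be seen in a small NumPy sketch. This is a generic illustration of Kronecker-structured weight matrices, not the paper's specific KNN layer construction; the helper `kron_matvec` and the factor shapes are hypothetical. Two small factors A (4×3) and B (5×6) define a 20×18 matrix with 42 free parameters instead of 360, and its product with a vector can be computed without ever materializing that matrix.

```python
import numpy as np


def kron_matvec(A: np.ndarray, B: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Compute (A ⊗ B) @ x without materializing the Kronecker product.

    Uses the identity (A ⊗ B) vec(X) = vec(B X A^T), where vec stacks
    columns (Fortran order) and X has shape (B.shape[1], A.shape[1]).
    """
    q = A.shape[1]
    s = B.shape[1]
    X = x.reshape((s, q), order="F")
    return (B @ X @ A.T).reshape(-1, order="F")


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))  # small factor: 12 entries
B = rng.standard_normal((5, 6))  # small factor: 30 entries
W = np.kron(A, B)                # full 20 x 18 matrix: 360 entries

dense_params = W.size
kron_params = A.size + B.size
print(W.shape, dense_params, kron_params)  # (20, 18) 360 42

x = rng.standard_normal(W.shape[1])
print(np.allclose(kron_matvec(A, B, x), W @ x))  # True
```

The savings grow with size: factors of shape m×m and n×n define an mn×mn matrix with m² + n² parameters rather than m²n².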
