Deep Kronecker neural networks: A general framework for neural networks with adaptive activation functions

Ameya D. Jagtap; Yeonjong Shin; Kenji Kawaguchi; George Em Karniadakis

arXiv:2105.09513·cs.LG·October 22, 2021

TL;DR

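This paper introduces Kronecker neural networks (KNNs), a general framework for neural networks with adaptive activation functions. KNNs use the Kronecker product to build very wide networks with few parameters. Under suitable conditions they show faster loss decay than feed-forward networks, and under certain technical assumptions gradient descent on them converges globally. The paper also proposes the Rowdy activation function, which can be used across a range of architectures.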

Contribution

The paper presents KNNs as a general framework with adaptive activation functions, including the novel Rowdy activation, and provides theoretical and empirical analysis of their advantages.

Findings

01

Under suitable conditions, KNNs induce faster loss decay than feed-forward networks; this is shown theoretically and verified on computational examples.

02

The Rowdy activation function removes saturation regions by injecting sinusoidal fluctuations with trainable parameters, and improves performance.

03

KNNs with Rowdy activation perform well across diverse neural network architectures.
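
To make the saturation claim concrete, here is a minimal sketch of an activation in the spirit of Rowdy. The exact functional form is an assumption: the summary only says that sinusoidal fluctuations with trainable parameters are injected, so the function `rowdy`, its `amplitudes` and `scale` arguments, and the choice of tanh as the base activation are all hypothetical names and choices made for illustration.

```python
import math

def rowdy(x, amplitudes, base=math.tanh, scale=1.0):
    """Base activation plus sinusoidal terms (illustrative form, an assumption).

    amplitudes[k-1] plays the role of a trainable coefficient for the
    k-th sine term; here they are plain floats rather than learned values.
    """
    fluctuation = sum(
        a * math.sin(k * scale * x)
        for k, a in enumerate(amplitudes, start=1)
    )
    return base(x) + fluctuation

def numerical_slope(f, x, h=1e-5):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Far from the origin, tanh is saturated: its slope is almost zero.
x = 10.0
amplitudes = [0.1, 0.05]  # hypothetical values, not taken from the paper
tanh_slope = numerical_slope(math.tanh, x)
rowdy_slope = numerical_slope(lambda t: rowdy(t, amplitudes), x)

print(f"tanh slope at x={x}:  {tanh_slope:.2e}")
print(f"rowdy slope at x={x}: {rowdy_slope:.2e}")
```

With all amplitudes set to zero the sketch reduces to the base activation, so the sinusoidal terms can be read as an adjustable perturbation on top of a standard activation.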

Abstract

We propose a new type of neural networks, Kronecker neural networks (KNNs), that form a general framework for neural networks with adaptive activation functions. KNNs employ the Kronecker product, which provides an efficient way of constructing a very wide network while keeping the number of parameters low. Our theoretical analysis reveals that under suitable conditions, KNNs induce a faster decay of the loss than that by the feed-forward networks. This is also empirically verified through a set of computational examples. Furthermore, under certain technical assumptions, we establish global convergence of gradient descent for KNNs. As a specific case, we propose the Rowdy activation function that is designed to get rid of any saturation region by injecting sinusoidal fluctuations, which include trainable parameters. The proposed Rowdy activation function can be employed in any neural…
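
The parameter-saving property mentioned in the abstract can be illustrated with the Kronecker product itself. The sketch below is not the paper's construction, which this page does not spell out; it only shows the general fact that two small factor matrices define a much larger matrix. The helper `kron` and the matrix shapes are hypothetical and chosen for illustration.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows.

    For a of shape (m, n) and b of shape (p, q), the result has shape
    (m * p, n * q), and block (i, j) of the result equals a[i][j] * b.
    """
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]

# Small worked example.
small = kron([[1, 2], [3, 4]], [[0, 1], [1, 0]])
for row in small:
    print(row)

# Parameter count: a 2x3 factor and a 4x5 factor (shapes are assumptions).
a = [[1.0] * 3 for _ in range(2)]
b = [[1.0] * 5 for _ in range(4)]
wide = kron(a, b)

stored = len(a) * len(a[0]) + len(b) * len(b[0])  # numbers kept in the factors
dense = len(wide) * len(wide[0])                  # entries of the product
print(f"factors store {stored} numbers, product has {dense} entries")
```

The gap between the two counts grows quickly with the factor sizes, which is why a Kronecker structure can represent a wide layer with comparatively few parameters.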
