Expected Gradients of Maxout Networks and Consequences to Parameter Initialization

Hanna Tseran; Guido Montúfar

arXiv:2301.06956 · stat.ML · May 19, 2023

TL;DR

This paper analyzes the gradients of maxout networks with respect to inputs and parameters, and uses the moments of those gradients to derive parameter initialization strategies that avoid vanishing and exploding gradients in wide networks, improving SGD and Adam training of deep maxout networks.

Contribution

It bounds the moments of the gradients in terms of the architecture and the parameter distribution, shows that the distribution of the input-output Jacobian depends on the input, and proposes initialization strategies tailored to maxout networks.

Findings

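01 The proposed initialization improves SGD and Adam training of deep fully-connected and convolutional maxout networks.

02 Refined bounds on the expected number of linear regions, together with results on the expected curve length distortion and on the NTK.

03 The distribution of the input-output Jacobian of a maxout network depends on the input, which complicates a stable parameter initialization.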

Abstract

We study the gradients of a maxout network with respect to inputs and parameters and obtain bounds for the moments depending on the architecture and the parameter distribution. We observe that the distribution of the input-output Jacobian depends on the input, which complicates a stable parameter initialization. Based on the moments of the gradients, we formulate parameter initialization strategies that avoid vanishing and exploding gradients in wide networks. Experiments with deep fully-connected and convolutional networks show that this strategy improves SGD and Adam training of deep maxout networks. In addition, we obtain refined bounds on the expected number of linear regions, results on the expected curve length distortion, and results on the NTK.
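The quantities named in the abstract can be illustrated with a small sketch. It assumes a fully connected maxout network with zero biases whose weights are drawn from a Gaussian with variance c / fan_in; the constant `c`, all function names, and the layer sizes are illustrative choices and are not taken from the paper or its repository. Because a maxout network is piecewise linear, the input-output Jacobian at a point is the product of the weight rows selected by the maximum in each unit, so which rows are selected (and hence the Jacobian) changes with the input.

```python
import math
import random


def init_maxout_layer(fan_in, fan_out, rank, c, rng):
    """Sample weights from N(0, c / fan_in) and set biases to zero.

    `c` is a free scale constant (hypothetical parameter, chosen by the caller).
    Each of the `fan_out` units holds `rank` affine pieces.
    """
    std = math.sqrt(c / fan_in)
    weights = [[[rng.gauss(0.0, std) for _ in range(fan_in)]
                for _ in range(rank)]
               for _ in range(fan_out)]
    biases = [[0.0] * rank for _ in range(fan_out)]
    return weights, biases


def maxout_forward(layer, x):
    """Return the layer output and, per unit, the index of the winning piece."""
    weights, biases = layer
    out, winners = [], []
    for unit_w, unit_b in zip(weights, biases):
        pre = [sum(w * xi for w, xi in zip(row, x)) + b
               for row, b in zip(unit_w, unit_b)]
        k = max(range(len(pre)), key=pre.__getitem__)
        out.append(pre[k])
        winners.append(k)
    return out, winners


def network_forward(layers, x):
    """Apply the maxout layers in sequence."""
    for layer in layers:
        x, _ = maxout_forward(layer, x)
    return x


def jacobian_vector_product(layers, x, u):
    """Compute J(x) u, where J(x) is the input-output Jacobian at input x.

    In each layer only the winning row of every unit contributes, so the
    direction u is pushed through the rows selected by the forward pass.
    """
    for layer in layers:
        weights, _ = layer
        x_next, winners = maxout_forward(layer, x)
        u = [sum(w * ui for w, ui in zip(unit_w[k], u))
             for unit_w, k in zip(weights, winners)]
        x = x_next
    return u


# Small demonstration: the same direction u, different inputs x.
rng = random.Random(0)
width, depth, rank = 20, 5, 5
layers = [init_maxout_layer(width, width, rank, c=1.0, rng=rng)
          for _ in range(depth)]
u = [1.0 / math.sqrt(width)] * width  # fixed unit-norm direction
for trial in range(3):
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    ju = jacobian_vector_product(layers, x, u)
    print(f"input {trial}: ||J(x) u||^2 = {sum(v * v for v in ju):.4f}")
```

Rerunning the loop with different values of `c` shows how the weight scale drives the squared norm of J(x) u up or down with depth, which is the kind of behavior an initialization has to control.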

Code & Models

Repositories

hanna-tseran/maxout_expected_gradients (TensorFlow, official)

Videos

Expected Gradients of Maxout Networks and Consequences to Parameter Initialization · SlidesLive

Taxonomy

Topics: Brain Tumor Detection and Classification · Machine Learning and ELM · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent · Adam · Neural Tangent Kernel · Maxout