Nonlinear Advantage: Trained Networks Might Not Be As Complex as You Think

Christian H.X. Ali Mehmeti-Göpel, Jan Disselhoff

arXiv:2211.17180 · cs.LG · June 2, 2023 · 1 citation

TL;DR

This paper empirically investigates how far deep networks can be simplified by linearizing nonlinear units during training. It finds that much of a network's expressivity goes unused after training but aids early training, and that the remaining nonlinear units form structured core networks.

Contribution

It introduces a method to linearize network units during training, analyzes the impact on performance, and proposes a measure called average path length to characterize network depth after linearization.
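The exact definition of average path length is not reproduced on this page. As a rough illustration only, the sketch below assumes one plausible reading: treat the network as a directed acyclic graph of units, mark each unit as linear or nonlinear, and average the number of nonlinear units encountered over all input-to-output paths. The function name `average_path_length` and the toy residual graph are hypothetical and not taken from the paper.

```python
from graphlib import TopologicalSorter


def average_path_length(preds, nonlinear, source, sink):
    """Hypothetical measure: mean number of nonlinear nodes over all source->sink paths.

    preds: dict mapping node -> iterable of predecessor nodes (must form a DAG).
    nonlinear: dict mapping node -> bool (missing nodes are treated as linear).
    Uses dynamic programming, so paths are never enumerated explicitly.
    """
    n_paths = {}   # number of distinct source->v paths
    nl_total = {}  # sum over those paths of nonlinear nodes visited (including v)
    for v in TopologicalSorter(preds).static_order():
        if v == source:
            n_paths[v] = 1
            nl_total[v] = int(nonlinear.get(v, False))
            continue
        # Predecessors are always visited first in a topological order.
        count = sum(n_paths[u] for u in preds.get(v, ()))
        total = sum(nl_total[u] for u in preds.get(v, ()))
        n_paths[v] = count
        # Every path reaching v picks up v's nonlinearity once.
        nl_total[v] = total + int(nonlinear.get(v, False)) * count
    if n_paths.get(sink, 0) == 0:
        raise ValueError("sink is not reachable from source")
    return nl_total[sink] / n_paths[sink]


# Toy graph: two residual blocks, each with a nonlinear branch f_i and an add node a_i.
preds = {
    "f1": {"in"}, "a1": {"in", "f1"},
    "f2": {"a1"}, "a2": {"a1", "f2"},
}
print(average_path_length(preds, {"f1": True, "f2": True}, "in", "a2"))  # 1.0
print(average_path_length(preds, {"f1": True}, "in", "a2"))  # 0.5 after linearizing f2
```

In this toy graph, linearizing the second residual branch halves the average number of nonlinear units per path without removing any edge, which illustrates how linearization can lower effective depth while the architecture stays unchanged.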

Findings

1. Linearizing units early in training causes a significant performance drop.
2. After training, many nonlinear units can be linearized while maintaining high accuracy.
3. The remaining nonlinear units form structured core networks that depend on task difficulty.

Abstract

We perform an empirical study of the behaviour of deep networks when fully linearizing some of their feature channels through a sparsity prior on the overall number of nonlinear units in the network. In experiments on image classification and machine translation tasks, we investigate how much we can simplify the network function towards linearity before performance collapses. First, we observe a significant performance gap when reducing nonlinearity in the network function early on as opposed to late in training, in line with recent observations on the time-evolution of the data-dependent NTK. Second, we find that after training, we are able to linearize a significant number of nonlinear units while maintaining a high performance, indicating that much of a network's expressivity remains unused but helps gradient descent in early stages of training. To characterize the depth of the…

Taxonomy

Topics: Neural Networks and Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification

Methods: Neural Tangent Kernel