Sharp asymptotics on the compression of two-layer neural networks

Mohammad Hossein Amani; Simone Bombari; Marco Mondelli; Rattana Pukdee; Stefano Rini

arXiv:2205.08199·cs.IT·August 17, 2022

TL;DR

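This paper analyzes the asymptotics of compressing a two-layer neural network with N nodes into one with M<N nodes. When the target network is sufficiently over-parameterized, the non-convex compression objective simplifies, the error rate of this simplification is given as a function of the input dimension and N, and, for ReLU activations, the optimal compressed weights are conjectured to form an Equiangular Tight Frame.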

Contribution

The paper provides a theoretical framework for understanding neural network compression in the over-parameterized regime, built on tools from high-dimensional probability.

Findings

01

The error rate of the simplified objective's approximation depends on the input dimension and the target network size N

02

Optimal weights in the mean-field limit are independent of specific target network realization

03

Conjecture that optimal weights form an Equiangular Tight Frame (ETF) for ReLU networks
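As a point of reference for the third finding, the sketch below checks the standard definition of an Equiangular Tight Frame: unit-norm vectors whose frame operator is a multiple of the identity and whose pairwise inner products all have the same absolute value. The function name `is_etf` and the three-vector example in the plane are illustrative assumptions and are not taken from the paper; the precise form of the conjecture is not stated on this page.

```python
import numpy as np

def is_etf(W, tol=1e-9):
    """Check whether the rows of W (shape M x d) form an equiangular tight frame.

    Hypothetical helper for illustration; uses the textbook ETF definition.
    """
    W = np.asarray(W, dtype=float)
    M, d = W.shape
    # Every frame vector has unit norm.
    unit = np.allclose(np.linalg.norm(W, axis=1), 1.0, atol=tol)
    # Tightness: sum_i w_i w_i^T equals (M/d) times the identity.
    tight = np.allclose(W.T @ W, (M / d) * np.eye(d), atol=tol)
    # Equiangularity: all off-diagonal Gram entries share one absolute value.
    gram = W @ W.T
    off = np.abs(gram[~np.eye(M, dtype=bool)])
    equiangular = off.size == 0 or np.allclose(off, off[0], atol=tol)
    return bool(unit and tight and equiangular)

# Three unit vectors in the plane, 120 degrees apart.
mercedes_benz = np.array([
    [0.0, 1.0],
    [-np.sqrt(3) / 2, -0.5],
    [np.sqrt(3) / 2, -0.5],
])

# Three unit vectors that are neither tight nor equiangular.
not_a_frame = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1 / np.sqrt(2), 1 / np.sqrt(2)],
])

print(is_etf(mercedes_benz), is_etf(not_a_frame))
```

The check works on explicit weight matrices only; how the compressed weights are normalized before such a comparison is left open here.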

Abstract

In this paper, we study the compression of a target two-layer neural network with N nodes into a compressed network with M<N nodes. More precisely, we consider the setting in which the weights of the target network are i.i.d. sub-Gaussian, and we minimize the population L_2 loss between the outputs of the target and of the compressed network, under the assumption of Gaussian inputs. By using tools from high-dimensional probability, we show that this non-convex problem can be simplified when the target network is sufficiently over-parameterized, and provide the error rate of this approximation as a function of the input dimension and N. In this mean-field limit, the simplified objective, as well as the optimal weights of the compressed network, does not depend on the realization of the target network, but only on expected scaling factors. Furthermore, for networks with ReLU activation,…
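The objective described in the abstract can be sketched numerically. The snippet below is a minimal Monte Carlo estimate of the population L_2 loss between a target and a compressed two-layer ReLU network under standard Gaussian inputs. The parameterization (output weights `a`, the 1/sqrt(d) weight scaling, averaging over nodes, and keeping the first M target nodes as a naive baseline) is an assumption made for illustration; the abstract does not give the paper's exact scaling, and the function names are hypothetical.

```python
import numpy as np

def two_layer(x, W, a):
    """Output of a two-layer ReLU network: sum_k a_k * relu(<w_k, x>).

    x: (n, d) inputs, W: (k, d) hidden weights, a: (k,) output weights.
    """
    return np.maximum(x @ W.T, 0.0) @ a

def population_l2_loss_mc(W_t, a_t, W_c, a_c, n_samples=50_000, seed=0):
    """Monte Carlo estimate of E_x[(f_target(x) - f_compressed(x))^2], x ~ N(0, I_d)."""
    rng = np.random.default_rng(seed)
    d = W_t.shape[1]
    x = rng.standard_normal((n_samples, d))
    diff = two_layer(x, W_t, a_t) - two_layer(x, W_c, a_c)
    return float(np.mean(diff ** 2))

# Illustrative setup (assumed, not from the paper): i.i.d. Gaussian target
# weights, which are a special case of sub-Gaussian weights.
rng = np.random.default_rng(1)
d, N, M = 16, 64, 8
W_target = rng.standard_normal((N, d)) / np.sqrt(d)
a_target = np.full(N, 1.0 / N)

# Naive compression baseline: keep the first M target nodes and re-average.
W_comp = W_target[:M].copy()
a_comp = np.full(M, 1.0 / M)

print(population_l2_loss_mc(W_target, a_target, W_comp, a_comp))
```

In practice one would minimize this quantity over `W_comp` (and possibly `a_comp`); the estimator above only evaluates the loss for fixed weights.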

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Neural Networks and Applications