Memory capacity of two layer neural networks with smooth activations
Liam Madden, Christos Thrampoulidis

TL;DR
This paper establishes a lower bound on the memory capacity of two-layer neural networks with smooth, real analytic activations, covering most practical activation functions and extending previous results.
Contribution
It provides a nearly tight lower bound on the memory capacity for a broad class of activations, using novel rank analysis of the network's Jacobian.
Findings
Lower bound of approximately half the total parameters for memory capacity.
Coverage of almost all practical activations, such as sigmoids, where analogous prior results were limited to Heaviside and ReLU.
Extension of classical linear algebra results to neural network Jacobian analysis.
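The Jacobian-rank idea behind these findings can be illustrated numerically. The sketch below is not the paper's proof or code. It assumes a two-layer network f(x) = v^T tanh(Wx + b) with m hidden neurons and input dimension d (so md + 2m parameters under this assumed parameterization), picks tanh as one example of a smooth activation, and draws random Gaussian data and weights. The function names `jacobian` and `jacobian_rank` and the sizes m = 4, d = 6 are hypothetical choices for illustration. It builds the Jacobian of the n outputs with respect to all parameters and reports its numerical rank for n equal to about half the parameter count.

```python
import numpy as np

def jacobian(X, W, b, v):
    """Jacobian of the n network outputs w.r.t. all md + 2m parameters.

    Assumed model (illustrative only): f(x) = v^T tanh(W x + b).
    """
    Z = X @ W.T + b                  # (n, m) pre-activations
    S = np.tanh(Z)                   # (n, m) hidden activations
    D = (1.0 - S**2) * v             # (n, m): v_j * tanh'(z_ij)
    # Block for W: row i holds all products D[i, j] * X[i, k]
    J_W = np.einsum("ij,ik->ijk", D, X).reshape(X.shape[0], -1)
    J_b = D                          # derivative w.r.t. biases
    J_v = S                          # derivative w.r.t. output weights
    return np.hstack([J_W, J_b, J_v])

def jacobian_rank(m, d, n, seed=0):
    """Draw random data and weights; return the Jacobian shape and rank."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    W = rng.standard_normal((m, d)) / np.sqrt(d)  # keep tanh unsaturated
    b = rng.standard_normal(m)
    v = rng.standard_normal(m)
    J = jacobian(X, W, b, v)
    return J.shape, int(np.linalg.matrix_rank(J))

m, d = 4, 6
n = (m * d) // 2                     # about half the parameter count
shape, rank = jacobian_rank(m, d, n)
print("Jacobian shape:", shape, "numerical rank:", rank)
```

A Jacobian with full row rank n at some parameter value suggests that the n outputs can be moved independently near that point, which is the kind of property a rank-based memorization argument relies on. A single random draw is only a sanity check, not evidence for the general bound.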
Abstract
Determining the memory capacity of two layer neural networks with $m$ hidden neurons and input dimension $d$ (i.e., $md + 2m$ total trainable parameters), which refers to the largest size of general data the network can memorize, is a fundamental machine learning question. For activations that are real analytic at a point and, if restricting to a polynomial there, have sufficiently high degree, we establish a lower bound of $\lfloor md/2 \rfloor$ and optimality up to a factor of approximately $2$. All practical activations, such as sigmoids, Heaviside, and the rectified linear unit (ReLU), are real analytic at a point. Furthermore, the degree condition is mild, requiring, for example, that $\binom{k+d-1}{d-1} \ge n$ if the activation is $x^k$. Analogous prior results were limited to Heaviside and ReLU activations; our result covers almost everything else. In order to analyze general…
Taxonomy
Topics: Machine Learning and ELM · Ferroelectric and Negative Capacitance Devices · Quantum Computing Algorithms and Architecture
