Memory capacity of two layer neural networks with smooth activations

Liam Madden; Christos Thrampoulidis

arXiv:2308.02001·cs.LG·July 25, 2024

TL;DR

This paper establishes a lower bound on the memory capacity of two-layer neural networks with smooth, real analytic activations, covering most practical activation functions and extending previous results.

Contribution

It provides a lower bound on the memory capacity that is tight up to a factor of approximately 2 for a broad class of activations, using a novel rank analysis of the network's Jacobian.

Findings

01

A memory-capacity lower bound of $\lfloor md/2 \rfloor$, approximately half of the $md + 2m$ trainable parameters.

02

Coverage of almost all practical activations beyond Heaviside and ReLU (for example, sigmoids); analogous prior results were limited to those two.

03

Extension of classical linear algebra results to neural network Jacobian analysis.
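
As a rough numerical illustration of the Jacobian-rank idea (a sketch under stated assumptions, not the paper's proof technique), the snippet below builds the Jacobian of a two-layer network's outputs with respect to all $md + 2m$ parameters at $n = \lfloor md/2 \rfloor$ random data points and computes its rank. The choice of `tanh` as the activation, the Gaussian data and parameters, and the function name `jacobian_rank` are all assumptions made for illustration.

```python
import numpy as np

def jacobian_rank(n, d, m, seed=0):
    """Rank and shape of the parameter Jacobian of
    f(x) = sum_j v_j * tanh(w_j . x + b_j) at n random inputs.
    Activation, data, and parameter distributions are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))   # n data points in dimension d
    W = rng.standard_normal((m, d))   # hidden-layer weights
    b = rng.standard_normal(m)        # hidden-layer biases
    v = rng.standard_normal(m)        # output-layer weights

    Z = X @ W.T + b                   # pre-activations, shape (n, m)
    A = np.tanh(Z)                    # df/dv_j = tanh(z_ij)
    D = (1.0 - A**2) * v              # df/db_j = v_j * tanh'(z_ij)
    # df/dW_jk = v_j * tanh'(z_ij) * x_ik, flattened to shape (n, m*d)
    J_W = (D[:, :, None] * X[:, None, :]).reshape(n, m * d)

    J = np.hstack([J_W, D, A])        # shape (n, m*d + 2*m)
    return int(np.linalg.matrix_rank(J)), J.shape

d, m = 4, 6
n = (m * d) // 2                      # number of points at the floor(md/2) bound
rank, shape = jacobian_rank(n, d, m)
print(f"Jacobian shape {shape}, rank {rank}, n = {n}")
```

A rank equal to `n` means the Jacobian has full row rank at that random point, which is the kind of condition a rank analysis is concerned with; a single random draw is of course only a sanity check, not evidence for the general statement.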

Abstract

Determining the memory capacity of two layer neural networks with $m$ hidden neurons and input dimension $d$ (i.e., $md + 2m$ total trainable parameters), which refers to the largest size of general data the network can memorize, is a fundamental machine learning question. For activations that are real analytic at a point and, if restricting to a polynomial there, have sufficiently high degree, we establish a lower bound of $\lfloor md/2 \rfloor$ and optimality up to a factor of approximately $2$. All practical activations, such as sigmoids, Heaviside, and the rectified linear unit (ReLU), are real analytic at a point. Furthermore, the degree condition is mild, requiring, for example, that $\binom{k+d-1}{d-1} \geq n$ if the activation is $x^{k}$. Analogous prior results were limited to Heaviside and ReLU activations; our result covers almost everything else. In order to analyze general…
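
The arithmetic in the abstract can be sketched in a few lines. The snippet below computes the parameter count $md + 2m$ and the lower bound $\lfloor md/2 \rfloor$, and finds the smallest monomial degree $k$ that satisfies the degree condition, read here as $\binom{k+d-1}{d-1} \geq n$. The function names are hypothetical, and taking $n$ to be the number of data points is an assumption, since the abstract does not define it.

```python
from math import comb

def total_parameters(m, d):
    """Trainable parameters of a two-layer network: m*d weights, m biases, m output weights."""
    return m * d + 2 * m

def capacity_lower_bound(m, d):
    """Lower bound floor(m*d/2) on the memory capacity stated in the abstract."""
    return (m * d) // 2

def min_monomial_degree(d, n):
    """Smallest k with C(k + d - 1, d - 1) >= n, assuming activation x**k
    and n = number of data points (an assumption, not stated in the abstract)."""
    k = 0
    while comb(k + d - 1, d - 1) < n:
        k += 1
    return k

m, d = 10, 5
n = capacity_lower_bound(m, d)
print("parameters:", total_parameters(m, d))
print("lower bound:", n)
print("minimum degree k:", min_monomial_degree(d, n))
```

For $d \geq 2$ the binomial coefficient grows polynomially in $k$ with degree $d - 1$, which is one way to see why the abstract calls the degree condition mild.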

Taxonomy

Topics: Machine Learning and ELM · Ferroelectric and Negative Capacitance Devices · Quantum Computing Algorithms and Architecture