Fundamental Limits of Two-layer Autoencoders, and Achieving Them with Gradient Methods
Alexander Shevchenko, Kevin Kögler, Hamed Hassani, Marco Mondelli

TL;DR
This paper investigates the fundamental limits of two-layer non-linear autoencoders in data compression, characterizes the solutions found by gradient methods, and demonstrates the universality of these results across datasets.
Contribution
The paper provides a theoretical analysis of optimal two-layer autoencoder solutions and their training dynamics in the proportional regime, extending existing understanding beyond linear models.
Findings
Characterizes minimizers of population risk for non-linear autoencoders.
Shows gradient methods achieve these minimizers.
Establishes fundamental limits for Gaussian source compression with sign activation.
Abstract
Autoencoders are a popular model in many branches of machine learning and lossy data compression. However, their fundamental limits, the performance of gradient methods and the features learnt during optimization remain poorly understood, even in the two-layer setting. In fact, earlier work has considered either linear autoencoders or specific training regimes (leading to vanishing or diverging compression rates). Our paper addresses this gap by focusing on non-linear two-layer autoencoders trained in the challenging proportional regime in which the input dimension scales linearly with the size of the representation. Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods; their structure is also unveiled, thus leading to a concise description of the features obtained via training. For the special case of a sign…
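To make the setting in the abstract concrete, here is a minimal NumPy sketch of a two-layer autoencoder with sign activation applied to an isotropic Gaussian source. The parameterization x_hat = A sign(B x), the unit-norm encoder rows, the dimensions, and the least-squares fit of the decoder are all assumptions chosen for illustration; this is not the paper's training procedure, and the number it produces is not one of the paper's results.

```python
import numpy as np

def reconstruct(A, B, X):
    """Two-layer autoencoder x_hat = A @ sign(B @ x), applied to each row of X."""
    return np.sign(X @ B.T) @ A.T

def empirical_risk(A, B, X):
    """Mean squared reconstruction error per coordinate, averaged over samples."""
    return np.mean((X - reconstruct(A, B, X)) ** 2)

rng = np.random.default_rng(0)
d, rate = 200, 0.5                   # input dimension and rate n/d (illustrative values)
n = int(rate * d)                    # representation size scales linearly with d
X = rng.standard_normal((5000, d))   # isotropic Gaussian source

# Random encoder with unit-norm rows (an assumption of this sketch).
B = rng.standard_normal((n, d))
B /= np.linalg.norm(B, axis=1, keepdims=True)

# With the encoder fixed, the risk is quadratic in A,
# so ordinary least squares gives the best decoder on this sample.
Z = np.sign(X @ B.T)
A = np.linalg.lstsq(Z, X, rcond=None)[0].T

risk = empirical_risk(A, B, X)
print(f"rate = {n / d:.2f}, empirical risk = {risk:.3f}")
```

The trivial decoder A = 0 gives a risk close to 1 for a unit-variance source, so any value below that reflects information carried by the n sign measurements. Replacing the random encoder with one trained by gradient descent is what the paper analyzes.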
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis · Machine Learning and Data Classification
