On the Regularization of Autoencoders
Harald Steck, Dario Garcia Garcia

TL;DR
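This paper investigates how the unsupervised training objective of autoencoders itself acts as a regularizer. It shows that a deep nonlinear autoencoder cannot fit the training data more accurately than a linear autoencoder with the same dimensionality in its last hidden layer (under a few additional assumptions), and it provides a closed-form approximation for a constrained low-rank autoencoder model.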
Contribution
It extends recent results on unconstrained linear models [Jin et al. 2021] to nonlinear and constrained linear autoencoders, revealing inherent regularization effects and deriving an approximation for the EDLAE model's optimal solution.
Findings
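The unsupervised setting by itself induces strong additional regularization, severely reducing the model capacity of the learned autoencoder.
A deep nonlinear autoencoder cannot fit the training data more accurately than a linear autoencoder with the same last hidden layer size (under a few additional assumptions).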
The derived approximation accurately predicts the EDLAE model's optimal solution across datasets.
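The second finding can be illustrated numerically. The following is a minimal sketch, not the paper's derivation: it assumes a squared-error objective and a linear output layer (both assumptions of this example), and the function name, random data, and random encoder weights are hypothetical. Under those assumptions the reconstruction is a product of an n-by-k hidden matrix and a k-by-d decoder, so its rank is at most k, and by the Eckart-Young theorem it cannot beat the truncated SVD, which is the best rank-k linear reconstruction of the training data.

```python
import numpy as np

def best_rank_k_reconstruction(X, k):
    """Best rank-k approximation of X in squared error (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the k leading singular triplets.
    return (U[:, :k] * s[:k]) @ Vt[:k], s

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))   # hypothetical training data: 50 samples, 20 features
k = 5                           # dimensionality of the last hidden layer

# Linear autoencoder with a k-dimensional bottleneck: optimal training fit.
X_lin, s = best_rank_k_reconstruction(X, k)
err_linear = np.sum((X - X_lin) ** 2)

# Nonlinear encoder (random weights, tanh) followed by a linear output layer.
# The decoder is fit by least squares, i.e. it is optimal for this encoder.
W1 = rng.normal(size=(20, k))
H = np.tanh(X @ W1)                          # last hidden layer, shape (50, k)
W2, *_ = np.linalg.lstsq(H, X, rcond=None)   # linear decoder, shape (k, 20)
err_nonlinear = np.sum((X - H @ W2) ** 2)

print(f"linear rank-{k} training error:   {err_linear:.3f}")
print(f"nonlinear encoder training error: {err_nonlinear:.3f}")
```

Whatever encoder is substituted for the random one here, the nonlinear training error stays at or above the linear one as long as the output layer is linear and the last hidden layer has k units.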
Abstract
While much work has been devoted to understanding the implicit (and explicit) regularization of deep nonlinear networks in the supervised setting, this paper focuses on unsupervised learning, i.e., autoencoders are trained with the objective of reproducing the output from the input. We extend recent results [Jin et al. 2021] on unconstrained linear models and apply them to (1) nonlinear autoencoders and (2) constrained linear autoencoders, obtaining the following two results: first, we show that the unsupervised setting by itself induces strong additional regularization, i.e., a severe reduction in the model-capacity of the learned autoencoder: we derive that a deep nonlinear autoencoder cannot fit the training data more accurately than a linear autoencoder does if both models have the same dimensionality in their last hidden layer (and under a few additional assumptions). Our second…
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Tensor decomposition and applications
