Learning by Reconstruction Produces Uninformative Features For Perception
Randall Balestriero, Yann LeCun

TL;DR
This paper shows that learning by reconstruction often produces features that are unhelpful for perception tasks, and examines how different noise strategies affect the alignment between learning by reconstruction and learning for perception.
Contribution
It identifies the misalignment between reconstruction-based learning and perception, analyzing how feature representations differ and how noise strategies can mitigate this issue.
Findings
Reconstruction allocates model capacity to data subspaces that are uninformative for perception.
Masking noise can improve perception learning, but its effectiveness depends on the mask shape and the dataset.
Additive Gaussian noise generally does not benefit perception learning.
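The two noise families named in the findings can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the function names, the zero fill value, and the square patch grid are assumptions made here, and the image height and width are assumed to be divisible by the patch size.

```python
import numpy as np


def additive_gaussian_noise(x, sigma, rng):
    """Corrupt x by adding i.i.d. Gaussian noise with standard deviation sigma."""
    return x + sigma * rng.standard_normal(x.shape)


def patch_mask(x, patch, drop_ratio, rng):
    """Zero out a random subset of non-overlapping patch x patch blocks of a 2D image.

    Assumes (for illustration) that both sides of x are divisible by `patch`.
    Each block is dropped independently with probability `drop_ratio`.
    """
    h, w = x.shape
    grid_h, grid_w = h // patch, w // patch
    # One keep/drop decision per block.
    keep = (rng.random((grid_h, grid_w)) >= drop_ratio).astype(x.dtype)
    # Expand each decision to cover its full patch x patch block.
    mask = np.kron(keep, np.ones((patch, patch), dtype=x.dtype))
    return x * mask


rng = np.random.default_rng(0)
image = np.ones((8, 8))
noisy = additive_gaussian_noise(image, sigma=0.1, rng=rng)
masked = patch_mask(image, patch=4, drop_ratio=0.5, rng=rng)
```

Changing `patch` changes the mask shape, which is the factor the findings single out as dataset dependent. In a denoising setup, the model would receive `noisy` or `masked` as input and be trained to reconstruct `image`.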
Abstract
Input space reconstruction is an attractive representation learning paradigm. Despite interpretability of the reconstruction and generation, we identify a misalignment between learning by reconstruction, and learning for perception. We show that the former allocates a model's capacity towards a subspace of the data explaining the observed variance, a subspace with uninformative features for the latter. For example, the supervised TinyImagenet task with images projected onto the top subspace explaining 90% of the pixel variance can be solved with 45% test accuracy. Using the bottom subspace instead, accounting for only 20% of the pixel variance, reaches 55% test accuracy. The features for perception being learned last explains the need for long training time, e.g., with Masked Autoencoders. Learning by denoising is a popular strategy to alleviate that misalignment. We prove that…
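The subspace projection described in the abstract can be sketched with a plain PCA. This is a hedged illustration under stated assumptions, not the authors' code: the function name `variance_subspace_projection` and its parameters are hypothetical, images are assumed to be flattened into the rows of a matrix, and the subspace is chosen as the smallest set of principal directions (taken from the top or from the bottom of the spectrum) whose variance reaches the requested fraction.

```python
import numpy as np


def variance_subspace_projection(X, fraction, which="top"):
    """Project rows of X onto a principal subspace selected by explained variance.

    X        : (n_samples, n_features) array, e.g. flattened images.
    fraction : target share of total variance, in (0, 1].
    which    : "top" picks the largest-variance directions first,
               "bottom" picks the smallest-variance directions first.
    Returns the projected data (same shape as X) and the number of directions used.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    if which == "bottom":
        var, Vt = var[::-1], Vt[::-1]
    cum = np.cumsum(var) / var.sum()
    # Smallest k such that the first k directions reach the target fraction.
    k = min(int(np.searchsorted(cum, fraction)) + 1, len(var))
    V = Vt[:k]
    return Xc @ V.T @ V + mean, k


rng = np.random.default_rng(0)
# Synthetic stand-in data with a decaying variance spectrum (not TinyImagenet).
X = rng.standard_normal((200, 6)) * np.array([10.0, 5.0, 2.0, 1.0, 0.5, 0.1])
X_top, k_top = variance_subspace_projection(X, 0.9, which="top")
X_bottom, k_bottom = variance_subspace_projection(X, 0.2, which="bottom")
```

To reproduce the kind of comparison the abstract reports, one would train the same supervised classifier once on `X_top` and once on `X_bottom` and compare test accuracy. The synthetic data here only exercises the projection and says nothing about that comparison.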
Taxonomy
Topics: Neural Networks and Applications
