Are Efficient Deep Representations Learnable?

Maxwell Nye; Andrew Saxe

arXiv:1807.06399·cs.LG·July 18, 2018·1 cites

TL;DR

This paper tests whether standard deep learning methods can learn the efficient deep representations known for two simple functions, the parity function and the fast Fourier transform, and finds that typical gradient-based training fails to recover them.

Contribution

The study demonstrates that gradient-based training does not find known efficient deep solutions unless the network is initialized very close to them, exposing a gap between what deep networks can represent and what they can learn.

Findings

01. Deep networks fail to learn the parity function with standard training.

02. Deep linear networks do not learn the Fourier transform without near-exact initialization.

03. Not all efficient representations are learnable by current deep learning methods.
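To make the "efficient deep representation" of parity concrete, here is a minimal sketch of one possible hand-coded construction: a binary tree of pairwise XOR units, each written with a single ReLU. The names `xor_relu` and `deep_parity` are hypothetical, and this construction is an illustrative assumption, not necessarily the exact solution used in the paper.

```python
def xor_relu(a, b):
    """XOR of two bits as a tiny ReLU unit: a + b - 2 * relu(a + b - 1)."""
    relu = lambda z: max(z, 0)
    return a + b - 2 * relu(a + b - 1)


def deep_parity(bits):
    """Parity via a tree of pairwise XORs; returns (parity, depth).

    Illustrative construction only (an assumption, not the paper's code).
    The depth grows like log2(n) and the tree uses about n XOR units.
    """
    layer = list(bits)
    depth = 0
    while len(layer) > 1:
        if len(layer) % 2:
            # Padding with 0 leaves the parity unchanged.
            layer.append(0)
        # Each layer halves the width by XOR-ing neighbouring pairs.
        layer = [xor_relu(layer[i], layer[i + 1])
                 for i in range(0, len(layer), 2)]
        depth += 1
    return layer[0], depth
```

For 8 input bits the tree has depth 3. The point of the sketch is only that a compact deep solution exists; the findings above concern whether gradient-based training can find such a solution on its own.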

Abstract

Many theories of deep learning have shown that a deep network can require dramatically fewer resources to represent a given function compared to a shallow network. But a question remains: can these efficient representations be learned using current deep learning techniques? In this work, we test whether standard deep learning methods can in fact find the efficient representations posited by several theories of deep representation. Specifically, we train deep neural networks to learn two simple functions with known efficient solutions: the parity function and the fast Fourier transform. We find that using gradient-based optimization, a deep network does not learn the parity function, unless initialized very close to a hand-coded exact solution. We also find that a deep linear neural network does not learn the fast Fourier transform, even in the best-case scenario of infinite training…
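As an illustration of why the fast Fourier transform counts as an efficient deep linear representation, the sketch below writes a radix-2 FFT as a fixed permutation followed by log2(n) sparse linear "layers" in which each output mixes exactly two inputs. The function names are hypothetical, the input length is assumed to be a power of two, and this is a standard textbook factorization offered as a sketch, not the paper's experimental setup.

```python
import cmath


def bit_reverse_permute(x):
    """Reorder inputs by bit-reversed index (a fixed permutation layer)."""
    n = len(x)
    width = n.bit_length() - 1
    return [x[int(format(i, f"0{width}b")[::-1], 2)] for i in range(n)]


def butterfly_stage(x, half):
    """One sparse linear layer: every output depends on only two inputs."""
    n = len(x)
    out = [0j] * n
    for start in range(0, n, 2 * half):
        for k in range(half):
            w = cmath.exp(-2j * cmath.pi * k / (2 * half))  # twiddle factor
            a = x[start + k]
            b = w * x[start + k + half]
            out[start + k] = a + b
            out[start + k + half] = a - b
    return out


def fft_as_layers(x):
    """Apply log2(n) butterfly layers; returns (spectrum, layer count).

    Assumes len(x) is a power of two (an assumption of this sketch).
    """
    x = bit_reverse_permute([complex(v) for v in x])
    half, layers = 1, 0
    while half < len(x):
        x = butterfly_stage(x, half)
        half *= 2
        layers += 1
    return x, layers


def naive_dft(x):
    """Dense single-layer reference: an n-by-n matrix-vector product."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]
```

Each butterfly layer has 2n nonzero weights, so the deep factorization uses about 2n log2(n) weights in total, compared with n^2 for the dense single-layer transform. The abstract's question is whether training can discover this sparse, deep structure rather than merely represent it.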

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Neural Networks and Applications