A case where a spindly two-layer linear network whips any neural network with a fully connected input layer
Manfred K. Warmuth, Wojciech Kotłowski, Ehsan Amid

TL;DR
The paper demonstrates that a simple two-layer linear network can efficiently learn sparse targets from Hadamard matrix data, outperforming fully connected networks that struggle with such tasks when trained with gradient descent.
Contribution
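It proves that a network with a fully connected input layer and rotation-invariant initial weights cannot learn a sparse target sample-efficiently when trained with gradient descent, and contrasts this with a two-layer linear network with a sparse input layer, which learns the same kind of target efficiently.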
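The sparse two-layer linear network mentioned here can be sketched as follows. This is a minimal illustration, not the authors' code: each input reaches the output through its own chain of two weights, so the effective weight on feature i is u[i] * v[i]. The function names, the initial value 0.5, the learning rate, and the single-example update are all assumptions made for the example.

```python
def spindly_predict(u, v, x):
    """Output of the two-layer linear network: sum_i u[i] * v[i] * x[i]."""
    return sum(ui * vi * xi for ui, vi, xi in zip(u, v, x))


def spindly_gd_step(u, v, x, y, lr):
    """One gradient descent step on the square loss (y_hat - y)^2 for one example."""
    err = spindly_predict(u, v, x) - y
    # Chain rule: d loss / d u[i] = 2 * err * v[i] * x[i], and symmetrically for v[i].
    new_u = [ui - lr * 2.0 * err * vi * xi for ui, vi, xi in zip(u, v, x)]
    new_v = [vi - lr * 2.0 * err * ui * xi for ui, vi, xi in zip(u, v, x)]
    return new_u, new_v


d = 4
u = [0.5] * d            # assumed initialization, chosen only for illustration
v = [0.5] * d
x = [1.0, -1.0, 1.0, -1.0]
y = x[2]                 # the target is a single feature (here feature 2)

before = (spindly_predict(u, v, x) - y) ** 2
u, v = spindly_gd_step(u, v, x, y, lr=0.1)
after = (spindly_predict(u, v, x) - y) ** 2
print(before, after)
```

Because each gradient component is scaled by the partner weight on the same chain, features that agree with the target on this example have their weights grow while the others shrink.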
Findings
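A two-layer linear network with a sparse input layer, trained with gradient descent, achieves expected square loss of (log d)/k, where d is the input dimension and k is the number of training examples.
Networks with a fully connected input layer still have expected square loss of 1 - k/(d-1) after k training examples.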
Sparse input layers are essential for efficient learning of sparse targets.
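To get a feel for the gap between the two loss expressions listed above, the following sketch simply evaluates them for a few training set sizes. It assumes the natural logarithm and ignores any constant factors, neither of which this summary specifies; the function names are hypothetical.

```python
import math


def spindly_loss(d, k):
    """(log d) / k: the loss quoted for the two-layer linear network (natural log assumed)."""
    return math.log(d) / k


def fully_connected_loss(d, k):
    """1 - k / (d - 1): the loss quoted for networks with a fully connected input layer."""
    return 1.0 - k / (d - 1)


d = 1024
for k in (16, 64, 256):
    print(k, round(spindly_loss(d, k), 4), round(fully_connected_loss(d, k), 4))
```

The first expression shrinks once k exceeds roughly log d, while the second stays near 1 until k is a sizable fraction of d.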
Abstract
It was conjectured that any neural network of any structure and arbitrary differentiable transfer functions at the nodes cannot learn the following problem sample efficiently when trained with gradient descent: The instances are the rows of a d-dimensional Hadamard matrix and the target is one of the features, i.e. very sparse. We essentially prove this conjecture: We show that after receiving a random training set of size k < d, the expected square loss is still 1 - k/(d-1). The only requirement needed is that the input layer is fully connected and the initial weight vectors of the input nodes are chosen from a rotation invariant distribution. Surprisingly the same type of problem can be solved drastically more efficiently by a simple 2-layer linear neural network in which the inputs are connected to the output node by chains of length 2 (Now the input layer has only…
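The learning problem in the abstract can be reproduced in a toy setting. The sketch below builds a Hadamard matrix with the Sylvester construction, picks one column as the target, and trains a single linear neuron from zero weights with batch gradient descent. This is only the simplest special case of a fully connected input layer (zero initialization is trivially rotation invariant), not the general setting of the theorem; the dimension, sample size, target index, step size, and function names are assumptions chosen for illustration.

```python
import random


def hadamard(d):
    """Sylvester construction of a d x d Hadamard matrix (d must be a power of 2)."""
    H = [[1]]
    while len(H) < d:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def train_linear_from_zero(X, y, steps=50):
    """Batch gradient descent on the square loss for one linear neuron, starting at w = 0."""
    d = len(X[0])
    w = [0.0] * d
    lr = 0.5 / d  # stable here because every row has squared norm d
    for _ in range(steps):
        residuals = [sum(wi * xi for wi, xi in zip(w, x)) - t for x, t in zip(X, y)]
        for i in range(d):
            w[i] -= lr * sum(r * x[i] for r, x in zip(residuals, X))
    return w


random.seed(0)
d, k, target = 16, 8, 5
H = hadamard(d)
train_rows = random.sample(range(d), k)
X = [H[r] for r in train_rows]
y = [H[r][target] for r in train_rows]
w = train_linear_from_zero(X, y)


def sq_loss(rows):
    """Average square loss of the trained neuron on the given rows of H."""
    return sum((sum(wi * xi for wi, xi in zip(w, H[r])) - H[r][target]) ** 2
               for r in rows) / len(rows)


unseen = [r for r in range(d) if r not in train_rows]
train_loss = sq_loss(train_rows)
unseen_loss = sq_loss(unseen)
print(train_loss, unseen_loss)
```

The rows of a Hadamard matrix are mutually orthogonal, and gradient descent from zero keeps the weight vector in the span of the training rows, so the neuron predicts 0 on every unseen row and its square loss there stays at 1 even though the training loss goes to zero. Averaged over all d rows this toy run gives 1 - k/d, which is close to but not the same as the 1 - k/(d-1) figure quoted in this summary; the exact expression depends on details of the paper's setup that the truncated abstract does not show.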
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Face and Expression Recognition
