The staircase property: How hierarchical structure can guide deep learning

Emmanuel Abbe; Enric Boix-Adsera; Matthew Brennan; Guy Bresler; Dheeraj Nagaraj

arXiv:2108.10573·cs.LG·November 25, 2021

TL;DR

This paper introduces the staircase property, a structural feature of data enabling deep neural networks to learn hierarchically, and demonstrates its significance through theoretical proofs and experiments with standard architectures.

Contribution

The paper defines the staircase property for functions over the Boolean hypercube and proves that neural networks can learn such functions efficiently using layerwise stochastic coordinate descent.

Findings

01 Staircase functions are learnable in polynomial time by neural networks.

02 Gradient-based algorithms learn high-level features by combining lower-level features.

03 Experiments show staircase functions are learnable by standard ResNet architectures.

Abstract

This paper identifies a structural property of data distributions that enables deep neural networks to learn hierarchically. We define the "staircase" property for functions over the Boolean hypercube, which posits that high-order Fourier coefficients are reachable from lower-order Fourier coefficients along increasing chains. We prove that functions satisfying this property can be learned in polynomial time using layerwise stochastic coordinate descent on regular neural networks -- a class of network architectures and initializations that have homogeneity properties. Our analysis shows that for such staircase functions and neural networks, the gradient-based algorithm learns high-level features by greedily combining lower-level features along the depth of the network. We further back our theoretical results with experiments showing that staircase functions are also learnable by more…
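To make the chain condition in the abstract concrete, the sketch below computes exact Fourier coefficients of a small function on the Boolean hypercube {-1, +1}^d and checks a simplified version of the staircase condition: every non-zero coefficient on a set of size two or more must have a non-zero coefficient on a subset with one fewer element. The example function S_d(x) = x_1 + x_1 x_2 + ... + x_1 x_2 ... x_d, the helper names (`staircase`, `fourier_coefficients`, `has_increasing_chain`), and the exact form of the check are illustrative assumptions rather than the paper's formal definition.

```python
from itertools import combinations, product
from math import prod


def staircase(x):
    """Illustrative staircase function: x1 + x1*x2 + ... + x1*...*xd."""
    total, running = 0, 1
    for xi in x:
        running *= xi      # running product x1 * ... * xi
        total += running
    return total


def fourier_coefficients(f, d):
    """Exact coefficients f_hat(S) = E_x[f(x) * prod_{i in S} x_i], x uniform.

    Brute force over all 2^d points and 2^d subsets, so only for small d.
    Returns only the non-zero coefficients, keyed by frozenset of indices.
    """
    points = list(product((-1, 1), repeat=d))
    coeffs = {}
    for k in range(d + 1):
        for subset in combinations(range(d), k):
            value = sum(f(x) * prod(x[i] for i in subset) for x in points)
            value /= len(points)
            if abs(value) > 1e-12:
                coeffs[frozenset(subset)] = value
    return coeffs


def has_increasing_chain(support):
    """Simplified staircase check (an assumption, not the paper's definition):
    each set of size >= 2 in the support contains a support set that is
    exactly one element smaller."""
    return all(
        any(s - {i} in support for i in s)
        for s in support
        if len(s) > 1
    )


if __name__ == "__main__":
    d = 4
    stair = fourier_coefficients(staircase, d)
    parity = fourier_coefficients(lambda x: prod(x), d)
    print("staircase support:", sorted(sorted(s) for s in stair))
    print("staircase satisfies chain check:", has_increasing_chain(set(stair)))
    print("parity satisfies chain check:", has_increasing_chain(set(parity)))
```

Under this check, the example staircase function passes because its support is the nested chain {0}, {0,1}, {0,1,2}, ..., while the full parity x_1 x_2 ... x_d fails: its only non-zero coefficient sits on the top-degree set, with no lower-order coefficient leading up to it.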

Videos

The staircase property: How hierarchical structure can guide deep learning · SlidesLive

Taxonomy

Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques

Methods: Residual Connection · Batch Normalization · Average Pooling · 1x1 Convolution · Convolution · Residual Block · Bottleneck Residual Block · Global Average Pooling · Kaiming Initialization