Provable Learning of Random Hierarchy Models and Hierarchical Shallow-to-Deep Chaining
Yunwei Ren, Yatin Dandi, Florent Krzakala, Jason D. Lee

TL;DR
This paper proves that deep convolutional networks can efficiently learn hierarchical structures in data, specifically Random Hierarchy Models, through layerwise training, advancing theoretical understanding of deep learning's ability to exploit hierarchy.
Contribution
It provides the first theoretical proof that deep networks trained by gradient methods can learn hierarchical functions like Random Hierarchy Models, which are conjectured to separate deep and shallow networks.
Findings
Deep convolutional networks can be efficiently trained to learn Random Hierarchy Models.
Layerwise training suffices for hierarchical feature learning under mild conditions.
The result lends support to the conjectured separation between deep and shallow networks on this function class.
Abstract
The empirical success of deep learning is often attributed to deep networks' ability to exploit hierarchical structure in data, constructing increasingly complex features across layers. Yet despite substantial progress in deep learning theory, most optimization results still focus on networks with only two or three layers, leaving the theoretical understanding of hierarchical learning in genuinely deep models limited. This leads to a natural question: can we prove that deep networks, trained by gradient-based methods, can efficiently exploit hierarchical structure? In this work, we consider Random Hierarchy Models -- a hierarchical context-free grammar introduced by arXiv:2307.02129 and conjectured to separate deep and shallow networks. We prove that, under mild conditions, a deep convolutional network can be efficiently trained to learn this function class. Our proof builds on a…
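To make the "hierarchical context-free grammar" concrete, here is a minimal sketch of a sampler for a grammar of this kind. It is an illustration under stated assumptions, not the construction used in the paper: the parameter names (`depth`, `vocab_size`, `num_rules`, `branching`), the shared vocabulary size across levels, and the choice to give every symbol a disjoint set of child tuples are all hypothetical choices made for this example.

```python
import itertools
import random


def make_rules(vocab_size, num_rules, branching, rng):
    """Assign each symbol `num_rules` child tuples of length `branching`.

    Assumption for this sketch: no child tuple is shared between two
    symbols, so each tuple can be traced back to a unique parent symbol.
    """
    all_tuples = list(itertools.product(range(vocab_size), repeat=branching))
    if vocab_size * num_rules > len(all_tuples):
        raise ValueError("not enough distinct tuples for the requested rules")
    rng.shuffle(all_tuples)
    return {
        sym: all_tuples[sym * num_rules:(sym + 1) * num_rules]
        for sym in range(vocab_size)
    }


def make_grammar(depth, vocab_size, num_rules, branching, seed=0):
    """Draw one independent random rule table per level of the hierarchy."""
    rng = random.Random(seed)
    levels = [make_rules(vocab_size, num_rules, branching, rng)
              for _ in range(depth)]
    return levels, rng


def sample(label, levels, rng):
    """Expand a top-level label into a leaf sequence, one level at a time."""
    seq = [label]
    for rules in levels:
        # Replace every symbol by one of its child tuples, chosen uniformly.
        seq = [child for sym in seq for child in rng.choice(rules[sym])]
    return seq


levels, rng = make_grammar(depth=3, vocab_size=4, num_rules=2, branching=2)
x = sample(label=1, levels=levels, rng=rng)
print(len(x), x)  # the sequence has branching ** depth leaf symbols
```

The learning task in this sketch would be to recover `label` from the leaf sequence `x`. Because each level rewrites symbols independently, the label depends on the input only through nested groups of `branching` adjacent tokens, which is the kind of structure a convolutional network with matching receptive fields can exploit layer by layer.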
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
