Provable Learning of Random Hierarchy Models and Hierarchical Shallow-to-Deep Chaining

Yunwei Ren; Yatin Dandi; Florent Krzakala; Jason D. Lee

arXiv:2601.19756·cs.LG·January 28, 2026

Open Access

TL;DR

This paper proves that, under mild conditions, deep convolutional networks can be efficiently trained to learn Random Hierarchy Models, a class of hierarchically structured data, through layerwise training, advancing the theoretical understanding of deep learning's ability to exploit hierarchy.

Contribution

It provides the first theoretical proof that deep networks trained by gradient methods can learn hierarchical functions like Random Hierarchy Models, which are conjectured to separate deep and shallow networks.

Findings

1. Deep networks can be trained to learn hierarchical models efficiently.
2. Layerwise training suffices for hierarchical feature learning under mild conditions.
3. Theoretical separation between deep and shallow networks is supported.

Abstract

The empirical success of deep learning is often attributed to deep networks' ability to exploit hierarchical structure in data, constructing increasingly complex features across layers. Yet despite substantial progress in deep learning theory, most optimization results still focus on networks with only two or three layers, leaving the theoretical understanding of hierarchical learning in genuinely deep models limited. This leads to a natural question: can we prove that deep networks, trained by gradient-based methods, can efficiently exploit hierarchical structure? In this work, we consider Random Hierarchy Models -- a hierarchical context-free grammar introduced by arXiv:2307.02129 and conjectured to separate deep and shallow networks. We prove that, under mild conditions, a deep convolutional network can be efficiently trained to learn this function class. Our proof builds on a…
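To make the object of study concrete, the following is a minimal sketch of how data from a Random Hierarchy Model might be generated, assuming the usual description of such a grammar: at every level, each symbol has a fixed number of random production rules, each expanding it into a short tuple of lower-level symbols, and no tuple is shared between two symbols. The function names and the parameters `vocab_size`, `num_rules`, `branching`, and `depth` are illustrative assumptions, not notation from the paper, and class labels are drawn from the same vocabulary as a simplification.

```python
import random


def _decode(code, base, length):
    """Turn an integer into a tuple of `length` digits in the given base."""
    digits = []
    for _ in range(length):
        code, digit = divmod(code, base)
        digits.append(digit)
    return tuple(digits)


def make_rhm_rules(vocab_size, num_rules, branching, depth, seed=0):
    """Draw random production rules for every level (hypothetical helper).

    Returns a list of dicts, top level first. Each dict maps a symbol to
    `num_rules` tuples of `branching` lower-level symbols. Tuples are drawn
    without replacement within a level, so each tuple has a unique parent.
    """
    rng = random.Random(seed)
    total = vocab_size * num_rules
    num_tuples = vocab_size ** branching
    if total > num_tuples:
        raise ValueError("not enough distinct tuples for unambiguous rules")
    rules = []
    for _ in range(depth):
        codes = rng.sample(range(num_tuples), total)
        level = {}
        for sym in range(vocab_size):
            chunk = codes[sym * num_rules:(sym + 1) * num_rules]
            level[sym] = [_decode(c, vocab_size, branching) for c in chunk]
        rules.append(level)
    return rules


def sample_rhm(rules, label, rng):
    """Expand a class label top-down into a sequence of branching**depth tokens."""
    seq = [label]
    for level in rules:
        seq = [child for sym in seq for child in rng.choice(level[sym])]
    return seq


def parse_rhm(rules, seq):
    """Invert the grammar bottom-up, one level at a time."""
    for level in reversed(rules):
        parent_of = {t: sym for sym, tuples in level.items() for t in tuples}
        k = len(next(iter(parent_of)))  # tuple length at this level
        seq = [parent_of[tuple(seq[i:i + k])] for i in range(0, len(seq), k)]
    return seq


if __name__ == "__main__":
    rules = make_rhm_rules(vocab_size=4, num_rules=2, branching=2, depth=3)
    x = sample_rhm(rules, label=1, rng=random.Random(1))
    print(len(x), parse_rhm(rules, x))
```

In this sketch the label is recovered by composing one local lookup per level, which mirrors the intuition that a deep network can build the answer from increasingly abstract features layer by layer. It says nothing about how the paper's training procedure finds those features.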

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis