When Do Early-Exit Networks Generalize? A PAC-Bayesian Theory of Adaptive Depth

Dongxin Guo; Jikun Wu; Siu Ming Yiu

arXiv:2604.15764·cs.LG·April 20, 2026

TL;DR

This paper develops a PAC-Bayesian theoretical framework for early-exit neural networks, providing new generalization bounds based on entropy and expected depth, and demonstrating their practical advantages through extensive experiments.

Contribution

It introduces the first entropy-based generalization bounds for adaptive-depth networks, with explicit constants and conditions for outperforming fixed-depth models.

Findings

01. Generalization bounds depend on exit-depth entropy and expected depth.

02. Experiments show bounds are tight and guide threshold selection effectively.

03. Adaptive-depth networks can provably outperform fixed-depth counterparts under certain conditions.

Abstract

Early-exit neural networks enable adaptive computation by allowing confident predictions to exit at intermediate layers, achieving 2-8$\times$ inference speedup. Despite widespread deployment, their generalization properties lack theoretical understanding, a gap explicitly identified in recent surveys. This paper establishes a unified PAC-Bayesian framework for adaptive-depth networks. (1) Novel Entropy-Based Bounds: We prove the first generalization bounds depending on exit-depth entropy $H(D)$ and expected depth $E[D]$ rather than maximum depth $K$, with sample complexity $O((E[D] \cdot d + H(D)) / \epsilon^{2})$. (2) Explicit Constructive Constants: Our analysis yields the leading coefficient $\sqrt{2 \ln 2} \approx 1.177$ with complete derivation. (3) Provable Early-Exit Advantages: We establish sufficient conditions under which adaptive-depth networks…
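To make the quantities in the sample-complexity expression concrete, the following minimal sketch computes $E[D]$ and $H(D)$ from a distribution over exit depths, then evaluates the term inside the $O(\cdot)$. It is an illustration only, not the paper's code: the function names are hypothetical, entropy is measured in bits (the abstract does not state the unit), $d$ is treated as a per-layer complexity parameter (the abstract does not define it), and the constants hidden by the $O(\cdot)$ are omitted.

```python
import math


def exit_depth_stats(exit_probs):
    """Return (E[D], H(D) in bits) for exit_probs[k-1] = P(D = k), k = 1..K.

    Hypothetical helper: the bits convention is an assumption.
    """
    if any(p < 0 for p in exit_probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(exit_probs), 1.0, abs_tol=1e-9):
        raise ValueError("exit probabilities must sum to 1")
    # Expected depth: depth k weighted by the probability of exiting at k.
    expected_depth = sum(k * p for k, p in enumerate(exit_probs, start=1))
    # Shannon entropy of the exit depth; zero-probability exits contribute 0.
    entropy_bits = -sum(p * math.log2(p) for p in exit_probs if p > 0)
    return expected_depth, entropy_bits


def complexity_term(exit_probs, d, eps):
    """Evaluate (E[D] * d + H(D)) / eps^2, ignoring hidden constants.

    `d` is assumed to be a per-layer complexity parameter.
    """
    expected_depth, entropy_bits = exit_depth_stats(exit_probs)
    return (expected_depth * d + entropy_bits) / eps ** 2


# Adaptive network: half the inputs exit at layer 1, the rest split over 2 and 3.
adaptive = [0.5, 0.25, 0.25]
# Fixed-depth baseline: every input runs all K = 3 layers.
fixed = [0.0, 0.0, 1.0]

adaptive_term = complexity_term(adaptive, d=100, eps=0.1)
fixed_term = complexity_term(fixed, d=100, eps=0.1)
```

In this toy setting the adaptive distribution has $E[D] = 1.75$ and $H(D) = 1.5$ bits, while the fixed-depth baseline has $E[D] = K = 3$ and $H(D) = 0$. The adaptive term is smaller because the reduction in expected depth outweighs the added entropy.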
