
TL;DR
This paper presents a theoretical framework modeling the optimal depth of neural networks as an optimal stopping problem, providing insights into early exiting strategies that improve efficiency without sacrificing accuracy.
Contribution
It introduces a formal optimal stopping framework for neural network depth, proves finiteness of optimal depth under certain conditions, and proposes a regularizer to encourage early exiting.
Findings
The regularizer effectively induces early exiting behavior.
Empirical results show significant computational savings on ImageNet.
The framework extends to Transformers and continuous-depth models.
Abstract
Determining the optimal depth of a neural network is a fundamental yet challenging problem, typically resolved through resource-intensive experimentation. This paper introduces a formal theoretical framework to address this question by recasting the forward pass of a deep network, specifically a Residual Network (ResNet), as an optimal stopping problem. We model the layer-by-layer evolution of hidden representations as a sequential decision process where, at each layer, a choice is made between halting computation to make a prediction or continuing to a deeper layer for a potentially more refined representation. This formulation captures the intrinsic trade-off between accuracy and computational cost. Our primary theoretical contribution is a proof that, under a plausible condition of diminishing returns on the residual functions, the expected optimal stopping depth is provably finite,…
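The halt-or-continue decision described in the abstract can be illustrated with a small backward-induction sketch. This is a toy model under stated assumptions, not the paper's formulation: the hidden representation is reduced to a discrete confidence level, the reward for halting is that level scaled to [0, 1], each extra layer costs a fixed amount, and the chance that a layer improves confidence decays geometrically with depth (a stand-in for diminishing returns). The names `solve_stopping`, `expected_depth`, `p0`, and `decay` are hypothetical.

```python
def solve_stopping(num_layers, num_levels, cost, p0=0.9, decay=0.8):
    """Backward induction for a toy layer-wise optimal stopping problem.

    Assumed toy model (not from the paper): state s in 0..num_levels is a
    confidence level, halting pays s / num_levels, and continuing from layer l
    costs `cost` and raises s by one with probability p0 * decay**l.
    Returns (value, stop) tables indexed as [layer][level].
    """
    reward = [s / num_levels for s in range(num_levels + 1)]
    value = [[0.0] * (num_levels + 1) for _ in range(num_layers + 1)]
    stop = [[True] * (num_levels + 1) for _ in range(num_layers + 1)]
    value[num_layers] = reward[:]  # the last layer must halt
    for layer in range(num_layers - 1, -1, -1):
        p = p0 * decay ** layer  # improvement probability shrinks with depth
        for s in range(num_levels + 1):
            up = min(s + 1, num_levels)
            cont = -cost + p * value[layer + 1][up] + (1 - p) * value[layer + 1][s]
            stop[layer][s] = reward[s] >= cont  # halt when continuing cannot help
            value[layer][s] = max(reward[s], cont)
    return value, stop


def expected_depth(stop, num_levels, p0=0.9, decay=0.8):
    """Expected halting layer under the policy `stop`, starting at level 0."""
    num_layers = len(stop) - 1
    dist = [0.0] * (num_levels + 1)
    dist[0] = 1.0  # all probability mass starts at the lowest confidence level
    total = 0.0
    for layer in range(num_layers + 1):
        p = p0 * decay ** layer
        nxt = [0.0] * (num_levels + 1)
        for s, mass in enumerate(dist):
            if mass == 0.0:
                continue
            if stop[layer][s]:
                total += mass * layer  # this mass halts at the current layer
            else:
                nxt[min(s + 1, num_levels)] += mass * p
                nxt[s] += mass * (1 - p)
        dist = nxt
    return total


if __name__ == "__main__":
    for c in (0.0, 0.02, 0.1):
        _, policy = solve_stopping(num_layers=20, num_levels=5, cost=c)
        print(f"cost per layer {c}: expected depth {expected_depth(policy, 5):.2f}")
```

In this toy setting, raising the per-layer cost enlarges the set of states in which halting is preferred, so the expected halting depth cannot increase. That mirrors the accuracy-versus-computation trade-off the abstract describes, but it says nothing about the paper's actual conditions or proof.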
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Advanced Neural Network Applications
Methods: Layer Normalization · Dropout · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Softmax · Label Smoothing · Transformer
