The Effective Depth Paradox: Evaluating the Relationship between Architectural Topology and Trainability in Deep CNNs
Manfred M. Fischer, Joshua Pitts

TL;DR
This study explores how the topology of deep CNN architectures influences trainability and performance, emphasizing effective depth over nominal depth as a key factor.
Contribution
It introduces the concept of effective depth, differentiates it from nominal depth, and demonstrates its importance in understanding CNN trainability and scalability.
Findings
Identity shortcuts and branching modules decouple effective depth from nominal depth.
Effective depth better predicts model scaling potential and trainability.
Architectural topology matters more than layer count for gradient health.
Abstract
This paper investigates the relationship between convolutional neural network (CNN) topology and image recognition performance through a comparative study of the VGG, ResNet, and GoogLeNet architectural families. Utilizing a unified experimental framework, the study isolates the impact of depth from confounding implementation variables. A formal distinction is introduced between nominal depth, representing the physical layer count, and effective depth, an operational metric quantifying the expected number of sequential transformations. Empirical results demonstrate that architectures utilizing identity shortcuts or branching modules maintain optimization stability by decoupling effective depth from nominal depth. These findings suggest that effective depth serves as a superior framework for predicting scaling potential and practical…
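The distinction between nominal and effective depth can be illustrated with a small sketch. The abstract does not give the paper's formula, so the code below relies on an assumption that is not taken from the paper: a stack of residual blocks is viewed as a collection of paths in which each block is either traversed or bypassed through its identity shortcut, with every path weighted equally. The function names and the choice of two layers per block are hypothetical.

```python
from math import comb


def path_length_distribution(num_blocks, layers_per_block=2):
    """Distribution of path lengths through a stack of residual blocks.

    Assumption (not from the paper): each block is independently either
    traversed or skipped via its identity shortcut, and all 2**num_blocks
    paths carry equal weight.
    """
    total_paths = 2 ** num_blocks
    return {
        k * layers_per_block: comb(num_blocks, k) / total_paths
        for k in range(num_blocks + 1)
    }


def expected_depth(distribution):
    """Expected number of sequential layers along a randomly chosen path."""
    return sum(length * prob for length, prob in distribution.items())


num_blocks, layers_per_block = 16, 2

# Nominal depth: every layer counted once, as in a plain chain without shortcuts.
nominal = num_blocks * layers_per_block

# Effective depth under the equal-weight path assumption stated above.
effective = expected_depth(path_length_distribution(num_blocks, layers_per_block))

print(f"nominal depth: {nominal}, expected path length: {effective:.1f}")
```

Under this assumption the expected path length is half the nominal depth, whereas a plain chain has a single path whose length equals the nominal depth. The paper's own metric may weight paths differently.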
