Mitigating severe over-parameterization in deep convolutional neural networks through forced feature abstraction and compression with an entropy-based heuristic
Nidhi Gowdra, Roopak Sinha, Stephen MacDonell, Wei Qi Yan

TL;DR
This paper introduces an entropy-based heuristic to limit CNN depth, reducing over-parameterization and training time while maintaining or improving accuracy across multiple datasets and architectures.
Contribution
The proposed EBCLE heuristic effectively constrains CNN depth based on data entropy, enhancing resource utilization and model efficiency without performance loss.
Findings
Reduces training time by up to 78.59%
Maintains or improves classification accuracy
Effective across multiple datasets and architectures
Abstract
Convolutional Neural Networks (CNNs) such as ResNet-50, DenseNet-40 and ResNeXt-56 are severely over-parameterized, which necessitates a corresponding increase in the computational resources required for model training; this cost scales exponentially with increments in model depth. In this paper, we propose an Entropy-Based Convolutional Layer Estimation (EBCLE) heuristic that is robust and simple, yet effective in resolving the problem of over-parameterization with regard to the network depth of CNN models. The EBCLE heuristic employs a priori knowledge of the entropic data distribution of input datasets to determine an upper bound for convolutional network depth, beyond which identity transformations are prevalent and contribute insignificantly to model performance. Restricting depth redundancies by forcing feature compression and abstraction restricts over-parameterization while…
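The abstract says EBCLE relies on a priori knowledge of the entropy of the input data, but this summary does not give the formula that maps entropy to a depth bound. As a minimal sketch of the measurement step only, the Python below computes the Shannon entropy of discrete pixel intensities and averages it over a dataset. The function names `shannon_entropy` and `mean_dataset_entropy` and the toy images are illustrative assumptions, not the authors' implementation, and no depth bound is derived here.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of a flat sequence of discrete pixel intensities."""
    counts = Counter(pixels)
    total = len(pixels)
    # H = sum over intensities of p * log2(1 / p), with p the empirical frequency.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def mean_dataset_entropy(images):
    """Average per-image entropy over a list of flat pixel sequences (hypothetical helper)."""
    entropies = [shannon_entropy(img) for img in images]
    return sum(entropies) / len(entropies)


# Toy data (assumed for illustration): a constant image and a two-level image.
flat_image = [0] * 16          # one intensity only, so there is no uncertainty
mixed_image = [0, 255] * 8     # two equally likely intensities

print(shannon_entropy(flat_image))
print(shannon_entropy(mixed_image))
print(mean_dataset_entropy([flat_image, mixed_image]))
```

A constant image carries no information, while an image with two equally frequent intensities carries one bit per pixel. How such an entropy estimate is turned into an upper bound on convolutional depth is defined in the paper itself and is not reproduced here.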
Taxonomy
Methods: Pointwise Convolution · Depthwise Convolution · Residual Connection · Grouped Convolution · Global Average Pooling · Sigmoid Activation · Depthwise Separable Convolution · Bottleneck Residual Block · Residual Block · Kaiming Initialization
