On Learnable Parameters of Optimal and Suboptimal Deep Learning Models
Ziwei Zheng, Huizhi Liang, Vaclav Snasel, Vito Latora, Panos Pardalos, Giuseppe Nicosia, Varun Ojha

TL;DR
This paper investigates the statistical properties of learnable parameters in deep learning models. It finds that successful networks share similar weight characteristics across architectures and datasets, while poorly performing networks show more varied weight distributions.
Contribution
It provides an empirical analysis linking weight statistics and distributions to network performance across diverse models and datasets, highlighting commonalities in successful networks.
Findings
Successful networks have similar weight statistics regardless of architecture.
Poorly performing networks show more variation in weight distributions.
Learnable parameters exhibit similar learning characteristics across DNN, CNN, and ViT.
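The findings above compare weight statistics across networks. As a rough illustration of what such a comparison might involve, the sketch below computes a few summary statistics per layer with NumPy. The choice of statistics (mean, standard deviation, skewness, excess kurtosis), the helper name weight_stats, and the randomly generated stand-in weights are all assumptions made for illustration; the summary does not state which statistics or tooling the authors used.

```python
import numpy as np


def weight_stats(w):
    """Summary statistics of one layer's weights, flattened to 1-D.

    Hypothetical helper: the statistics chosen here are an assumption,
    not necessarily the ones reported in the paper.
    """
    w = np.asarray(w, dtype=np.float64).ravel()
    mean = w.mean()
    std = w.std()  # population standard deviation (ddof=0)
    # Standardize so the higher moments are scale-free; guard against std == 0.
    z = (w - mean) / std if std > 0 else np.zeros_like(w)
    return {
        "mean": float(mean),
        "std": float(std),
        "skewness": float(np.mean(z ** 3)),
        "excess_kurtosis": float(np.mean(z ** 4) - 3.0),
    }


# Stand-in weights: random matrices, NOT weights from a trained model.
rng = np.random.default_rng(0)
layers = {
    "dense_1": rng.normal(0.0, 0.05, size=(784, 128)),
    "dense_2": rng.normal(0.0, 0.10, size=(128, 10)),
}

# One row of statistics per layer, ready to compare across networks.
report = {name: weight_stats(w) for name, w in layers.items()}
for name, stats in report.items():
    print(name, {k: round(v, 4) for k, v in stats.items()})
```

In practice the layers dictionary would be filled from a trained model's parameters, and the per-layer reports of several networks could then be compared side by side.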
Abstract
We scrutinize the structural and operational aspects of deep learning models, particularly focusing on the nuances of learnable parameters (weight) statistics, distribution, node interaction, and visualization. By establishing correlations between variance in weight patterns and overall network performance, we investigate the varying (optimal and suboptimal) performances of various deep-learning models. Our empirical analysis extends across widely recognized datasets such as MNIST, Fashion-MNIST, and CIFAR-10, and various deep learning models such as deep neural networks (DNNs), convolutional neural networks (CNNs), and vision transformer (ViT), enabling us to pinpoint characteristics of learnable parameters that correlate with successful networks. Through extensive experiments on the diverse architectures of deep learning models, we shed light on the critical factors that influence the…
Taxonomy
Topics: Neural Networks and Applications
Methods: Softmax · Linear Layer · Residual Connection · Multi-Head Attention · Layer Normalization · Attention Is All You Need · Dense Connections · Vision Transformer
