On Learnable Parameters of Optimal and Suboptimal Deep Learning Models

Ziwei Zheng; Huizhi Liang; Vaclav Snasel; Vito Latora; Panos Pardalos; Giuseppe Nicosia; Varun Ojha

arXiv:2408.11720·cs.LG·August 22, 2024

Open Access

TL;DR

This paper investigates the statistical properties of learnable parameters in deep learning models, finding that well-performing networks share similar weight characteristics across different architectures and datasets, whereas poorly performing networks vary more in their weight statistics.

Contribution

The paper provides an empirical analysis linking weight statistics and distributions to network performance across diverse models and datasets, highlighting what successful networks have in common.

Findings

01. Successful networks have similar weight statistics regardless of architecture.

02. Poorly performing networks show more variation in weight distributions.

03. Learnable parameters exhibit similar learning characteristics across DNNs, CNNs, and ViTs.
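One way to make "more variation in weight distributions" measurable, offered here as an assumption rather than the paper's stated metric: compute a statistic (say, the standard deviation of all weights) for each network in a group, then summarize how much that statistic varies across the group with its coefficient of variation. The function name `spread_across_networks` and the toy values below are hypothetical illustrations, not data from the paper.

```python
from statistics import fmean, pstdev


def spread_across_networks(per_network_values):
    """Coefficient of variation (population std / |mean|) of one statistic
    measured on each network in a group. Larger means more variation."""
    if not per_network_values:
        raise ValueError("need at least one network")
    mu = fmean(per_network_values)
    if mu == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return pstdev(per_network_values, mu) / abs(mu)


# Hypothetical per-network weight standard deviations (illustrative values only).
tight_group = [0.050, 0.051, 0.049]
loose_group = [0.020, 0.080, 0.050]

print(round(spread_across_networks(tight_group), 4))
print(round(spread_across_networks(loose_group), 4))
```

The coefficient of variation is used because it is scale-free, so groups whose weights live at different magnitudes can still be compared; any other dispersion measure (range, interquartile range, a divergence between histograms) could be swapped in.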

Abstract

We scrutinize the structural and operational aspects of deep learning models, focusing in particular on the statistics, distribution, node interaction, and visualization of learnable parameters (weights). By establishing correlations between variance in weight patterns and overall network performance, we investigate the varying (optimal and suboptimal) performance of different deep learning models. Our empirical analysis extends across widely recognized datasets such as MNIST, Fashion-MNIST, and CIFAR-10, and across deep learning models such as deep neural networks (DNNs), convolutional neural networks (CNNs), and vision transformers (ViTs), enabling us to pinpoint characteristics of learnable parameters that correlate with successful networks. Through extensive experiments on diverse deep learning architectures, we shed light on the critical factors that influence the…
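To make "weight statistics" concrete, here is a minimal sketch that assumes the statistics of interest are the per-layer mean, standard deviation, skewness, and excess kurtosis of the flattened weights. This summary does not list the paper's exact statistics, so that choice is an assumption; the function names `weight_stats` and `layer_stats` and the nested-list weight layout are hypothetical, chosen to keep the example dependency-free. In practice the matrices would come from a framework, for example PyTorch's `Module.named_parameters()`.

```python
import random
from statistics import fmean, pstdev


def weight_stats(weights):
    """Mean, std, skewness and excess kurtosis of a flat list of weights,
    using population (biased) moment estimators."""
    n = len(weights)
    if n == 0:
        raise ValueError("weights must be non-empty")
    mu = fmean(weights)
    sigma = pstdev(weights, mu)
    if sigma == 0:
        # Constant weights: higher moments are undefined; report 0 by convention.
        return {"mean": mu, "std": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    m3 = sum((w - mu) ** 3 for w in weights) / n
    m4 = sum((w - mu) ** 4 for w in weights) / n
    return {
        "mean": mu,
        "std": sigma,
        "skewness": m3 / sigma**3,
        "kurtosis": m4 / sigma**4 - 3.0,  # excess kurtosis: 0 for a Gaussian
    }


def layer_stats(layers):
    """Apply weight_stats to each layer; `layers` maps a layer name to a
    2-D weight matrix given as a list of rows (hypothetical layout)."""
    return {
        name: weight_stats([w for row in matrix for w in row])
        for name, matrix in layers.items()
    }


# Usage: one toy layer of Gaussian weights (seeded, purely illustrative).
rng = random.Random(0)
toy = {"dense_1": [[rng.gauss(0.0, 0.05) for _ in range(64)] for _ in range(32)]}
for name, s in layer_stats(toy).items():
    print(name, {k: round(v, 4) for k, v in s.items()})
```

Comparing these per-layer dictionaries between a well-trained and a poorly trained network is one plausible way to look for the kind of shared weight characteristics the abstract describes.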

Taxonomy

Topics: Neural Networks and Applications

Methods: Softmax · Linear Layer · Residual Connection · Multi-Head Attention · Layer Normalization · Attention Is All You Need · Dense Connections · Vision Transformer