A Two-Parameter Weibull Framework for Diagnosing Transformer Weight Distributions

Tiexin Ding

arXiv:2605.18898 · cs.LG · May 20, 2026

TL;DR

This paper introduces a Weibull distribution-based diagnostic framework for analyzing transformer weight distributions, revealing distinct patterns across model components and training stages.

Contribution

It applies a two-parameter Weibull model to transformer weights, providing architecture-independent diagnostics and insights into training dynamics.

Findings

01

FFN modules and the attention output projection W_o (the "Transmission Class") fall in a narrow range of the Weibull shape parameter k.

02

Attention input projections deviate from the Weibull form, and the deviation depends on how the weights are stored.

03

The Weibull shape parameter k increases during training, and its trajectory correlates with training hyperparameters.

Abstract

We apply the Weibull distribution, a two-parameter family from extreme-value theory, as a diagnostic framework for element-wise weight magnitude distributions in transformers. At initialization, i.i.d. Gaussian weights give |w| ~ HalfNormal, which yields a shape parameter k ≈ 1.20 under a probability-plot fit to the middle 80% of the distribution (the protocol used throughout this work). This anchor makes k a principled, architecture-independent measuring stick for training dynamics; fitting each weight matrix independently at every layer and every checkpoint enables per-component, per-layer, and per-step diagnostics that aggregate statistics cannot resolve. Applying this framework to 12 model entries spanning 7 architectural families (Pythia, OLMo-1/2, LLaMA-3, Mistral, Qwen2.5/3) reveals three findings. First, FFN modules and the attention output projection W_o (the Transmission Class) fall in a narrow k band: median terminal k…
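A minimal sketch of a middle-80% Weibull probability-plot fit, written only to illustrate where the k ≈ 1.20 anchor comes from. It assumes Hazen plotting positions F = (i + 0.5)/n and an ordinary least-squares regression of ln(-ln(1 - F)) on ln|w|, keeping only points with 0.10 ≤ F ≤ 0.90; the paper's exact choices (plotting positions, regression direction) may differ. The function names weibull_fit_middle80 and fit_per_matrix are hypothetical and are not taken from the tiexinding/NPM-Weibull-public repository.

```python
import math
import random


def weibull_fit_middle80(weights, lo=0.10, hi=0.90):
    """Estimate Weibull (k, lam) for |w| from a Weibull probability plot.

    Assumed protocol (illustrative, not the paper's code): Hazen plotting
    positions F_i = (i + 0.5) / n, least squares of y = ln(-ln(1 - F)) on
    x = ln|w|, restricted to lo <= F <= hi (the "middle 80%").
    """
    mags = sorted(abs(w) for w in weights if w != 0.0)  # drop exact zeros (log undefined)
    n = len(mags)
    xs, ys = [], []
    for i, m in enumerate(mags):
        f = (i + 0.5) / n  # empirical CDF at the i-th order statistic
        if lo <= f <= hi:
            xs.append(math.log(m))
            ys.append(math.log(-math.log(1.0 - f)))
    if len(xs) < 2:
        raise ValueError("too few points inside the fitting window")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    k = sxy / sxx  # slope of the Weibull plot = shape parameter
    intercept = my - k * mx  # for a Weibull, intercept = -k * ln(lam)
    lam = math.exp(-intercept / k)  # scale parameter
    return k, lam


def fit_per_matrix(named_matrices):
    """Fit every named (flattened) weight matrix independently."""
    return {name: weibull_fit_middle80(w) for name, w in named_matrices.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for an i.i.d. Gaussian initialization; sigma = 0.02 is arbitrary.
    init = [rng.gauss(0.0, 0.02) for _ in range(200_000)]
    k, lam = weibull_fit_middle80(init)
    print(f"k = {k:.3f}, lambda = {lam:.5f}")  # k lands close to the 1.20 anchor
```

Because k is the slope of the probability plot, multiplying every weight by a constant shifts all points horizontally and leaves k unchanged; only λ absorbs the scale. This is one reason a shape-based diagnostic can be compared across models initialized at different scales. The per-matrix helper mirrors the idea of fitting each weight matrix separately rather than pooling a whole model.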

Code & Models

Repositories

tiexinding/NPM-Weibull-public (GitHub)

Datasets

TiexinDing/NPM-Weibull-DATABASE-v9_1 (dataset, 38 downloads)
