AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs

Di He; Songjun Tu; Ajay Jaiswal; Li Shen; Ganzhao Yuan; Shiwei Liu; Lu Yin

arXiv:2506.14562 · cs.CL · November 6, 2025

TL;DR

AlphaDecay introduces a module-wise adaptive weight decay method for LLMs, guided by spectral analysis, to improve training stability and performance over uniform decay approaches.

Contribution

The paper proposes an adaptive weight decay technique that sets each module's decay strength from its spectral properties, balancing regularization across the modules of an LLM.

Findings

01 AlphaDecay outperforms uniform decay in perplexity and generalization.

02 The method is effective across models from 60M to 1B parameters.

03 Spectral analysis guides optimal decay strength assignment.
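
As a rough illustration of finding 03, the sketch below maps a per-module tail exponent (alpha, where a smaller value means a heavier-tailed spectrum) to a per-module decay strength. It is not taken from the official repository: the function name, the module names, the linear interpolation, and the scaling bounds `low` and `high` are all assumptions chosen for illustration. The only property it is meant to show is the direction of the assignment: heavier tails get weaker decay, lighter tails get stronger decay.

```python
def assign_weight_decay(alphas, base_decay=0.1, low=0.67, high=1.5):
    """Map per-module tail exponents to per-module weight decay.

    alphas: dict of module name -> estimated tail exponent alpha.
    low, high: hypothetical multipliers applied to base_decay for the
    heaviest-tailed and lightest-tailed module respectively.
    """
    a_min = min(alphas.values())
    a_max = max(alphas.values())
    span = a_max - a_min
    if span == 0:
        # All modules look alike, so fall back to uniform decay.
        return {name: base_decay for name in alphas}
    decays = {}
    for name, alpha in alphas.items():
        t = (alpha - a_min) / span  # 0 for heaviest tail, 1 for lightest
        decays[name] = base_decay * (low + t * (high - low))
    return decays


# Hypothetical module names and alpha values, for illustration only.
alphas = {"attn.q_proj": 2.0, "attn.k_proj": 3.0, "mlp.up_proj": 4.0}
decays = assign_weight_decay(alphas)
for name, wd in decays.items():
    print(f"{name}: weight_decay={wd:.4f}")
```

In a training script, values like these would typically be passed to the optimizer as one parameter group per module, each with its own `weight_decay`.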

Abstract

Weight decay is a standard regularization technique for training large language models (LLMs). While it is common to assign a uniform decay rate to every layer, this approach overlooks the structural diversity of LLMs and the varying spectral properties across modules. In this paper, we introduce AlphaDecay, a simple yet effective method that adaptively assigns different weight decay strengths to each module of an LLM. Our approach is guided by Heavy-Tailed Self-Regularization (HT-SR) theory, which analyzes the empirical spectral density (ESD) of weight correlation matrices to quantify "heavy-tailedness." Modules exhibiting more pronounced heavy-tailed ESDs, reflecting stronger feature learning, are assigned weaker decay, while modules with lighter-tailed spectra receive stronger decay. Our method leverages tailored weight decay assignments to balance the module-wise differences in…
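
To make "heavy-tailedness" concrete, here is a minimal sketch of one common way to quantify it: the Hill estimator of a power-law exponent alpha fitted to the largest eigenvalues of an ESD. The abstract does not state which estimator AlphaDecay uses, so the estimator choice, the default of fitting the upper half of the spectrum, and the helper names are assumptions. The eigenvalues are assumed to be precomputed (for a weight matrix W they would be the eigenvalues of W^T W, i.e. the squared singular values of W); the synthetic spectra are only there to check the estimator.

```python
import math


def hill_alpha(eigenvalues, k=None):
    """Hill estimate of the power-law exponent alpha from the k largest eigenvalues.

    Smaller alpha means a heavier-tailed spectrum.
    """
    lam = sorted(eigenvalues)
    n = len(lam)
    if k is None:
        k = n // 2  # assumption: fit the upper half of the spectrum
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(eigenvalues)")
    threshold = lam[n - k - 1]  # largest eigenvalue below the fitted tail
    if threshold <= 0:
        raise ValueError("threshold eigenvalue must be positive")
    log_sum = sum(math.log(lam[n - i] / threshold) for i in range(1, k + 1))
    if log_sum == 0:
        raise ValueError("tail eigenvalues are all equal; alpha is undefined")
    return 1.0 + k / log_sum


def pareto_quantiles(alpha, n=1000):
    """Synthetic spectrum: quantiles of a density proportional to x^(-alpha), x >= 1."""
    tail_index = alpha - 1.0
    return [(1.0 - (i + 0.5) / n) ** (-1.0 / tail_index) for i in range(n)]


heavy = hill_alpha(pareto_quantiles(2.0))  # heavier tail, smaller alpha
light = hill_alpha(pareto_quantiles(4.0))  # lighter tail, larger alpha
print(f"heavy-tailed spectrum: alpha ~ {heavy:.2f}")
print(f"light-tailed spectrum: alpha ~ {light:.2f}")
```

Under the rule described in the abstract, the module with the smaller estimated alpha would be assigned the weaker decay.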

Code & Models

Repositories

hed-ucas/alphadecay (PyTorch, official)
