It Takes a Good Model to Train a Good Model: Generalized Gaussian Priors for Optimized LLMs

Jun Wu; Patrick Huang; Jiangtao Wen; Yuxing Han

arXiv:2506.00486 · cs.LG · February 24, 2026

Open Access

TL;DR

This paper introduces a statistical modeling approach based on generalized Gaussian (GG) distributions that improves initialization and training efficiency and reduces communication cost in large language models (LLMs), yielding smaller, faster, and more efficient models.

Contribution

It proposes a GG-based initialization, ACT (a progressive activation-constrained training method), and GCT (a gradient-constrained training algorithm), advancing scalable, hardware-aware LLM training.

Findings

01

The weights, activations, and gradients of LLMs are well modeled by generalized Gaussian distributions.

02

The proposed methods lead to smaller, faster models with minimal communication overhead.

03

Experiments show improved convergence and accuracy across architectures.
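The first finding can be made concrete with a small fitting routine. The Python sketch below estimates the shape parameter β of a zero-mean generalized Gaussian with density f(x) ∝ exp(-(|x|/α)^β) by moment matching. The ratio (E|x|)²/E[x²] equals Γ(2/β)²/(Γ(1/β)Γ(3/β)), which rises monotonically from 0 toward 0.75 as β grows, so bisection recovers β. Here β = 2 is the Gaussian and β = 1 the Laplace distribution. This is a standard estimator chosen for illustration. The paper's own fitting procedure is not described here, and the function names are hypothetical.

```python
import math


def gg_moment_ratio(beta: float) -> float:
    """(E|X|)^2 / E[X^2] for a zero-mean generalized Gaussian with shape beta.

    E|X| = alpha * G(2/b) / G(1/b) and E[X^2] = alpha^2 * G(3/b) / G(1/b),
    so alpha cancels. Log-gamma keeps this stable for small beta.
    """
    log_r = 2 * math.lgamma(2 / beta) - math.lgamma(1 / beta) - math.lgamma(3 / beta)
    return math.exp(log_r)


def estimate_gg_shape(samples, lo: float = 0.1, hi: float = 10.0, iters: int = 100) -> float:
    """Moment-matching estimate of the GG shape parameter (hypothetical helper)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]           # assume a zero-mean model
    m1 = sum(abs(x) for x in centered) / n           # empirical E|x|
    m2 = sum(x * x for x in centered) / n            # empirical E[x^2]
    target = m1 * m1 / m2

    # Clamp to the search interval if the empirical ratio falls outside it.
    if target <= gg_moment_ratio(lo):
        return lo
    if target >= gg_moment_ratio(hi):
        return hi

    # The ratio increases with beta, so bisection converges to the match.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gg_moment_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Applied to a flattened weight, activation, or gradient tensor, an estimate near 2 indicates Gaussian-like values, while estimates below 2 indicate heavier tails than a Gaussian.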

Abstract

Despite rapid progress in large language models (LLMs), the statistical structure of their weights, activations, and gradients (and its implications for initialization, training dynamics, and efficiency) remains largely unexplored. We empirically show that these quantities in LLMs are well modeled by generalized Gaussian (GG) distributions, and introduce a unified, end-to-end optimization framework grounded in this observation. Our contributions are threefold: (1) a GG-based initialization that aligns with trained model statistics, accelerating convergence and improving accuracy; (2) ACT, a progressive activation-constrained training method that reduces redundancy and propagation overhead; and (3) GCT, a gradient-constrained training algorithm that substantially lowers communication cost in distributed training. Experiments across diverse architectures demonstrate consistently smaller,…
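Contribution (1) can be illustrated with a Python sketch that draws initial weights from a generalized Gaussian instead of a normal distribution. It relies on a standard sampling fact: if G ~ Gamma(1/β, 1), then a random sign times α·G^(1/β) follows a GG distribution with shape β and scale α. The scale α is chosen so the variance α²Γ(3/β)/Γ(1/β) matches a target standard deviation. The He-style target of gain/√fan_in and all function names are assumptions for illustration. In keeping with the abstract, β would come from fitting a trained model's statistics, but the paper's exact initialization rule is not stated here.

```python
import math
import random


def gg_scale_for_std(beta: float, std: float) -> float:
    """Scale alpha such that Var[X] = alpha^2 * G(3/b) / G(1/b) equals std^2."""
    return std * math.exp(0.5 * (math.lgamma(1 / beta) - math.lgamma(3 / beta)))


def sample_gg(beta: float, alpha: float, rng: random.Random) -> float:
    """Draw one generalized Gaussian sample with shape beta and scale alpha."""
    g = rng.gammavariate(1.0 / beta, 1.0)            # G ~ Gamma(1/beta, 1)
    magnitude = alpha * g ** (1.0 / beta)            # |X| = alpha * G^(1/beta)
    return magnitude if rng.random() < 0.5 else -magnitude  # symmetric sign


def gg_init(fan_in: int, fan_out: int, beta: float,
            gain: float = math.sqrt(2.0), seed: int = 0):
    """Hypothetical GG initializer returning a fan_out x fan_in weight matrix."""
    std = gain / math.sqrt(fan_in)                   # He-style target std (assumption)
    alpha = gg_scale_for_std(beta, std)
    rng = random.Random(seed)
    return [[sample_gg(beta, alpha, rng) for _ in range(fan_in)]
            for _ in range(fan_out)]
```

For example, `gg_init(256, 64, beta=1.5)` returns a 64 × 256 matrix whose entries have standard deviation close to √(2/256) but heavier tails than a normal initializer with the same variance. Setting `beta=2.0` recovers an ordinary Gaussian initializer.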

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education