AGGC: Adaptive Group Gradient Clipping for Stabilizing Large Language Model Training

Zhiyuan Li; Yuan Wu; Yi Chang

arXiv:2601.11864 · cs.LG · January 21, 2026

TL;DR

AGGC introduces an adaptive, group-wise gradient clipping method that stabilizes large language model training by addressing gradient heterogeneity, outperforming traditional methods and enhancing model accuracy and stability.

Contribution

The paper proposes AGGC, an adaptive group-wise gradient clipping technique that partitions parameters into groups by functional type and regulates each group according to an Exponential Moving Average (EMA) of its historical gradient behavior, improving training stability.
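As an illustration of the partitioning step, the sketch below assigns parameter names to functional groups by substring match. The group names and patterns are assumptions chosen to resemble LLaMA-style parameter naming; this summary does not say which groups the paper actually uses.

```python
from collections import defaultdict

# Hypothetical grouping rules (an assumption, not taken from the paper):
# each functional group is identified by substrings of the parameter name.
GROUP_PATTERNS = [
    ("attention", ("q_proj", "k_proj", "v_proj", "o_proj")),
    ("mlp", ("gate_proj", "up_proj", "down_proj")),
    ("embedding", ("embed_tokens", "lm_head")),
    ("norm", ("norm",)),
]


def group_of(param_name):
    """Return the first functional group whose pattern occurs in the name."""
    for group, needles in GROUP_PATTERNS:
        if any(needle in param_name for needle in needles):
            return group
    return "other"


def partition(param_names):
    """Map each functional group to the list of parameter names it contains."""
    groups = defaultdict(list)
    for name in param_names:
        groups[group_of(name)].append(name)
    return dict(groups)
```

Calling partition on a model's parameter names yields one bucket per functional type, and each bucket can then be clipped independently of the others.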

Findings

01. AGGC outperforms LoRA and often surpasses full fine-tuning.

02. On GSM8K, Mistral-7B with AGGC achieves 72.93% accuracy.

03. AGGC stabilizes RLVR and enhances logical deduction in LLMs.

Abstract

To stabilize the training of Large Language Models (LLMs), gradient clipping is a nearly ubiquitous heuristic used to alleviate exploding gradients. However, traditional global norm clipping erroneously presupposes gradient homogeneity across different functional modules, leading to an adverse "spill-over" effect where volatile parameters force unnecessary scaling on stable ones. To overcome this, we propose Adaptive Group-wise Gradient Clipping (AGGC). AGGC partitions parameters into groups based on functional types and regulates each according to its historical behavior using an Exponential Moving Average (EMA). Specifically, it constructs an adaptive interval to simultaneously mitigate gradient explosion and vanishing, while employing a time-dependent scheduling mechanism to balance exploration and convergence. Experiments on LLaMA 2-7B, Mistral-7B, and Gemma-7B models show that AGGC…
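The mechanism described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the interval bounds, the linear schedule that narrows the interval over training, the EMA decay, and the choice to update the EMA with the clipped norm are all hypothetical, as are the names GroupClipper and interval_at.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of gradient values."""
    return math.sqrt(sum(v * v for v in values))


def interval_at(step, total_steps, start=(0.5, 2.0), end=(0.8, 1.25)):
    """Assumed schedule: linearly narrow the (low, high) multipliers over training."""
    frac = min(step / max(total_steps, 1), 1.0)
    low = start[0] + (end[0] - start[0]) * frac
    high = start[1] + (end[1] - start[1]) * frac
    return low, high


class GroupClipper:
    """Hypothetical per-group clipper driven by an EMA of each group's gradient norm."""

    def __init__(self, total_steps, beta=0.9, eps=1e-12):
        self.total_steps = total_steps
        self.beta = beta      # EMA decay (assumed value)
        self.eps = eps        # guards against division by zero
        self.t = 0            # number of steps taken so far
        self.ema = {}         # group name -> EMA of gradient norm

    def step(self, grads_by_group):
        low, high = interval_at(self.t, self.total_steps)
        clipped = {}
        for name, grad in grads_by_group.items():
            norm = l2_norm(grad)
            prev = self.ema.get(name)
            if prev is None:
                # No history yet: pass the gradient through and seed the EMA.
                target = norm
                self.ema[name] = norm
            else:
                # Keep the norm inside [low * EMA, high * EMA]: the upper bound
                # damps spikes, the lower bound lifts vanishing gradients.
                target = min(max(norm, low * prev), high * prev)
                self.ema[name] = self.beta * prev + (1 - self.beta) * target
            scale = target / (norm + self.eps)
            clipped[name] = [g * scale for g in grad]
        self.t += 1
        return clipped
```

Because each group is rescaled against its own history, a spike in one group (say, attention) leaves a stable group (say, the MLP) untouched, which is the spill-over that a single global norm threshold cannot avoid.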

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis