Measuring Maximum Activations in Open Large Language Models

Luxuan Chen; Han Tian; Xinran Chen; Rui Kong; Fang Wang; Jiamin Chen; Yuchen Li; Jiashu Zhao; Shuaiqiang Wang; Haoyi Xiong; Dawei Yin

arXiv:2605.15572 · cs.CL · May 18, 2026

TL;DR

This paper measures maximum activation magnitudes across 27 checkpoints from 8 open LLM families under a single pipeline. It finds that peak magnitudes vary by four orders of magnitude and depend on architecture, which directly affects low-bit quantization and deployment.

Contribution

The paper provides a unified measurement pipeline for activation maxima in modern open LLMs and shows that these maxima depend on model family, architecture, and training stage.
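As a rough illustration of what tracking activation maxima involves, the sketch below keeps a running maximum of absolute values per measurement site and derives the global maximum from those. It is framework-free and is not the authors' pipeline: the class name `MaxActivationTracker`, the site names, and the numbers are all hypothetical, chosen only to show the bookkeeping.

```python
class MaxActivationTracker:
    """Running per-site maxima of |activation| (hypothetical helper, not from the paper)."""

    def __init__(self):
        self.layer_max = {}  # site name -> largest |value| seen so far

    def update(self, site, activations):
        """Fold one batch of activation values for `site` into the running maximum."""
        peak = max(abs(x) for x in activations)
        self.layer_max[site] = max(self.layer_max.get(site, 0.0), peak)

    def global_max(self):
        """Return (site, value) for the largest maximum across all sites."""
        site = max(self.layer_max, key=self.layer_max.get)
        return site, self.layer_max[site]


# Illustrative values only; real activations would come from model hooks.
tracker = MaxActivationTracker()
tracker.update("layer0.mlp", [0.5, -3.0])
tracker.update("layer1.attn", [12.0, -40.0, 7.0])
tracker.update("layer0.mlp", [1.0, 2.0])  # smaller batch peak leaves the maximum unchanged

print(tracker.layer_max)
print(tracker.global_max())
```

In a real setting, `update` would be called from whatever mechanism exposes intermediate tensors, once per site and per batch, so that memory use stays constant regardless of corpus size.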

Findings

01. Global maxima vary over four orders of magnitude across models.

02. Cross-family and cross-generation comparisons do not follow simple scaling laws.

03. MoE models have significantly lower activation peaks than dense models.

Abstract

The dynamic range of activations is a first-order constraint for low-bit quantization, activation scaling, and stable LLM inference. Prior work characterized outlier features and massive activations on pre-2024 LLaMA-style models, and the downstream activation-quantization stack inherits that picture without revisiting it for the post-LLaMA open-model boom. We ask the deployment-oriented question: how large can activations get in modern open LLMs, and how does this magnitude vary across families, generations, and training stages? Under a unified pipeline (5,000-sample multi-domain corpus, family-specific tokenization, identical hooks across embeddings, hidden states, attention, MLP/MoE, SwiGLU gates, and final norm), we measure global and layerwise maxima on 27 checkpoints from 8 open families spanning dense, MoE, vision-language, intermediate-training, and instruction-tuned variants.…
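To make the link between dynamic range and low-bit quantization concrete, here is a minimal sketch of symmetric per-tensor absmax quantization to 8 bits. This is a generic textbook scheme used for illustration, not a method taken from the paper, and the input values are made up. It shows how a single large activation stretches the quantization step so far that all ordinary values collapse to zero.

```python
def absmax_quantize(values, bits=8):
    """Symmetric per-tensor quantization: the largest magnitude sets the scale."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8 bits
    absmax = max(abs(v) for v in values)
    scale = absmax / qmax if absmax > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def max_abs_error(values, bits=8):
    """Worst-case reconstruction error after a quantize/dequantize round trip."""
    codes, scale = absmax_quantize(values, bits)
    restored = dequantize(codes, scale)
    return max(abs(v - r) for v, r in zip(values, restored))


# Made-up activations: a typical range, then the same values plus one large outlier.
typical = [0.5, -1.2, 0.8, 2.0, -0.3]
with_outlier = typical + [2000.0]

err_typical = max_abs_error(typical)        # step is 2.0 / 127, so the error is small
err_outlier = max_abs_error(with_outlier)   # step is 2000 / 127, ordinary values round to 0

print(err_typical, err_outlier)
print(absmax_quantize(with_outlier)[0])
```

With the outlier present, the quantization step grows by a factor of 1000, and every value in the typical range is smaller than half a step. This is why the size of the largest activation, rather than the typical one, constrains how a tensor can be scaled.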

Code & Models

Repositories

clx1415926/Max_act_llm (GitHub)