A Geometric Analysis of Sign-Magnitude Asymmetry in a ReLU + RMSNorm Block under Ternary Quantization

Lei Dong

arXiv:2605.18933·cs.LG·May 20, 2026

TL;DR

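This paper gives a geometric explanation for sign-magnitude asymmetry in ReLU + RMSNorm models under ternary quantization: at equal Frobenius norm, sign flips inject more transverse output energy than sign-preserving magnitude perturbations, which makes the model more sensitive to sign errors than to magnitude errors.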

Contribution

It introduces a sign-magnitude decomposition analysis explaining the observed asymmetry and quantization error effects in pre-norm Transformer models.

Findings

01

Sign-flips produce $\pi / (\pi - 2) \approx 2.75$ times more transverse output energy than sign-preserving magnitude perturbations of equal Frobenius norm, as the flip rate approaches zero.

02

ReLU approximately preserves ternary quantization error, making it transparent to sign-magnitude perturbations.

03

Experimental results on TinyLlama-1.1B confirm theoretical predictions about sign sensitivity and outlier effects.

Abstract

Pre-norm Transformers with RMSNorm tolerate ternary $\{-1, 0, +1\}$ weight quantization with surprisingly small loss (Ma et al., 2024). We give a geometric explanation via sign-magnitude decomposition of weight perturbations. In a two-layer ReLU + RMSNorm model with i.i.d. Gaussian weights, sign-flips produce $\pi / (\pi - 2) \approx 2.75$ times more transverse output energy than sign-preserving magnitude perturbations of equal Frobenius norm, as the flip rate $p \to 0$ (Theorem 3). The mechanism: ReLU creates a hidden-space directional asymmetry between the two perturbation types, which RMSNorm's transverse-projection Fréchet derivative selectively exposes. Sign-quantization error is itself a sign-preserving perturbation with angular alignment $\cos^{2} \to 2/\pi$ (Theorem 4); its post-ReLU radial fraction ($0.365$) matches the pre-ReLU value $1 - 2/\pi$ within $0.4\%$, so ReLU is approximately…
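Two of the constants quoted in the abstract can be sanity-checked numerically for i.i.d. standard Gaussian weights. The sketch below is not the paper's code. It assumes the sign-quantization error is $e = w - \alpha\,\mathrm{sign}(w)$ with $\alpha$ the least-squares scale, and it takes "radial fraction" to mean the squared cosine between $e$ and $w$. Both definitions, and the helper name `sign_quantization_stats`, are assumptions made here for illustration. It covers only the pre-ReLU quantities, not the ReLU + RMSNorm model or Theorem 3 itself.

```python
import math
import random


def sign_quantization_stats(n=200_000, seed=0):
    """Monte Carlo estimates for i.i.d. N(0, 1) weights (hypothetical helper)."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]

    sum_abs = sum(abs(x) for x in w)  # <w, sign(w)> equals the L1 norm of w
    sum_sq = sum(x * x for x in w)    # squared L2 norm of w

    # Squared cosine between w and sign(w); note ||sign(w)||^2 = n.
    cos2 = sum_abs ** 2 / (sum_sq * n)

    # Assumed error model: e = w - alpha * sign(w), alpha = least-squares scale.
    alpha = sum_abs / n
    e = [x - alpha * math.copysign(1.0, x) for x in w]
    e_dot_w = sum(a * b for a, b in zip(e, w))
    e_sq = sum(a * a for a in e)

    # Assumed meaning of "radial fraction": squared cosine between e and w.
    radial_fraction = e_dot_w ** 2 / (e_sq * sum_sq)
    return cos2, radial_fraction


cos2, radial = sign_quantization_stats()
ratio = math.pi / (math.pi - 2)  # closed-form constant quoted in the abstract

print(f"cos^2(w, sign(w)) = {cos2:.4f}   vs 2/pi     = {2 / math.pi:.4f}")
print(f"radial fraction   = {radial:.4f}   vs 1 - 2/pi = {1 - 2 / math.pi:.4f}")
print(f"pi / (pi - 2)     = {ratio:.4f}")
```

Under these assumptions the two estimates sum to one by construction, since the least-squares residual is orthogonal to $\mathrm{sign}(w)$, so the pre-ReLU radial fraction $1 - 2/\pi$ follows directly from the alignment $\cos^{2} \to 2/\pi$.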
