IBNorm: Information-Bottleneck Inspired Normalization for Representation Learning

Xiandong Zou; Jia Li; Xiaotong Yuan; Pan Zhou

arXiv:2510.25262·cs.LG·January 30, 2026

3 Reviews

TL;DR

IBNorm is a normalization technique inspired by the Information Bottleneck principle that enhances the informativeness of learned representations while maintaining training stability, outperforming traditional variance-centric methods.

Contribution

This paper introduces IBNorm, a novel normalization method based on the Information Bottleneck, which improves representation quality by balancing information preservation and suppression.

Findings

01

IBNorm achieves higher IB values than variance-centric methods.

02

Empirical results show IBNorm outperforms BatchNorm, LayerNorm, and RMSNorm.

03

Mutual information analysis confirms better information bottleneck behavior.
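
For orientation, the "IB value" in the findings above can be read against the standard Information Bottleneck objective. The formulation below is the textbook IB Lagrangian, not a definition taken from this page: the symbols $X$ (input), $Y$ (target), $Z$ (representation), and the trade-off weight $\beta$ are assumptions, and the paper's exact definition may differ.

```latex
% Standard IB Lagrangian (assumed notation; the paper may define its IB value differently)
\max_{p(z \mid x)} \; I(Z;Y) \;-\; \beta \, I(X;Z), \qquad \beta > 0,
% with mutual information defined as
I(X;Z) \;=\; \mathbb{E}_{p(x,z)}\!\left[ \log \frac{p(x,z)}{p(x)\,p(z)} \right].
```

Under this reading, a higher value means the representation keeps more information about the target $Y$ for the same amount of information retained about the input $X$.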

Abstract

Normalization is fundamental to deep learning, but existing approaches such as BatchNorm, LayerNorm, and RMSNorm are variance-centric by enforcing zero mean and unit variance, stabilizing training without controlling how representations capture task-relevant information. We propose IB-Inspired Normalization (IBNorm), a simple yet powerful family of methods grounded in the Information Bottleneck principle. IBNorm introduces bounded compression operations that encourage embeddings to preserve predictive information while suppressing nuisance variability, yielding more informative representations while retaining the stability and compatibility of standard normalization. Theoretically, we prove that IBNorm achieves a higher IB value and tighter generalization bounds than variance-centric methods. Empirically, IBNorm consistently outperforms BatchNorm, LayerNorm, and RMSNorm across…
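
To make the contrast in the abstract concrete, here is a minimal NumPy sketch of the three variance-centric baselines it names, without learnable scale and shift parameters. The abstract is truncated and does not specify IBNorm's compression operations, so `bounded_compress` and `compress_then_standardize` are hypothetical stand-ins: a tanh squashing followed by standardization, chosen only to show what a bounded operation inside a normalization layer could look like. They are not the authors' method.

```python
import numpy as np

EPS = 1e-5  # small constant for numerical stability


def layer_norm(x, eps=EPS):
    """Standardize each row (sample) to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def batch_norm(x, eps=EPS):
    """Standardize each column (feature) across the batch dimension."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def rms_norm(x, eps=EPS):
    """Rescale each row by its root mean square; the mean is not subtracted."""
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return x / rms


def bounded_compress(x, scale=1.0):
    """HYPOTHETICAL bounded squashing via tanh; not the paper's operation."""
    return scale * np.tanh(x / scale)


def compress_then_standardize(x, scale=1.0, eps=EPS):
    """ASSUMED composition: center each row, squash it, then standardize."""
    centered = x - x.mean(axis=-1, keepdims=True)
    return layer_norm(bounded_compress(centered, scale), eps)


rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=(4, 8))  # 4 samples, 8 features

ln_out = layer_norm(x)
bn_out = batch_norm(x)
rms_out = rms_norm(x)
hyp_out = compress_then_standardize(x)
```

The baselines fix only first and second moments of the activations. The hypothetical variant additionally limits the influence of outlying activations before standardizing, which is one simple way a bounded operation can be combined with a standard normalization step.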

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. The paper introduces an application of the information bottleneck to normalization. By integrating compression operations into the normalization process, the paper presents a significant departure from traditional methods that focus solely on statistical normalization (mean and variance).
2. The paper presents a clear theoretical foundation and solid empirical validation. The authors present rigorous proofs that IBNorm achieves a higher IB value compared to variance-centric methods and demons

Weaknesses

1. Limited contribution in the context of existing IB works. In most work based on the Information Bottleneck principle, compression operations (such as nonlinear functions like tanh [1] and kernel-based functions [2], or explicit compression losses like VIB [3] and SPC [4]) are often applied alongside normalization operations within neural networks. The paper claims to introduce an IB perspective to normalization, but it essentially introduces an additional compression operation into networks alread

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The paper provides a theoretical viewpoint by integrating the information bottleneck principle into normalization design, although it is inspired by a previous paper (NormalNorm, ICML 2025), contributing to a deeper understanding of how normalization can regulate information flow in representation learning.
2. The information-theoretic reformulation of normalization is well motivated, and the overall structure and derivation are easy to follow.
3. Empirical validation on multiple model

Weaknesses

1. The overall idea and contribution are incremental; in particular, the writing largely follows the presentation of the previous paper (NormalNorm, ICML 2025), e.g., the Information Bottleneck idea of normalization and the description of the algorithm. In addition, this paper omits several papers that it should discuss and compare against, e.g., the whitening methods in standard supervised learning [1,2] and self-supervised learning [3,4]. I think this paper should compare to IterNorm [2] meth

Reviewer 03 · Rating 2 · Confidence 4

Strengths

* The paper clearly identifies a potential limitation of existing normalization methods (focusing solely on moments) and proposes addressing it through the principled lens of the Information Bottleneck.
* The method is evaluated across different domains (NLP, CV), architectures (Transformers, CNNs), and model scales, providing a broad assessment of its practical performance.
* IBNorm seems to outperform standard baselines (LN, BN, RMSNorm) and a relevant recent competitor (NormalNorm) across mos

Weaknesses

The paper's primary weaknesses lie in the lack of sound theoretical justification for its central claims and unfair empirical comparisons in the vision domain.

1. **Unfair Empirical Comparison in Vision Experiments:** The claimed empirical superiority in vision models (Table 3) is based on an unfair comparison. Appendix F.2 reveals that the authors used different hyperparameters (learning rates, weight decays) for their proposed IBNorm method compared to the baseline methods (including BatchNo
