Maximizing Incremental Information Entropy for Contrastive Learning
Jiansong Zhang, Zhuoqin Yang, Xu Wu, Xiaoling Luo, Peizhong Liu, Linlin Shen

TL;DR
This paper introduces IE-CL, a contrastive learning framework that maximizes incremental information entropy between augmented views, improving performance especially in small-batch settings by explicitly optimizing entropy gain while maintaining semantic consistency.
Contribution
It proposes a novel entropy-based contrastive learning method that explicitly optimizes entropy gain and integrates theoretical insights with practical improvements.
Findings
IE-CL improves performance on CIFAR-10/100, STL-10, and ImageNet.
Core modules can be integrated into existing frameworks.
Effective in small-batch training scenarios.
Abstract
Contrastive learning has achieved remarkable success in self-supervised representation learning, often guided by information-theoretic objectives such as mutual information maximization. Motivated by the limitations of static augmentations and rigid invariance constraints, we propose IE-CL (Incremental-Entropy Contrastive Learning), a framework that explicitly optimizes the entropy gain between augmented views while preserving semantic consistency. Our theoretical framework reframes the challenge by identifying the encoder as an information bottleneck and proposes a joint optimization of two components: a learnable transformation for entropy generation and an encoder regularizer for its preservation. Experiments on CIFAR-10/100, STL-10, and ImageNet demonstrate that IE-CL consistently improves performance under small-batch settings. Moreover, our core modules can be seamlessly…
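For readers unfamiliar with the mutual-information view mentioned in the abstract, the sketch below shows a standard one-directional InfoNCE loss together with the well-known bound I(z_a; z_b) >= log N - L_InfoNCE, where N is the batch size. Because the bound cannot exceed log N, a small batch limits how much mutual information the loss can certify, which is one common explanation for why small-batch contrastive training is hard. This is not the IE-CL objective: the function names `info_nce` and `mi_lower_bound`, the temperature value, and the choice of drawing negatives only from the other view are assumptions made for illustration.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss; view_a[i] and view_b[i] are the positive pair.

    Illustrative only: negatives are the other rows of view_b, and the
    temperature default is an assumption, not a value from the paper.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    losses = []
    for i, anchor in enumerate(a):
        # Cosine similarity of the anchor to every embedding in the other view.
        logits = [
            sum(p * q for p, q in zip(anchor, other)) / temperature
            for other in b
        ]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


def mi_lower_bound(loss, batch_size):
    """InfoNCE bound on mutual information: log(N) - loss (in nats)."""
    return math.log(batch_size) - loss
```

Calling `mi_lower_bound(info_nce(a, b), len(a))` on two lists of embedding vectors gives the certified lower bound for that batch; it can never exceed `math.log(len(a))`, no matter how well the views are aligned.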
Peer Reviews
Decision: ICLR 2026 Poster
1. The proposed formulation leads to a new perspective on entropy control in SSL. Unlike prior works (e.g., InfoMax-SSL, VICReg, Matrix-IB) that maximize representation-level entropy, IE-CL proposes to inject entropy at the input level through a learnable augmentation mechanism. This shift from output-space to input-space entropy control is novel and conceptually meaningful.
2. The proposed method is established based on sound theoretical motivation. The information-theoretic derivation connecti
1. The claim that encoder preservation is necessary appears overstated. The paper argues that spectral normalization of the encoder is required to prevent the loss of generated entropy (Section 3.3). However, Table 3 shows that removing this component leads to only a marginal change in performance (0.26 percent difference). This result indicates that the encoder likely already preserves entropy through existing normalization layers and the contrastive objective itself. Therefore, the Encoder Pre
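As background for the spectral normalization discussed in this review, the sketch below estimates a weight matrix's largest singular value by power iteration and rescales the matrix so that value is approximately 1. It is a generic illustration of the technique, not the paper's implementation: the function names `spectral_norm` and `spectrally_normalize`, the iteration count, and the seeded random start vector are all assumptions.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(column) for column in zip(*matrix)]


def spectral_norm(weight, iterations=50, seed=0):
    """Estimate the largest singular value of `weight` by power iteration.

    Assumption: a seeded Gaussian start vector; a start vector orthogonal
    to the top singular direction would fail, which is why it is random.
    """
    rng = random.Random(seed)
    weight_t = transpose(weight)
    v = [rng.gauss(0.0, 1.0) for _ in weight[0]]
    for _ in range(iterations):
        # One step of power iteration on W^T W, followed by renormalization.
        v = matvec(weight_t, matvec(weight, v))
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # ||W v|| for the converged unit vector v approximates the top singular value.
    return math.sqrt(sum(x * x for x in matvec(weight, v)))


def spectrally_normalize(weight, iterations=50, seed=0):
    """Divide `weight` by its estimated spectral norm."""
    sigma = spectral_norm(weight, iterations, seed)
    return [[w / sigma for w in row] for row in weight]
```

After normalization the layer cannot stretch any input direction by more than a factor of about 1, which is the constraint the reviewer is weighing against the normalization layers the encoder already contains.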
I think the paper has some strengths: First, I think the theoretical foundation is a major plus. The authors provide a proof for their core claim, linking the minimization of contrastive loss to the maximization of incremental entropy. In my opinion, this reframing of contrastive learning as a trade-off between entropy expansion and semantic alignment is a novel information-theoretic perspective. Also, I think the SAIB module itself is a contribution. It's not just an arbitrary layer but a lig
I think the paper's primary weakness is its narrow experimental scope, which doesn't fully support the broad claims of an improved "framework." My main issue is that all experiments are confined to ResNet architectures. The self-supervised learning field has largely migrated to Vision Transformers (ViTs), and their complete absence here is a glaring omission. In my opinion, this makes the work somewhat dated and raises a critical question: is this incremental entropy principle a general SSL concep
- The approach is interesting, particularly in introducing the concept of entropy into self-supervised learning through SAIB and IncEntropy. In particular, SAIB presents an appealing way to apply entropy, showing the largest performance gain in the ablation study (Table 3).
- The method is also simple and can be easily integrated into existing SSL frameworks, as demonstrated in Table 5.
1. Although the effectiveness of the proposed method is demonstrated throughout the paper, most experiments (except Table 1) are conducted on relatively small-scale settings such as ResNet-18 or ImageNet-100. Since Table 3 highlights the strong effect of entropy generation through SAIB, it would be valuable to evaluate the method on larger or standard-scale benchmarks. The same applies to Table 5.
2. It would also be helpful to include an ablation study with semantic consistency only, to better
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Face recognition and analysis
