TL;DR
This paper critically examines the use of InfoNCE for mutual information (MI) estimation, introduces a corrected estimator called InfoNCE-anchor, and unifies a range of contrastive objectives under a single framework. It also finds that contrastive learning helps downstream tasks through structured density ratios rather than through accurate MI estimates.
Contribution
It presents a new, bias-reduced MI estimator called InfoNCE-anchor and a unified theoretical framework for contrastive objectives using proper scoring rules.
Findings
InfoNCE is not a valid MI estimator.
InfoNCE-anchor achieves more accurate MI estimates.
Contrastive learning improves downstream tasks through structured density ratios, not MI accuracy.
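The first finding can be illustrated with a small numerical sketch. It is not the paper's experiment or its InfoNCE-anchor estimator. It relies only on the standard fact that the InfoNCE value computed from K samples can never exceed log K, and compares that with a plug-in average of the true log density ratio on correlated Gaussians. The setup (dimension, correlation, batch size) and all function names below are assumptions chosen for illustration.

```python
import numpy as np


def logsumexp_rows(scores):
    """Numerically stable log-sum-exp over each row of a 2-D array."""
    m = scores.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True))).ravel()


def infonce_estimate(scores):
    """InfoNCE value for a (K, K) score matrix whose diagonal holds the positive pairs."""
    k = scores.shape[0]
    return float(np.mean(np.diag(scores) - logsumexp_rows(scores)) + np.log(k))


def gaussian_log_ratio(x, y, rho):
    """Entry [i, j] is log p(y_j | x_i) - log p(y_j) for per-dimension correlation rho."""
    var = 1.0 - rho ** 2
    diff = y[None, :, :] - rho * x[:, None, :]          # shape (K, K, d)
    per_dim = -0.5 * np.log(var) - diff ** 2 / (2 * var) + y[None, :, :] ** 2 / 2
    return per_dim.sum(axis=2)


def demo(k=64, d=20, rho=0.9, seed=0):
    # Hypothetical toy setup: d independent pairs of correlated standard Gaussians.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((k, d))
    y = rho * x + np.sqrt(1.0 - rho ** 2) * rng.standard_normal((k, d))
    true_mi = -0.5 * d * np.log(1.0 - rho ** 2)         # closed-form MI in nats
    scores = gaussian_log_ratio(x, y, rho)              # true log ratio used as the critic
    infonce = infonce_estimate(scores)                  # cannot exceed log(k)
    plug_in = float(np.mean(np.diag(scores)))           # average log ratio on joint samples
    return true_mi, infonce, plug_in, float(np.log(k))


if __name__ == "__main__":
    true_mi, infonce, plug_in, log_k = demo()
    print(f"true MI={true_mi:.2f}  InfoNCE={infonce:.2f}  plug-in={plug_in:.2f}  log K={log_k:.2f}")
```

Even with the exact log density ratio as the critic, the InfoNCE value stays below log K, while the plug-in average of the same ratio is not capped in this way. This is consistent with the summary's point that a well-estimated density ratio, rather than the InfoNCE value itself, is what carries the MI information.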
Abstract
The InfoNCE objective, originally introduced for contrastive representation learning, has become a popular choice for mutual information (MI) estimation, despite its indirect connection to MI. In this paper, we demonstrate why InfoNCE should not be regarded as a valid MI estimator, and we introduce a simple modification, which we refer to as InfoNCE-anchor, for accurate MI estimation. Our modification introduces an auxiliary anchor class, enabling consistent density ratio estimation and yielding a plug-in MI estimator with significantly reduced bias. Beyond this, we generalize our framework using proper scoring rules, which recover InfoNCE-anchor as a special case when the log score is employed. This formulation unifies a broad spectrum of contrastive objectives, including NCE, InfoNCE, and $f$-divergence variants, under a single principled framework. Empirically, we find that…
Peer Reviews
Decision·ICLR 2026 Poster
1. The paper is clearly written with precise formulations.
2. The theoretical analysis provides a sharp upper bound on InfoNCE via $K$-way JS divergence, clarifying its high bias.
3. The unified framework through proper scoring rules elegantly connects NCE, InfoNCE, and $f$-divergence variants.
1. Theorem 2 assumes known distributions $q_1$ and $q_0$ for equality conditions, but in practice with neural critics of finite capacity, the proportionality $r_\theta \propto \frac{q_1}{q_0}$ may not hold, leaving gaps in how approximation errors affect the bound's tightness.
2. Critique of α-InfoNCE in Section 3.3 claims a proof flaw without supplying a counterexample or alternative derivation.
3. The extension to proper scoring rules claims consistency for class probability estimation, but this paper
1. The analysis of InfoNCE's limitations is sharp and well-motivated, with a tight bound on its divergence (Theorem 2) that clarifies why it underestimates MI even for large $K$. The anchor modification is elegant and directly addresses the identifiability issue in density ratio estimation (Theorem 3). The generalization to proper scoring rules is a nice unification, recovering existing methods as special cases while providing a principled decision-theoretic foundation.
2. Strong results in MI est
1. While the SSL experiments are thorough, they are restricted to CIFAR-100 with a ResNet-18 backbone. It would be valuable to test on larger datasets or architectures (e.g., ViTs) to confirm whether the lack of improvement holds more generally.
2. ν=1 is used as the default without extensive tuning; a sensitivity analysis (e.g., ν vs. performance) could reveal trade-offs, especially since the asymptotic behavior links ν/K to bounds like DV/NWJ.
3. The anchor introduces an extra term, potentially incre
- This study presents a theoretically grounded method to enhance existing mutual information (MI) estimation techniques and provides empirical evidence demonstrating its effectiveness.
- The proposed "anchor" modification is straightforward yet addresses a subtle theoretical issue in density ratio identifiability.
- The MI estimation experiments are comprehensive and show consistent advantages of the proposed method across different domains.
- Although theoretically neat, the proposed modification brings no tangible improvement to representation learning, arguably the main motivation for contrastive objectives.
- Only a few relatively simple contrastive methods are considered; comparison with modern frameworks would strengthen the practical side.
