When Does Margin Clamping Affect Training Variance? Dataset-Dependent Effects in Contrastive Forward-Forward Learning

Joshua Steier

arXiv:2603.00951 · cs.LG · March 3, 2026

Open Access

TL;DR

This paper investigates how margin clamping in Contrastive Forward-Forward (CFF) learning affects seed-to-seed variance in test accuracy, finding dataset-dependent effects and proposing a gradient-neutral alternative that avoids the variance inflation.

Contribution

The paper shows that margin clamping can inflate test-accuracy variance on CIFAR-10, links the inflation to saturation of the clamp, and introduces a gradient-neutral margin formulation that mitigates it.

Findings

01

Margin clamping increases seed-to-seed test-accuracy variance on CIFAR-10 but not on other datasets.

02

Clamp activation rates, layerwise gradient norms, and a reduced-margin probe point to saturation-driven gradient truncation at early layers.

03

Switching to a gradient-neutral margin reduces variance without harming accuracy.

Abstract

Contrastive Forward-Forward (CFF) learning trains Vision Transformers layer by layer against supervised contrastive objectives. CFF training can be sensitive to random seed, but the sources of this instability are poorly understood. We focus on one implementation detail: the positive-pair margin in the contrastive loss is applied through saturating similarity clamping, $\min(s + m, 1)$. We prove that an alternative formulation, subtracting the margin after the log-probability, is gradient-neutral under the mean-over-positives reduction. On CIFAR-10 ($2 \times 2$ factorial, $n = 7$ seeds per cell), clamping produces $5.90 \times$ higher pooled test-accuracy variance ($p = 0.003$) with no difference in mean accuracy. Analyses of clamp activation rates, layerwise gradient norms, and a reduced-margin probe point to saturation-driven gradient truncation at early layers. The effect does not…
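The difference between the two margin formulations named in the abstract can be illustrated with a small numerical sketch. It is not the paper's code. The temperature `tau`, the function names, the toy similarity values, and the choice to apply the clamped similarity to the positive's logit wherever it appears in the softmax are all assumptions made for illustration. Under those assumptions, a positive pair whose similarity already satisfies $s + m \geq 1$ receives zero gradient through the clamp, while subtracting $m$ after the log-probability only shifts the loss by a constant and leaves the gradient unchanged.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    mx = max(logits)
    lse = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return [z - lse for z in logits]

def loss_clamped(sims, pos, m, tau=0.1):
    # Assumed form: add the margin to positive similarities, clamp at 1,
    # then take the mean negative log-probability over the positives.
    adj = [min(s + m, 1.0) if j in pos else s for j, s in enumerate(sims)]
    logp = log_softmax([a / tau for a in adj])
    return -sum(logp[j] for j in pos) / len(pos)

def loss_post_log(sims, pos, m, tau=0.1):
    # Assumed form: subtract the margin after the log-probability,
    # with the same mean-over-positives reduction.
    logp = log_softmax([s / tau for s in sims])
    return -sum(logp[j] - m for j in pos) / len(pos)

def grad(loss_fn, sims, pos, m, eps=1e-6):
    # Central finite-difference gradient with respect to each similarity.
    g = []
    for k in range(len(sims)):
        up = list(sims)
        up[k] += eps
        dn = list(sims)
        dn[k] -= eps
        g.append((loss_fn(up, pos, m) - loss_fn(dn, pos, m)) / (2 * eps))
    return g

# Toy anchor: indices 0 and 1 are positives, the rest are negatives.
sims = [0.95, 0.40, 0.10, -0.20]
pos = {0, 1}
m = 0.1

g_clamped = grad(loss_clamped, sims, pos, m)   # positive 0 is saturated
g_post = grad(loss_post_log, sims, pos, m)     # margin after the log-prob
g_none = grad(loss_post_log, sims, pos, 0.0)   # no margin at all

print("clamped :", g_clamped)
print("post-log:", g_post)
print("no margin:", g_none)
```

In this toy setting the first entry of `g_clamped` is zero because $0.95 + 0.1$ exceeds the clamp ceiling, whereas `g_post` matches `g_none` up to finite-difference error. How often such saturation occurs in practice depends on the data and the layer, which is the question the paper's analyses address.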

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques