Variance-Covariance Regularization Improves Representation Learning
Jiachen Zhu, Katrina Evtimova, Yubei Chen, Ravid Shwartz-Ziv, Yann LeCun

TL;DR
This paper introduces Variance-Covariance Regularization (VCReg), a regularization technique adapted from self-supervised learning that enhances feature diversity and transferability in supervised learning, with state-of-the-art results reported across various image and video tasks.
Contribution
The paper adapts VICReg's regularization to supervised learning, promoting diverse feature learning and improving transfer learning performance in multiple domains.
Findings
VCReg achieves state-of-the-art transfer performance on image and video tasks.
It improves learning in long-tail and hierarchical classification scenarios.
The method addresses gradient starvation and neural collapse issues.
Abstract
Transfer learning plays a key role in advancing machine learning models, yet conventional supervised pretraining often undermines feature transferability by prioritizing features that minimize the pretraining loss. In this work, we adapt a self-supervised learning regularization technique from the VICReg method to supervised learning contexts, introducing Variance-Covariance Regularization (VCReg). This adaptation encourages the network to learn high-variance, low-covariance representations, promoting learning more diverse features. We outline best practices for an efficient implementation of our framework, including applying it to the intermediate representations. Through extensive empirical evaluation, we demonstrate that our method significantly enhances transfer learning for images and videos, achieving state-of-the-art performance across numerous tasks and datasets. VCReg also…
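The "high-variance, low-covariance" objective described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the variance and covariance terms take the form used in VICReg (a hinge on each feature's standard deviation plus a penalty on squared off-diagonal covariance entries). The function name `vc_regularizer`, the threshold `gamma`, and the constant `eps` are illustrative choices, not details confirmed by this summary, and the paper's actual loss weights and its placement on intermediate representations are not reproduced here.

```python
import numpy as np


def vc_regularizer(z, gamma=1.0, eps=1e-4):
    """Variance and covariance penalties for a batch of features.

    z: array of shape (batch, dim). gamma and eps are assumed defaults,
    not values taken from the paper.
    Returns (variance_term, covariance_term).
    """
    n, d = z.shape
    # Center each feature dimension across the batch.
    z = z - z.mean(axis=0, keepdims=True)

    # Variance term: hinge that penalizes dimensions whose standard
    # deviation falls below gamma (encourages high variance).
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)
    variance_term = np.mean(np.maximum(0.0, gamma - std))

    # Covariance term: sum of squared off-diagonal covariance entries,
    # scaled by the feature dimension (encourages decorrelation).
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    covariance_term = np.sum(off_diag ** 2) / d

    return variance_term, covariance_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((512, 8))
    v, c = vc_regularizer(features)
    print("variance term:", v, "covariance term:", c)
```

In a supervised setting, these two terms would typically be added to the task loss with hypothetical weights, for example `loss = task_loss + alpha * v + beta * c`; the weights `alpha` and `beta` are placeholders here.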
Peer Reviews
Decision · Submitted to ICLR 2025
1. VCReg repurposes the variance and covariance components of VICReg for supervised settings, intending to enable transfer learning without dependence on invariance. 2. Improvements are shown when using supervised pretraining on ImageNet and then transferring to other datasets, so the empirical evidence for the presented tasks and scenarios is largely convincing. 3. The presentation of this paper is well structured and clear.
1. VICReg's regularization integrates easily with other SSL methods like SimCLR, since it operates on dimensions across the batch without interfering with contrastive losses, so it should ideally also be integrable with SSL and supervised losses. I therefore see VCReg as a special case of VICReg applied to the supervised setting rather than a novel approach. 2. The invariance component is removed to streamline VCReg for supervised tasks; however, this could reduce robustness to data variations. Without the
1. The authors perform extensive experiments across multiple tasks, showcasing the effectiveness of VCReg in diverse settings, including transfer learning for images and videos, long-tailed learning, self-supervised learning, and hierarchical classification. 2. The benefits of VCReg are explored thoroughly and empirically in Section 5, which is both interesting and convincing.
1. The paper lacks a theoretical explanation for how VCReg improves generalization. For instance, can the authors provide a theoretical analysis of VCReg’s impact on the decision boundary or expected risk? A theoretical grounding would clarify VCReg’s influence on generalization and strengthen the methodology. Relevant references for further grounding could include: - Empirical Bernstein Bounds and Sample Variance Penalization by Maurer et al., 2009 - Variance-based Regularization with Convex Obj
1. The writing and motivation are generally clear. 2. The proposed VCReg makes sense, as whitening techniques have been shown to enhance feature diversity in SL. 3. Extensive experiments are conducted to verify VCReg’s performance in transfer learning.
1. **Lack of novelty and practicality.** - Firstly, most SSL methods, such as VICReg, DINO and so on, have been verified to significantly outperform supervised learning on downstream transfer learning tasks. No matter what components are added, SL consistently focuses on matching label information. Therefore, in contemporary machine learning, SL is more commonly used for fine-tuning specific tasks rather than serving as a pretraining method for obtaining general-purpose representations. If the p
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI
