Weighted Ensemble Self-Supervised Learning
Yangjun Ruan, Saurabh Singh, Warren Morningstar, Alexander A. Alemi, Sergey Ioffe, Ian Fischer, Joshua V. Dillon

TL;DR
This paper introduces a data-dependent weighted ensemble framework for self-supervised learning (SSL). Only the heads are ensembled, not the representation backbone, so the method adds a small training cost and no overhead at downstream evaluation. It is demonstrated on two state-of-the-art SSL methods, DINO and MSN, on ImageNet-1K.
Contribution
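It proposes an ensemble method that improves SSL performance by weighting ensemble heads in a data-dependent way through weighted cross-entropy losses. The method requires no architectural changes or extra computation at downstream evaluation and incurs only a small training cost.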
Findings
Outperforms the DINO and MSN baselines in multiple evaluation metrics on ImageNet-1K.
Improves few-shot learning performance, e.g., by 3.9 percentage points for 1-shot MSN ViT-B/16.
Increasing diversity among ensemble heads enhances downstream results.
Abstract
Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised learning. Advances in self-supervised learning (SSL) enable leveraging large unlabeled corpora for state-of-the-art few-shot and supervised learning performance. In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framework that permits data-dependent weighted cross-entropy losses. We refrain from ensembling the representation backbone; this choice yields an efficient ensemble method that incurs a small training cost and requires no architectural changes or computational overhead to downstream evaluation. The effectiveness of our method is demonstrated with two state-of-the-art SSL methods, DINO (Caron et al., 2021) and MSN (Assran et al., 2022). Our method outperforms both in multiple evaluation metrics on…
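The idea of a data-dependent weighted cross-entropy loss over ensemble heads can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the pairing of each teacher head with one student head, the entropy-based weighting rule, the temperature parameter gamma, and all function names. It is not the authors' implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred_probs))

def entropy_head_weights(teacher_probs_per_head, gamma=1.0):
    """Hypothetical weighting rule: heads whose teacher prediction is more
    confident (lower entropy) on this example receive a larger weight."""
    neg_entropies = [-entropy(p) for p in teacher_probs_per_head]
    return softmax(neg_entropies, temperature=gamma)

def weighted_ensemble_loss(teacher_logits_per_head, student_logits_per_head, gamma=1.0):
    """Weighted sum of per-head cross-entropies for a single example.

    Assumption: head i of the teacher is paired with head i of the student.
    Returns the scalar loss and the per-head weights.
    """
    teacher = [softmax(l) for l in teacher_logits_per_head]
    student = [softmax(l) for l in student_logits_per_head]
    weights = entropy_head_weights(teacher, gamma)
    per_head = [cross_entropy(t, s) for t, s in zip(teacher, student)]
    loss = sum(w * l for w, l in zip(weights, per_head))
    return loss, weights

# Two heads, three classes: the first teacher head is confident, the second is not.
teacher_logits = [[5.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
student_logits = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
loss, weights = weighted_ensemble_loss(teacher_logits, student_logits)
```

Because the weights are computed from the teacher outputs for each input, they change from example to example, which is what makes the weighting data-dependent. Only the heads are replicated in this sketch; a shared backbone would produce the features that feed every head, matching the abstract's point that the backbone itself is not ensembled.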
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Vision Transformer
