Self-supervised Benchmark Lottery on ImageNet: Do Marginal Improvements Translate to Improvements on Similar Datasets?

Utku Ozbulak; Esla Timothy Anzaku; Solha Kang; Wesley De Neve; Joris Vankerschaver

arXiv:2501.15431·cs.CV·January 28, 2025

TL;DR

This paper investigates whether marginal improvements of self-supervised learning models on ImageNet translate to similar gains on related datasets, revealing that high ImageNet performance does not always imply robustness across datasets.

Contribution

The study evaluates 12 SSL frameworks across multiple ImageNet variants, highlighting the inconsistency of model performance and advocating for more comprehensive benchmarking methods.

Findings

01

Models such as DINO and SwAV show performance drops on similar datasets.

02

MoCo and Barlow Twins maintain more stable performance.

03

Unified metrics can improve benchmarking fairness.

Abstract

Machine learning (ML) research relies strongly on benchmarks to determine the relative effectiveness of newly proposed models. Recently, a number of prominent research efforts have argued that models that improve the state of the art by a small margin tend to do so by winning what they call a "benchmark lottery". An important benchmark in machine learning and computer vision is ImageNet, and newly proposed models are often showcased based on their performance on this dataset. Given the large number of self-supervised learning (SSL) frameworks that have been proposed in the past couple of years, each coming with marginal improvements on the ImageNet dataset, in this work we evaluate whether those marginal improvements on ImageNet translate to improvements on similar datasets. To do so, we investigate twelve popular SSL frameworks on five ImageNet…
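One way to make the question in the abstract concrete is to compare how models rank on ImageNet with how they rank on a similar dataset. The sketch below is only an illustration under stated assumptions: the model names and accuracy values are invented placeholders, and Spearman rank correlation is our choice of summary statistic, not necessarily the analysis used in the paper.

```python
def ranks(scores):
    """Return the rank (1 = best) of each model, given {model: accuracy}."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def spearman(scores_a, scores_b):
    """Spearman rank correlation between two score tables (assumes no ties)."""
    rank_a, rank_b = ranks(scores_a), ranks(scores_b)
    n = len(rank_a)
    squared_diff = sum((rank_a[m] - rank_b[m]) ** 2 for m in rank_a)
    return 1 - 6 * squared_diff / (n * (n**2 - 1))


# Placeholder accuracies (percent), invented for illustration only.
# They are NOT results from the paper or from any real model.
imagenet = {"model_a": 75.3, "model_b": 75.1, "model_c": 74.8, "model_d": 74.2}
variant = {"model_a": 62.0, "model_b": 63.5, "model_c": 61.0, "model_d": 62.8}

rho = spearman(imagenet, variant)
print(f"rank on ImageNet: {ranks(imagenet)}")
print(f"rank on variant:  {ranks(variant)}")
print(f"Spearman rho = {rho:.2f}")
```

A correlation near 1 would mean that small ImageNet gains carry over to the other dataset in the same order; a value near 0 would mean the ImageNet ordering says little about the ordering elsewhere.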

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Layer Normalization · Dense Connections · Softmax · Residual Connection · InfoNCE · Vision Transformer · self-DIstillation with NO labels