Diminishing Returns in Self-Supervised Learning

Oli Bridge; Huey Sun; Botond Branyicskai-Nagy; Charles D'Ornano; Shomit Basu

arXiv:2512.03862·cs.CV·January 6, 2026

Open Access

TL;DR

This paper investigates how self-supervised pre-training and fine-tuning affect small vision transformers, revealing diminishing returns with more supervision and negative impacts of intermediate classification stages on dense prediction tasks.

Contribution

It demonstrates that in low-capacity models, the geometry of supervision is crucial, and intermediate classification fine-tuning can harm performance by disrupting learned representations.

Findings

01. Pre-training and downstream fine-tuning improve performance, but with diminishing returns as supervision increases.

02. Intermediate classification fine-tuning degrades downstream performance, with the largest drops where pre-training is most effective.

03. Misaligned supervision objectives can negate pre-training benefits by collapsing spatial representations.
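
One way to make the idea of "collapsing spatial representations" concrete is to check how similar a model's patch embeddings are to one another. The sketch below is an illustrative assumption, not the paper's metric: the function name `mean_pairwise_cosine` and the toy embeddings are hypothetical, and the listing does not say which geometric measure the authors use. It computes the average cosine similarity over all pairs of patch vectors; values near 1 suggest the patches carry nearly the same information, which would leave little spatial detail for a dense task such as segmentation.

```python
import math


def mean_pairwise_cosine(patches):
    """Average cosine similarity over all distinct pairs of patch embeddings.

    `patches` is a list of equal-length lists of floats, one per image patch.
    This is a hypothetical proxy for spatial collapse, not the paper's metric.
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    total, count = 0.0, 0
    for i in range(len(patches)):
        for j in range(i + 1, len(patches)):
            a, b = patches[i], patches[j]
            denom = norm(a) * norm(b)
            if denom == 0.0:
                continue  # skip pairs that involve a zero vector
            total += sum(x * y for x, y in zip(a, b)) / denom
            count += 1
    return total / count if count else 0.0


# Toy data (made up for illustration): four 2-D "patch embeddings" each.
diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
collapsed = [[1.0, 0.1], [1.0, 0.12], [0.98, 0.1], [1.0, 0.09]]

print("diverse  :", mean_pairwise_cosine(diverse))
print("collapsed:", mean_pairwise_cosine(collapsed))
```

In practice the same quantity would be computed on the final-layer patch tokens of the Vision Transformer before and after each training stage; comparing the values across stages is one rough way to see whether a stage pushes patch tokens toward a single shared direction.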

Abstract

Transformer-based architectures have become a dominant paradigm in vision and language, but their success is often attributed to large model capacity and massive training data. In this work, we examine how self-supervised pre-training, intermediate fine-tuning, and downstream fine-tuning interact in a low-capacity regime, using a 5M-parameter Vision Transformer for semantic segmentation. Across multiple data scales, we find that masked image modeling pre-training and downstream fine-tuning reliably improve performance, but with clear diminishing returns as supervision increases. In contrast, inserting an intermediate classification fine-tuning stage consistently degrades downstream performance, with the largest drops occurring precisely where pre-training is most effective. Through an analysis of patch-level representation geometry, we show that classification-based intermediate…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis