Diminishing Returns in Self-Supervised Learning
Oli Bridge, Huey Sun, Botond Branyicskai-Nagy, Charles D'Ornano, Shomit Basu

TL;DR
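This paper investigates how self-supervised pre-training, intermediate fine-tuning, and downstream fine-tuning interact in a small (5M-parameter) Vision Transformer trained for semantic segmentation. Pre-training and downstream fine-tuning help, with diminishing returns as supervision increases, while an intermediate classification stage hurts performance on this dense prediction task.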
Contribution
The paper demonstrates that in low-capacity models, the alignment between the supervision objective and the downstream task is crucial: intermediate classification fine-tuning can harm segmentation performance by disrupting the patch-level representations learned during pre-training.
Findings
Masked image modeling pre-training and downstream fine-tuning reliably improve performance, but with diminishing returns as supervision increases.
Intermediate classification fine-tuning degrades downstream performance, especially where pre-training is effective.
Misaligned supervision objectives can negate pre-training benefits by collapsing spatial representations.
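One way to make "collapsing spatial representations" concrete is to measure how similar an image's patch embeddings are to one another. The sketch below is an illustrative proxy only, not the paper's own analysis (the summary does not say which metric the authors use): the function name, the 196-patch grid, and the 192-dimensional embeddings are assumptions, and the two embedding sets are synthetic rather than taken from any model.

```python
import numpy as np


def mean_patch_cosine_similarity(patch_embeddings: np.ndarray) -> float:
    """Average cosine similarity over all pairs of distinct patches.

    patch_embeddings has shape (num_patches, dim). Values near 1 mean the
    patches are nearly interchangeable (a collapsed spatial representation);
    values near 0 mean the patches carry distinct information.
    """
    norms = np.linalg.norm(patch_embeddings, axis=1, keepdims=True)
    unit = patch_embeddings / np.clip(norms, 1e-12, None)
    similarity = unit @ unit.T
    n = similarity.shape[0]
    # Exclude the diagonal, where each patch is compared with itself.
    off_diagonal_sum = similarity.sum() - np.trace(similarity)
    return float(off_diagonal_sum / (n * (n - 1)))


# Synthetic stand-ins for encoder outputs (assumed sizes, not real features).
rng = np.random.default_rng(0)
num_patches, dim = 196, 192

# "Diverse": every patch has its own independent embedding.
diverse = rng.normal(size=(num_patches, dim))

# "Collapsed": every patch is one shared vector plus a small perturbation.
shared = rng.normal(size=(1, dim))
collapsed = shared + 0.05 * rng.normal(size=(num_patches, dim))

print("diverse  :", mean_patch_cosine_similarity(diverse))
print("collapsed:", mean_patch_cosine_similarity(collapsed))
```

A segmentation head needs patches that differ across locations, so under this proxy a score drifting toward 1 after an intermediate stage would be consistent with the collapse described above.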
Abstract
Transformer-based architectures have become a dominant paradigm in vision and language, but their success is often attributed to large model capacity and massive training data. In this work, we examine how self-supervised pre-training, intermediate fine-tuning, and downstream fine-tuning interact in a low-capacity regime, using a 5M-parameter Vision Transformer for semantic segmentation. Across multiple data scales, we find that masked image modeling pre-training and downstream fine-tuning reliably improve performance, but with clear diminishing returns as supervision increases. In contrast, inserting an intermediate classification fine-tuning stage consistently degrades downstream performance, with the largest drops occurring precisely where pre-training is most effective. Through an analysis of patch-level representation geometry, we show that classification-based intermediate…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
