Beyond Random Augmentations: Pretraining with Hard Views
Fabio Ferreira, Ivo Rapant, Jörg K. H. Franke, Frank Hutter

TL;DR
This paper introduces Hard View Pretraining (HVP), a simple method that improves self-supervised learning by explicitly selecting more challenging views during pretraining, leading to state-of-the-art results on ImageNet-1k.
Contribution
HVP is the first scalable hard view selection method that enhances SSL pretraining across multiple models and datasets, setting new state-of-the-art results.
Findings
HVP improves linear evaluation accuracy on ImageNet-1k by 0.6%.
HVP yields consistent 1% gains across various SSL methods and training epochs.
HVP demonstrates effectiveness at scale on the full ImageNet-1k dataset.
Abstract
Self-Supervised Learning (SSL) methods typically rely on random image augmentations, or views, to make models invariant to different transformations. We hypothesize that the efficacy of pretraining pipelines based on conventional random view sampling can be enhanced by explicitly selecting views that benefit the learning progress. A simple yet effective approach is to select hard views that yield a higher loss. In this paper, we propose Hard View Pretraining (HVP), a learning-free strategy that extends random view generation by exposing models to more challenging samples during SSL pretraining. HVP encompasses the following iterative steps: 1) randomly sample multiple views and forward each view through the pretrained model, 2) create pairs of two views and compute their loss, 3) adversarially select the pair yielding the highest loss according to the current model state, and 4) perform…
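Steps 1 to 3 of the abstract can be sketched as follows. This is a minimal, framework-free illustration, not the authors' implementation: `encode` is a hypothetical stand-in for the current model, views are assumed to be already sampled, and negative cosine similarity (as used in SimSiam-style objectives) is assumed as the pair loss. Step 4 is truncated in the abstract above, so the sketch stops at selecting the pair.

```python
import itertools
import math


def neg_cosine(a, b):
    """Negative cosine similarity (SimSiam-style); a higher value means a harder pair."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return -dot / (norm_a * norm_b)


def select_hard_pair(views, encode, loss_fn=neg_cosine):
    """Return ((i, j), loss) for the pair of views with the highest loss.

    `encode` is a hypothetical stand-in for the current model; in a real
    pipeline this forward pass would run without gradient tracking.
    """
    if len(views) < 2:
        raise ValueError("need at least two candidate views")
    # Step 1: forward every candidate view through the current model once.
    embeddings = [encode(v) for v in views]
    # Steps 2-3: compute the loss for every unordered pair and keep the arg-max.
    best_pair, best_loss = None, -math.inf
    for i, j in itertools.combinations(range(len(views)), 2):
        loss = loss_fn(embeddings[i], embeddings[j])
        if loss > best_loss:
            best_pair, best_loss = (i, j), loss
    return best_pair, best_loss


# Toy usage: three "views" given directly as 2-D embeddings, identity encoder.
views = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]]
pair, loss = select_hard_pair(views, encode=lambda v: v)
print(pair)  # (0, 2): the two most dissimilar views
```

Because each view is encoded once and only the cheap pairwise losses are computed per pair, the selection cost grows linearly in model forwards and quadratically in loss evaluations.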
Peer Reviews
Decision: ICLR 2025 Poster
- HVP introduces a straightforward, loss-based hard view selection mechanism that enhances SSL training without requiring additional components or extensive hyperparameter tuning. This simplicity makes it highly practical for integration into existing SSL pipelines.
- The paper presents extensive experiments across different SSL frameworks, model architectures (e.g., CNNs and Vision Transformers), and datasets. This broad evaluation supports the generalizability of HVP and its effectiveness in i…
1. HVP’s reliance on high-loss pair selection may result in false positive pairs (i.e., views from different instances within the same image) being chosen, which could hinder representation learning. The paper does not clearly address whether the current HVP method can effectively avoid or mitigate this issue.
2. The related work section does not thoroughly discuss other existing view construction methods such as [1,2,3,4], nor does it compare HVP with these methods experimentally. The absence of…
1. Innovative approach: The paper proposes HVP, a new self-supervised pretraining method that improves the model's generalization ability by selecting difficult views. This is a novel research direction.
2. Wide applicability: HVP is not limited to one SSL method; it can be integrated into various popular SSL frameworks such as SimSiam, DINO, iBOT, and SimCLR, demonstrating good compatibility.
3. Significant performance improvement: HVP has shown better performanc…
Although this article proposes a promising self-supervised pretraining method, HVP, and demonstrates its effectiveness on multiple tasks, it also has some potential shortcomings:
1. Computational cost: HVP requires additional forward passes to select the most difficult view pairs, which may increase the computational cost of training, especially on large-scale datasets and complex models. Please compare the computational cost of the proposed approach with that of existing o…
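The overhead raised in this point can be counted directly from the steps in the abstract. Assuming N candidate views per image and a symmetric pair loss (so each unordered pair is scored once), one selection step costs

```latex
\text{forwards}_{\text{select}} = N, \qquad
\text{pairs} = \binom{N}{2} = \frac{N(N-1)}{2}
```

For example, N = 4 gives 4 forward passes and 6 pair losses, and N = 8 gives 8 forward passes and 28 pair losses, compared with 2 forward passes for a single randomly sampled pair. Unless an implementation reuses these outputs, they come on top of the usual training forward and backward pass.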
The paper is well written, and many experiments have been done. The method is simple and easy to incorporate into existing SSL pipelines, and may be seen as a plug-and-play method.
Choosing the hardest views based on the loss value certainly favors a few augmentation strategies over others, which defeats the purpose of randomness in augmentations. This may improve performance on unseen examples (the validation set) of the dataset on which the model is trained; however, the generalizability of the model becomes questionable with respect to domain adaptation. Initially, the model parameters are not effective; therefore, a higher loss may not be a good indicator o…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Batch Normalization · 1x1 Convolution · Average Pooling · Bottleneck Residual Block · Convolution · Residual Connection · Layer Normalization
