Self-Supervised Learning with a Multi-Task Latent Space Objective

Pierre-François De Plaen; Abhishek Jha; Luc Van Gool; Tinne Tuytelaars; Marc Proesmans

arXiv:2602.05845·cs.CV·February 6, 2026

TL;DR

This paper introduces a multi-task, multi-view self-supervised learning framework. It stabilizes multi-crop training in predictor-based Siamese methods by assigning a separate predictor to each view type, and it combines global, local, and masked views to improve image representations.

Contribution

The paper proposes a multi-task Siamese SSL method that stabilizes multi-crop training and integrates multiple view types for visual representation learning.

Findings

01. Stable training across backbones such as ResNet and ViT.

02. Consistent performance improvements on ImageNet.

03. Effective integration of global, local, and masked views.

Abstract

Self-supervised learning (SSL) methods based on Siamese networks learn visual representations by aligning different views of the same image. The multi-crop strategy, which adds small local crops to the global ones, enhances many SSL frameworks but causes instability in predictor-based architectures such as BYOL, SimSiam, and MoCo v3. We trace this failure to the shared predictor used across all views and demonstrate that assigning a separate predictor to each view type stabilizes multi-crop training, resulting in significant performance gains. Extending this idea, we treat each spatial transformation as a distinct alignment task and add cutout views, where part of the image is masked before encoding. This yields a simple multi-task formulation of asymmetric Siamese SSL that combines global, local, and masked views into a single framework. The approach is stable, generally applicable…
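The central idea in the abstract, one predictor per view type instead of a single shared predictor, can be illustrated with a minimal, dependency-free sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `make_linear`, `apply_linear`, and `multi_task_loss`, the use of a single linear map as a stand-in for a predictor head, random vectors in place of encoder outputs, the negative cosine similarity loss (the form used in SimSiam-style objectives), and the equal weighting of the three tasks.

```python
import math
import random

DIM = 8  # illustrative embedding size, not a value from the paper


def make_linear(dim, rng):
    """Random dim x dim matrix standing in for a predictor head (assumption)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]


def apply_linear(weights, vec):
    """Matrix-vector product: applies one predictor to one embedding."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)


def multi_task_loss(online_embs, target_emb, predictors):
    """Route each view through the predictor that matches its view type.

    online_embs: list of (view_type, embedding) pairs from the online branch.
    target_emb:  embedding of a global view from the target branch, treated
                 as a constant (no gradient would flow through it).
    predictors:  dict mapping view_type -> weight matrix.
    Returns the mean loss over view types and the per-type losses.
    """
    per_type = {}
    for view_type, emb in online_embs:
        prediction = apply_linear(predictors[view_type], emb)
        per_type.setdefault(view_type, []).append(-cosine(prediction, target_emb))
    per_type_mean = {k: sum(v) / len(v) for k, v in per_type.items()}
    # Equal task weighting is an assumption of this sketch.
    total = sum(per_type_mean.values()) / len(per_type_mean)
    return total, per_type_mean


rng = random.Random(0)
view_types = ("global", "local", "masked")
# A separate predictor per view type, rather than one shared by all views.
predictors = {t: make_linear(DIM, rng) for t in view_types}

# Random vectors stand in for encoder outputs of the augmented views.
target = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
views = [
    ("global", [rng.gauss(0.0, 1.0) for _ in range(DIM)]),
    ("local", [rng.gauss(0.0, 1.0) for _ in range(DIM)]),
    ("local", [rng.gauss(0.0, 1.0) for _ in range(DIM)]),
    ("masked", [rng.gauss(0.0, 1.0) for _ in range(DIM)]),
]

total_loss, per_type_loss = multi_task_loss(views, target, predictors)
print(total_loss, per_type_loss)
```

In a shared-predictor baseline, every entry of `predictors` would point to the same weights. Keeping them separate means the alignment task for each view type is fitted by its own head, which is the change the abstract credits with stabilizing multi-crop training.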

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Face recognition and analysis