On the Universality of Self-Supervised Learning
Wenwen Qiang, Jingyao Wang, Changwen Zheng, Hui Xiong, Gang Hua

TL;DR
This paper introduces GeSSL, a new self-supervised learning framework that explicitly models universality in representations, leading to improved generalization and transferability across diverse tasks.
Contribution
It proposes a novel bi-level optimization framework for SSL that explicitly captures universality, supported by theoretical bounds and empirical validation.
Findings
GeSSL outperforms existing SSL methods on multiple benchmarks.
The theoretical analysis provides bounds that support better generalization to unseen tasks.
Explicit modeling of universality improves transferability of learned representations.
Abstract
In this paper, we investigate what constitutes a good representation or model in self-supervised learning (SSL). We argue that a good representation should exhibit universality, characterized by three essential properties: discriminability, generalizability, and transferability. While these capabilities are implicitly desired in most SSL frameworks, existing methods lack an explicit modeling of universality, and its theoretical foundations remain underexplored. To address these gaps, we propose General SSL (GeSSL), a novel framework that explicitly models universality from three complementary dimensions: the optimization objective, the parameter update mechanism, and the learning paradigm. GeSSL integrates a bi-level optimization structure that jointly models task-specific adaptation and cross-task consistency, thereby capturing all three aspects of universality within a unified SSL…
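To make the bi-level structure mentioned in the abstract more concrete, here is a minimal sketch of a generic bi-level loop: an inner loop adapts a copy of the shared parameters to each task, and an outer loop updates the shared parameters so that they work across tasks. This is not GeSSL's actual objective or update rule, which the abstract does not spell out. The quadratic toy loss, the Reptile-style first-order outer update, and the names `inner_adapt`, `outer_step`, and `task_targets` are all assumptions made for illustration.

```python
import numpy as np

def inner_adapt(theta, target, lr=0.1, steps=5):
    """Task-specific adaptation (inner level).

    Runs a few gradient steps on a toy loss 0.5 * ||phi - target||^2,
    starting from the shared parameters theta. The loss is a stand-in,
    not an SSL loss from the paper.
    """
    phi = theta.copy()
    for _ in range(steps):
        grad = phi - target          # gradient of the toy quadratic loss
        phi -= lr * grad
    return phi

def outer_step(theta, task_targets, inner_lr=0.1, outer_lr=0.5, steps=5):
    """Cross-task update (outer level).

    Moves the shared parameters toward the average of the task-adapted
    parameters. This is a first-order, Reptile-style approximation chosen
    for simplicity; it is an assumption, not the paper's update mechanism.
    """
    adapted = [inner_adapt(theta, t, inner_lr, steps) for t in task_targets]
    direction = np.mean(adapted, axis=0) - theta
    return theta + outer_lr * direction

rng = np.random.default_rng(0)
task_targets = rng.normal(size=(4, 3))   # four hypothetical tasks, 3-d parameters
theta = np.zeros(3)
for _ in range(200):
    theta = outer_step(theta, task_targets)

print("shared parameters:", theta)
print("mean task optimum:", task_targets.mean(axis=0))
```

In this toy setting the shared parameters settle at the point that is closest on average to every task's optimum, which is the simplest form of the "cross-task consistency" the abstract refers to.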
Peer Reviews
Decision: Submitted to ICLR 2026
1. The universality of SSL is theoretically defined in terms of discriminability, generalizability, and transferability. 2. The idea behind GeSSL, which models universality through a bi-level learning paradigm, is interesting. 3. Theoretical and empirical evaluations on benchmark datasets demonstrate the advantages of GeSSL.
1. The baselines compared in this paper are mainly classic or state-of-the-art (SOTA) models from 2020, 2021, and 2022, with few models from 2023 or later, so the comparison may not reflect how the method fares against the most recent models. 2. The computational cost is relatively high, which may limit the practicality of the proposed method.
1. Developing SSL approaches from first principles is a crucial and ambitious research direction. 2. The proposed bi-level optimization scheme is an intriguing and compelling approach to modeling generalizability. 3. The proposed GeSSL approach possesses strong theoretical foundations. 4. GeSSL improves the performance of the respective methods it is applied to, while reducing the training time. 5. The breadth of the experimental evaluations is impressive.
**Major** 1. The paper applies episodic sampling within a single dataset, with downstream tasks drawn from the same domain. This does not validate cross-task generalization in a meaningful sense, especially compared to standard meta-learning or multi-dataset pretraining studies. The core claim that the method models transferability remains substantially unproven under the current experimental design. 2. The formulation still depends on conventional design choices (e.g., L_ssl, L_disc), which we
S1. The paper clearly defines what constitutes a good SSL representation (discriminability, generalizability, transferability) and consolidates these into a single goal. This brings conceptual clarity and a new perspective to SSL, shifting focus from “which tricks yield good features” to “what properties should good features have”. By explicitly modeling these properties in the loss, the approach directly targets the end-goal of SSL rather than relying on indirect proxies. S2. The proposed GeSS
W1- Theoretical Analysis for Multi-Paradigm Generality: The paper should include a deeper analysis or explanation of why GeSSL can integrate with different SSL losses. My suggestion is to formalize the idea that each mini-batch forms a pseudo-task where one view acts as a “class prototype” (as hinted in the paper’s discussion of anchors and clustering). If this holds, one could argue that contrastive, distillation, and generative SSL all enforce some form of pairwise consistency that GeSSL lever
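The pseudo-task reading suggested in this review can be sketched as follows: each sample in a mini-batch is treated as its own class, the embedding of one augmented view serves as that class's prototype, and the embedding of the other view is a query to be classified against all prototypes. This is only an illustration of the reviewer's suggestion, not the paper's formulation. The function name `pseudo_task_loss`, the cosine-similarity scoring, the temperature value, and the random embeddings are assumptions.

```python
import numpy as np

def pseudo_task_loss(view_a, view_b, temperature=0.1):
    """Treat a mini-batch as an n-way pseudo-task.

    view_a: (n, d) embeddings used as class prototypes (one class per sample).
    view_b: (n, d) embeddings of the second view, used as queries.
    Returns the mean cross-entropy of assigning each query to its own
    prototype, and the nearest-prototype accuracy.
    """
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = b @ a.T / temperature                  # query i vs prototype j
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(a))
    loss = -log_probs[idx, idx].mean()
    acc = (logits.argmax(axis=1) == idx).mean()
    return loss, acc

rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))                   # hypothetical view-1 embeddings
aligned = anchors + 0.01 * rng.normal(size=(8, 16))  # view 2 close to view 1
unrelated = rng.normal(size=(8, 16))                 # view 2 unrelated to view 1

loss_aligned, acc_aligned = pseudo_task_loss(anchors, aligned)
loss_random, acc_random = pseudo_task_loss(anchors, unrelated)
print(loss_aligned, acc_aligned)
print(loss_random, acc_random)
```

With this scoring the loss has the same form as a one-directional contrastive (InfoNCE-style) objective, which is one way to see how a pairwise-consistency view could connect contrastive SSL to prototype-based classification.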
Taxonomy
Topics: Neural Networks and Applications
