On the Asymptotics of Self-Supervised Pre-training: Two-Stage M-Estimation and Representation Symmetry

Mohammad Tinati; Stephen Tu

arXiv:2603.27631·cs.LG·March 31, 2026

TL;DR

This paper develops an asymptotic theory of self-supervised pre-training via two-stage M-estimation, uses Riemannian geometry to handle representations that are identifiable only up to a group symmetry, and gives insight into how pre-training interacts with fine-tuning.

Contribution

The paper introduces an asymptotic framework for pre-training that accounts for representation symmetry and links the pre-trained representation to downstream tasks through orbit-invariance.

Findings

01. Derived the limiting distribution of downstream test risk.

02. Applied theory to spectral pre-training, factor models, and Gaussian mixtures.

03. Achieved substantial improvements over prior bounds in specific cases.

Abstract

Self-supervised pre-training, where large corpora of unlabeled data are used to learn representations for downstream fine-tuning, has become a cornerstone of modern machine learning. While a growing body of theoretical work has begun to analyze this paradigm, existing bounds leave open the question of how sharp the current rates are, and whether they accurately capture the complex interaction between pre-training and fine-tuning. In this paper, we address this gap by developing an asymptotic theory of pre-training via two-stage M-estimation. A key challenge is that the pre-training estimator is often identifiable only up to a group symmetry, a feature common in representation learning that requires careful treatment. We address this issue using tools from Riemannian geometry to study the intrinsic parameters of the pre-training representation, which we link with the downstream predictor…
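To make the symmetry issue concrete, here is a minimal numerical sketch; it is an illustration under stated assumptions, not the paper's construction. It assumes a linear factor model, a PCA-style reconstruction loss standing in for the pre-training objective, and ordinary least squares as the fine-tuning stage. The names `pretrain_spectral`, `finetune_least_squares`, `reconstruction_loss`, and the loading matrix `A` are hypothetical. The point is that rotating the learned representation by any orthogonal matrix leaves the pre-training loss unchanged (so the estimator is identifiable only up to that group), while downstream predictions are invariant along the same orbit.

```python
import numpy as np

def pretrain_spectral(X, k):
    """Stage 1 (assumed form): top-k eigenvectors of the sample covariance, shape (d, k)."""
    cov = X.T @ X / X.shape[0]
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    return eigvecs[:, -k:]

def finetune_least_squares(W, X, y):
    """Stage 2 (assumed form): ordinary least squares on the frozen features X @ W."""
    beta, *_ = np.linalg.lstsq(X @ W, y, rcond=None)
    return beta

def reconstruction_loss(W, X):
    """Mean squared reconstruction error; depends on W only through W @ W.T."""
    resid = X - X @ W @ W.T
    return np.mean(np.sum(resid**2, axis=1))

rng = np.random.default_rng(0)
d, k = 8, 3
A = rng.normal(size=(d, k))  # hypothetical factor loadings

def sample(n):
    """Draw (x, y) from a toy factor model with latent factors z."""
    z = rng.normal(size=(n, k))
    x = z @ A.T + 0.1 * rng.normal(size=(n, d))
    y = z @ np.ones(k) + 0.1 * rng.normal(size=n)
    return x, y

X_unlabeled, _ = sample(2000)   # pre-training corpus (labels unused)
X_labeled, y_labeled = sample(200)
X_test, _ = sample(50)

W = pretrain_spectral(X_unlabeled, k)
Q, _ = np.linalg.qr(rng.normal(size=(k, k)))  # a random orthogonal k x k matrix
W_rot = W @ Q                                  # another point on the same orbit

pred = X_test @ W @ finetune_least_squares(W, X_labeled, y_labeled)
pred_rot = X_test @ W_rot @ finetune_least_squares(W_rot, X_labeled, y_labeled)

# Both gaps should be zero up to floating-point error.
loss_gap = abs(reconstruction_loss(W, X_unlabeled) - reconstruction_loss(W_rot, X_unlabeled))
pred_gap = np.max(np.abs(pred - pred_rot))
print(loss_gap, pred_gap)
```

Because the downstream predictor depends on the representation only through its orbit in this toy setting, an analysis can work with the orbit rather than with any particular representative.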
