Directional Neural Collapse Explains Few-Shot Transfer in Self-Supervised Learning

Achleshwar Luthra; Yash Salunkhe; Tomer Galanti

arXiv:2603.03530·cs.LG·March 5, 2026

TL;DR

This paper introduces directional CDNV (decision-axis variance), a directional form of neural collapse, to explain why frozen self-supervised representations transfer well with few labels. It links this single geometric quantity to few-shot generalization within a task and to low interference across tasks.

Contribution

The paper proves non-asymptotic multiclass generalization bounds whose leading term is the directional CDNV, links decision-axis collapse to near-orthogonality of decision axes across tasks, and demonstrates the phenomenon empirically across SSL objectives and on synthetic data.

Findings

01 Directional CDNV collapses during pretraining even when classical CDNV remains large.

02 Small directional CDNV leads to nearly orthogonal decision axes across tasks.

03 Bounds closely track few-shot error at practical shot sizes.

Abstract

Frozen self-supervised representations often transfer well with only a few labels across many semantic tasks. We argue that a single geometric quantity, directional CDNV (decision-axis variance), sits at the core of two favorable behaviors: strong few-shot transfer within a task, and low interference across many tasks. We show that both emerge when variability along class-separating directions is small. First, we prove sharp non-asymptotic multiclass generalization bounds for downstream classification whose leading term is the directional CDNV. The bounds include finite-shot corrections that cleanly separate intrinsic decision-axis variability from centroid-estimation error. Second, we link decision-axis collapse to multitask geometry: for independent balanced labelings, small directional CDNV across tasks forces the corresponding decision axes to be nearly orthogonal,…
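
The contrast between classical and directional CDNV can be illustrated with a short NumPy sketch. This listing does not give the paper's exact definitions, so the formulas below are assumptions. Classical CDNV is taken as total within-class variance divided by squared centroid distance. Directional CDNV is taken as within-class variance projected onto the unit vector between the two class centroids, with the same normalization. The function names and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def _centroid_and_residuals(feats):
    # feats: (n_samples, dim) array of features for one class
    mu = feats.mean(axis=0)
    return mu, feats - mu

def classical_cdnv(feats_i, feats_j):
    # Assumed definition: average total within-class variance
    # divided by the squared distance between class centroids.
    mu_i, res_i = _centroid_and_residuals(feats_i)
    mu_j, res_j = _centroid_and_residuals(feats_j)
    var_i = (res_i ** 2).sum(axis=1).mean()
    var_j = (res_j ** 2).sum(axis=1).mean()
    dist2 = np.sum((mu_i - mu_j) ** 2)
    return (var_i + var_j) / (2.0 * dist2)

def directional_cdnv(feats_i, feats_j):
    # Assumed definition: same ratio, but the variance is measured only
    # along the decision axis, i.e. the unit vector joining the centroids.
    mu_i, res_i = _centroid_and_residuals(feats_i)
    mu_j, res_j = _centroid_and_residuals(feats_j)
    diff = mu_i - mu_j
    dist2 = np.sum(diff ** 2)
    axis = diff / np.sqrt(dist2)
    var_i = ((res_i @ axis) ** 2).mean()
    var_j = ((res_j @ axis) ** 2).mean()
    return (var_i + var_j) / (2.0 * dist2)

# Synthetic two-class data: the classes are separated along coordinate 0,
# with small noise along that coordinate and large noise everywhere else.
rng = np.random.default_rng(0)
dim, n = 64, 2000
scale = np.ones(dim)
scale[0] = 0.1  # small variability along the class-separating direction
center = np.zeros(dim)
center[0] = 1.0
class_a = center + rng.normal(size=(n, dim)) * scale
class_b = -center + rng.normal(size=(n, dim)) * scale

classical = classical_cdnv(class_a, class_b)
directional = directional_cdnv(class_a, class_b)
print(f"classical CDNV:   {classical:.4f}")
print(f"directional CDNV: {directional:.4f}")
```

In this toy setup the directional value comes out far smaller than the classical one, because almost all within-class variance lies orthogonal to the decision axis. This matches the regime described in the first finding. The decision axis here is estimated from sample centroids, so the measured directional value also picks up some centroid-estimation error.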

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Face recognition and analysis