Self-Distillation of Hidden Layers for Self-Supervised Representation Learning

Scott C. Lowe; Anthony Fuller; Sageev Oore; Evan Shelhamer; Graham W. Taylor

arXiv:2603.15553 · cs.CV · March 17, 2026

TL;DR

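Bootleg is a self-distillation method that trains a model to predict latent representations from multiple hidden layers of a teacher network, rather than from the final layer alone. The authors report that it improves high-level feature learning and outperforms existing SSL methods, including I-JEPA, on ImageNet-1K classification and semantic segmentation benchmarks.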

Contribution

Introduces Bootleg, a hierarchical self-distillation approach that enhances high-level feature learning in self-supervised vision models.

Findings

01. Bootleg outperforms I-JEPA by 10% on ImageNet-1K classification.

02. Bootleg achieves superior results on semantic segmentation benchmarks.

03. Hierarchical distillation improves feature abstraction at multiple levels.

Abstract

The landscape of self-supervised learning (SSL) is currently dominated by generative approaches (e.g., MAE) that reconstruct raw low-level data, and predictive approaches (e.g., I-JEPA) that predict high-level abstract embeddings. While generative methods provide strong grounding, they are computationally inefficient for high-redundancy modalities like imagery, and their training objective does not prioritize learning high-level, conceptual features. Conversely, predictive methods often suffer from training instability due to their reliance on the non-stationary targets of final-layer self-distillation. We introduce Bootleg, a method that bridges this divide by tasking the model with predicting latent representations from multiple hidden layers of a teacher network. This hierarchical objective forces the model to capture features at varying levels of abstraction simultaneously. We…
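
As a rough illustration of the hierarchical objective described in the abstract, the Python sketch below averages a per-layer regression loss over several teacher hidden layers. It is not the authors' implementation: the mean-squared-error loss, the equal weighting of layers, the exponential-moving-average (EMA) teacher update, and every function and parameter name are assumptions chosen for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def hidden_layer_distillation_loss(
    student_preds: Sequence[Vector],
    teacher_hiddens: Sequence[Vector],
    target_layers: Sequence[int],
) -> float:
    """Average per-layer MSE over the selected teacher layers.

    student_preds[i] is the student's prediction of the teacher's hidden
    representation at layer target_layers[i]. Equal layer weighting is an
    assumption of this sketch.
    """
    if len(student_preds) != len(target_layers):
        raise ValueError("need one student prediction per target layer")
    losses = [
        mse(pred, teacher_hiddens[layer])
        for pred, layer in zip(student_preds, target_layers)
    ]
    return sum(losses) / len(losses)


def ema_update(teacher: Vector, student: Vector, momentum: float = 0.99) -> Vector:
    """Hypothetical EMA teacher update: teacher <- m * teacher + (1 - m) * student."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


if __name__ == "__main__":
    # Toy teacher with three layers, each producing a 2-dimensional representation.
    teacher_hiddens = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    # The student predicts the teacher's layers 1 and 2.
    student_preds = [[1.0, 1.0], [2.0, 4.0]]
    loss = hidden_layer_distillation_loss(student_preds, teacher_hiddens, [1, 2])
    print(loss)  # 1.0: layer 1 matches exactly (0.0), layer 2 contributes 2.0
```

Passing only the final layer's index in `target_layers` gives a final-layer-only objective of the kind the abstract attributes to predictive methods such as I-JEPA; listing several indices gives the multi-level variant.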

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis