Self-Distillation of Hidden Layers for Self-Supervised Representation Learning
Scott C. Lowe, Anthony Fuller, Sageev Oore, Evan Shelhamer, Graham W. Taylor

TL;DR
Bootleg is a novel self-distillation method that predicts hierarchical latent representations from multiple hidden layers, improving high-level feature learning and outperforming existing SSL methods on various vision benchmarks.
Contribution
Introduces Bootleg, a hierarchical self-distillation approach that enhances high-level feature learning in self-supervised vision models.
Findings
Bootleg outperforms I-JEPA by 10% on ImageNet-1K classification.
Bootleg achieves superior results on semantic segmentation benchmarks.
Hierarchical distillation improves feature abstraction at multiple levels.
Abstract
The landscape of self-supervised learning (SSL) is currently dominated by generative approaches (e.g., MAE) that reconstruct raw low-level data, and predictive approaches (e.g., I-JEPA) that predict high-level abstract embeddings. While generative methods provide strong grounding, they are computationally inefficient for high-redundancy modalities like imagery, and their training objective does not prioritize learning high-level, conceptual features. Conversely, predictive methods often suffer from training instability due to their reliance on the non-stationary targets of final-layer self-distillation. We introduce Bootleg, a method that bridges this divide by tasking the model with predicting latent representations from multiple hidden layers of a teacher network. This hierarchical objective forces the model to capture features at varying levels of abstraction simultaneously. We…
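The hierarchical objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a mean-squared-error regression loss per layer, an unweighted average across layers, and a teacher maintained as an exponential moving average (EMA) of the student, which is a common choice in self-distillation but is not stated in the text above. The names `multi_layer_distillation_loss`, `ema_update`, and the layer indices are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equal-length feature vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multi_layer_distillation_loss(student_preds, teacher_hiddens, layer_ids):
    """Average a per-layer regression loss over several teacher layers.

    student_preds / teacher_hiddens map a layer index to a feature vector.
    Assumption: MSE per layer and equal weighting across layers.
    """
    losses = [mse(student_preds[i], teacher_hiddens[i]) for i in layer_ids]
    return sum(losses) / len(losses)


def ema_update(teacher_params, student_params, momentum=0.99):
    """Assumed teacher update: EMA of the student's parameters."""
    return [
        momentum * t + (1.0 - momentum) * s
        for t, s in zip(teacher_params, student_params)
    ]


# Toy example: targets taken from two hidden layers (hypothetical indices 4 and 8).
student_preds = {4: [1.0, 2.0], 8: [0.0, 0.0]}
teacher_hiddens = {4: [1.0, 4.0], 8: [0.0, 2.0]}
loss = multi_layer_distillation_loss(student_preds, teacher_hiddens, [4, 8])
new_teacher = ema_update([1.0], [0.0], momentum=0.9)
```

Predicting only the final layer corresponds to passing a single index in `layer_ids`; the multi-layer variant adds targets from earlier layers to the same loss.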
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
