Object-level Self-Distillation for Vision Pretraining
Çağlar Hızlı, Çağatay Yıldız, Pekka Marttinen

TL;DR
This paper introduces Object-level Self-Distillation (ODIS), a novel vision pretraining method that focuses on individual objects within images, improving representation quality especially in complex, scene-rich datasets.
Contribution
ODIS shifts self-distillation from image-level to object-level, utilizing object-aware cropping and masked attention to enhance transformer-based visual representations.
Findings
Achieves 82.6% k-NN accuracy on ImageNet1k with ViT-Large.
Improves representations at both image and patch levels.
Transforms scene-level tasks into simpler object-level sub-tasks.
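The k-NN accuracy quoted above is measured on frozen features: a test image is assigned the label most common among its nearest training features. The sketch below shows a simplified, unweighted variant using cosine similarity. The function names, the value of `k`, and the toy features are illustrative assumptions; the exact protocol used in the paper is not given in this summary.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(train_feats, train_labels, query, k=3):
    """Majority vote over the k training features most similar to `query`."""
    ranked = sorted(
        range(len(train_feats)),
        key=lambda i: cosine(train_feats[i], query),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(train_feats, train_labels, test_feats, test_labels, k=3):
    """Fraction of test features whose k-NN prediction matches the label."""
    correct = sum(
        knn_predict(train_feats, train_labels, feat, k) == label
        for feat, label in zip(test_feats, test_labels)
    )
    return correct / len(test_labels)


# Toy features standing in for frozen encoder outputs (hypothetical data).
train_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_labels = ["cat", "cat", "dog", "dog"]
test_feats = [[1.0, 0.2], [0.2, 1.0]]
test_labels = ["cat", "dog"]
accuracy = knn_accuracy(train_feats, train_labels, test_feats, test_labels, k=3)
```

Because the encoder is never fine-tuned in this kind of evaluation, the score reflects the quality of the pretrained representation rather than of a learned classifier head.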
Abstract
State-of-the-art vision pretraining methods rely on image-level self-distillation from object-centric datasets such as ImageNet, implicitly assuming each image contains a single object. This assumption does not always hold: many ImageNet images already contain multiple objects. Further, it limits scalability to scene-centric datasets that better mirror real-world complexity. We address these challenges by introducing Object-level Self-DIStillation (ODIS), a pretraining approach that shifts the self-distillation granularity from whole images to individual objects. Using object-aware cropping and masked attention, ODIS isolates object-specific regions, guiding the transformer toward semantically meaningful content and transforming a noisy, scene-level task into simpler object-level sub-tasks. We show that this approach improves visual representations both at the image and patch levels.…
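As a rough illustration of how masked attention can isolate an object-specific region, the sketch below computes single-query scaled dot-product attention in which patches outside a binary object mask receive zero weight. This is a generic masked-attention example, not the authors' implementation; the function name `masked_attention`, the single-query setup, and the toy inputs are assumptions made for illustration.

```python
import math


def masked_attention(query, keys, values, mask):
    """Single-query attention restricted to positions where `mask` is True.

    query:  list of d floats (e.g. a pooling token)
    keys:   list of n key vectors, one per patch
    values: list of n value vectors, one per patch
    mask:   list of n booleans, True for patches that belong to the object
    """
    if not any(mask):
        raise ValueError("mask must keep at least one position")
    d = len(query)
    # Scaled dot-product scores; masked-out patches get -inf so that the
    # softmax assigns them exactly zero weight.
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) if keep else -math.inf
        for key, keep in zip(keys, mask)
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of value vectors over the kept patches only.
    out = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return out, weights


# Four patches; only the first two are assumed to lie on the object.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [-3.0, 2.0]]
values = [[1.0, 0.0], [0.0, 1.0], [100.0, 100.0], [-100.0, -100.0]]
mask = [True, True, False, False]
pooled, weights = masked_attention(query, keys, values, mask)
```

In this toy example the third patch has the largest raw score, yet it contributes nothing to `pooled` because it lies outside the mask. That is the sense in which masking steers the representation toward the object rather than the surrounding scene.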
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
