TinySSL: Distilled Self-Supervised Pretraining for Sub-Megabyte MCU Models
Bibin Wilson

TL;DR
This paper introduces CA-DSSL, a novel self-supervised learning framework tailored to tiny microcontroller models that achieves significant accuracy improvements at no additional inference cost.
Contribution
It proposes a capacity-aware distillation approach that overcomes three key obstacles to SSL in sub-megabyte models (projection head dominance, representation bottleneck, and augmentation sensitivity), outperforming existing methods on CIFAR-100 and Pascal VOC.
Findings
CA-DSSL achieves 62.7% linear-probe accuracy (3-seed mean) on CIFAR-100 with a 396K-parameter MobileNetV2-0.35 backbone.
It surpasses SimCLR-Tiny by 18 percentage points and matches SEED (61.7%) with 10× fewer projection parameters.
On Pascal VOC, CA-DSSL improves detection mAP over SEED.
Abstract
Self-supervised learning (SSL) has transformed representation learning for large models, yet it remains unexplored for microcontroller (MCU)-class models with fewer than 500K parameters. We identify three obstacles at this scale (projection head dominance, representation bottleneck, and augmentation sensitivity) and propose Capacity-Aware Distilled Self-Supervised Learning (CA-DSSL), a teacher-guided framework that overcomes them without labels or text supervision. CA-DSSL combines asymmetric distillation from a frozen DINO ViT-S/16 teacher, multi-scale feature distillation for spatial representations, and a progressive augmentation curriculum. On a MobileNetV2-0.35 backbone (396K parameters) pretrained on CIFAR-100, CA-DSSL reaches 62.7 ± 0.5% linear-probe accuracy (3-seed mean), surpassing SimCLR-Tiny by 18 pp and matching SEED (61.7%) with 10× fewer projection parameters (426K vs.…
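The abstract names CA-DSSL's three components but not their exact losses or schedules. The following is a minimal, stdlib-only Python sketch of how such components are commonly written, under stated assumptions: a cosine-distance loss against the frozen teacher's embedding (the teacher is a fixed target, which is what makes the distillation one-directional), a weighted sum of per-position spatial losses across scales for the multi-scale term, and a linear ramp for the augmentation curriculum. The function names, per-scale weights, and ramp parameters are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distill_loss(student: Vector, teacher: Vector) -> float:
    """Return 1 - cos(student, teacher).

    Assumption: the student embedding was already projected to the teacher's
    dimension. The teacher vector is treated as a fixed target (frozen teacher),
    so in a real framework only the student side would receive gradients.
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher embeddings must have the same length")
    dot = sum(s * t for s, t in zip(student, teacher))
    norm_s = math.sqrt(sum(s * s for s in student))
    norm_t = math.sqrt(sum(t * t for t in teacher))
    return 1.0 - dot / (norm_s * norm_t + 1e-12)  # epsilon avoids division by zero


def multiscale_distill_loss(
    student_maps: List[List[Vector]],
    teacher_maps: List[List[Vector]],
    weights: Sequence[float],
) -> float:
    """Weighted sum over scales of the mean per-position cosine loss.

    Each map is a flattened spatial grid: a list of per-position feature vectors.
    Assumption (hypothetical): student and teacher maps are already aligned in
    resolution and channel count, and the per-scale weights are illustrative.
    """
    if not (len(student_maps) == len(teacher_maps) == len(weights)):
        raise ValueError("need one student map, teacher map, and weight per scale")
    total = 0.0
    for s_map, t_map, w in zip(student_maps, teacher_maps, weights):
        if len(s_map) != len(t_map) or not s_map:
            raise ValueError("maps at each scale must be non-empty and equally sized")
        per_position = [cosine_distill_loss(s, t) for s, t in zip(s_map, t_map)]
        total += w * sum(per_position) / len(per_position)
    return total


def augmentation_strength(
    epoch: int,
    total_epochs: int,
    start: float = 0.2,
    end: float = 1.0,
    ramp_frac: float = 0.5,
) -> float:
    """Hypothetical curriculum: ramp augmentation strength linearly from `start`
    to `end` over the first `ramp_frac` of training, then hold it at `end`."""
    ramp_epochs = max(1, int(total_epochs * ramp_frac))
    if epoch >= ramp_epochs:
        return end
    return start + (end - start) * epoch / ramp_epochs
```

Identical student and teacher vectors give a loss near 0, orthogonal ones give 1, and opposite ones give 2. With the default ramp over 100 epochs, augmentation strength starts at 0.2 and reaches 1.0 by epoch 50, so the small student sees weak views early and the full augmentation pipeline only once its representation has stabilized.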
