A Layered Self-Supervised Knowledge Distillation Framework for Efficient Multimodal Learning on the Edge

Tarique Dahri; Zulfiqar Ali Memon; Zhenyu Yu; Mohd. Yamani Idna Idris; Sheheryar Khan; Sadiq Ahmad; Maged Shoman; Saddam Aziz; Rizwan Qureshi

arXiv:2506.07055 · cs.CV · June 10, 2025



TL;DR

The paper proposes a layered self-supervised knowledge distillation framework (LSSKD) that improves compact models for multimodal learning without a pre-trained teacher network, reporting state-of-the-art results on CIFAR-100 and in few-shot settings with no added inference cost.

Contribution

It introduces a layered approach with auxiliary classifiers for self-supervised knowledge transfer, enhancing model performance without extra inference overhead.

Findings

01 Achieves an average improvement of 4.54% over PS-KD on CIFAR-100

02 Attains a 1.14% gain over SSKD on CIFAR-100

03 Reports state-of-the-art results in few-shot learning scenarios on Tiny ImageNet and CIFAR-100

Abstract

We introduce the Layered Self-Supervised Knowledge Distillation (LSSKD) framework for training compact deep learning models. Unlike traditional methods that rely on pre-trained teacher networks, our approach appends auxiliary classifiers to intermediate feature maps, generating diverse self-supervised knowledge and enabling one-to-one transfer across different network stages. On CIFAR-100, our method achieves an average improvement of 4.54% over the state-of-the-art PS-KD method and a 1.14% gain over SSKD; on ImageNet, it improves on HASSKD by 0.32%. Experiments on Tiny ImageNet and CIFAR-100 under few-shot learning scenarios also achieve state-of-the-art results. These findings demonstrate the effectiveness of our approach in enhancing model generalization and performance without the need for large over-parameterized teacher networks. Importantly, at the inference stage, all…
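The abstract describes auxiliary classifiers attached to intermediate feature maps but, as quoted here, does not state the loss that transfers knowledge between stages. The sketch below is therefore only an illustration of one common choice, not the authors' formulation: a temperature-scaled KL divergence that pulls each auxiliary classifier's softened prediction toward the final classifier's softened prediction. The function names, the direction of transfer (final stage as target), the temperature value, and the averaging over stages are all assumptions made for this example.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def layered_distillation_loss(aux_logits_per_stage, final_logits, temperature=3.0):
    """Hypothetical self-distillation term, NOT taken from the paper.

    Each auxiliary classifier (one per intermediate stage) is matched
    against the softened output of the final classifier. The T^2 factor
    is the usual gradient-scale correction from standard distillation.
    """
    target = softmax(final_logits, temperature)
    per_stage = [
        kl_divergence(target, softmax(aux, temperature))
        for aux in aux_logits_per_stage
    ]
    return (temperature ** 2) * sum(per_stage) / len(per_stage)

# Toy logits for a 3-class problem: two intermediate stages plus the final head.
aux_logits = [[1.0, 0.5, 0.1], [2.0, 0.3, -0.5]]
final_logits = [3.0, 0.2, -1.0]
loss = layered_distillation_loss(aux_logits, final_logits)
```

In a sketch like this the auxiliary heads contribute only to the training loss, so dropping them at inference would leave the backbone and final classifier unchanged, which is consistent with the summary's claim of no extra inference overhead.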


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications