Hybrid Knowledge Transfer through Attention and Logit Distillation for On-Device Vision Systems in Agricultural IoT
Stanley Mugisha, Rashid Kisitu, Florence Tushabe

TL;DR
This paper introduces a hybrid knowledge distillation method that transfers attention and logit knowledge from large Vision Transformers to lightweight models, enabling accurate, real-time plant disease detection on resource-limited IoT devices.
Contribution
It proposes a novel hybrid distillation framework with adaptive attention alignment and dual-loss optimization for efficient on-device vision in agriculture.
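The dual-loss idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The logit term is standard Hinton-style distillation: a temperature-softened KL divergence, scaled by T². The attention term is an attention-transfer-style MSE between L2-normalized spatial maps, each obtained by averaging squared activations over channels. The weights `alpha`, `beta`, and `temperature` are hypothetical. The sketch also assumes the teacher's token features have already been reshaped to the student's spatial grid, so the paper's adaptive alignment step is not reproduced. Plain Python lists keep it dependency-free.

```python
import math
from typing import List, Sequence

# C x H x W feature map as nested lists (a stand-in for a framework tensor).
FeatureMap = List[List[List[float]]]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Hard-label cross-entropy of the student's own prediction."""
    return -math.log(softmax(logits)[label])


def logit_kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                  temperature: float = 4.0, eps: float = 1e-12) -> float:
    """KL(teacher || student) on softened outputs, scaled by T^2 so the soft-target
    gradients stay comparable in size to the hard-label term."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * math.log((t + eps) / (s + eps)) for t, s in zip(p_t, p_s))
    return temperature ** 2 * kl


def attention_map(features: FeatureMap) -> List[float]:
    """Flattened H*W spatial map: channel mean of squared activations, L2-normalized."""
    channels, height, width = len(features), len(features[0]), len(features[0][0])
    att = [
        sum(features[c][i][j] ** 2 for c in range(channels)) / channels
        for i in range(height)
        for j in range(width)
    ]
    norm = math.sqrt(sum(a * a for a in att)) or 1.0
    return [a / norm for a in att]


def attention_loss(student_feats: FeatureMap, teacher_feats: FeatureMap) -> float:
    """MSE between normalized attention maps.
    ASSUMPTION: both feature maps already share the same H x W grid."""
    a_s, a_t = attention_map(student_feats), attention_map(teacher_feats)
    if len(a_s) != len(a_t):
        raise ValueError("student and teacher maps must share a spatial grid")
    return sum((s - t) ** 2 for s, t in zip(a_s, a_t)) / len(a_s)


def hybrid_loss(student_logits, teacher_logits, label, student_feats, teacher_feats,
                alpha: float = 0.5, beta: float = 1.0, temperature: float = 4.0) -> float:
    """HYPOTHETICAL weighting: alpha * CE + (1 - alpha) * logit KD + beta * attention MSE."""
    ce = cross_entropy(student_logits, label)
    kd = logit_kd_loss(student_logits, teacher_logits, temperature)
    att = attention_loss(student_feats, teacher_feats)
    return alpha * ce + (1 - alpha) * kd + beta * att


if __name__ == "__main__":
    # Toy inputs: 3 classes, one-channel 2 x 2 feature maps.
    s_logits, t_logits = [2.0, 0.5, -1.0], [3.0, 0.2, -2.0]
    s_feats = [[[0.1, 0.9], [0.3, 0.2]]]
    t_feats = [[[0.0, 1.0], [0.5, 0.1]]]
    print(hybrid_loss(s_logits, t_logits, 0, s_feats, t_feats))
```

In a real training loop, these terms would be computed on batched tensors in a deep learning framework. The list-based version only shows how the hard-label, soft-target, and attention terms combine into one objective.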
Findings
Distilled MobileNetV3 achieves 92.4% accuracy, close to Swin-L's 95.9%.
Reduces inference latency by over 80% on smartphones.
Improves accuracy by 3.5% over baseline MobileNetV3.
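The reported figures relate through simple arithmetic. The teacher-student gap is a difference in percentage points. A latency reduction of "over 80%" means the student needs less than 20% of the original time, which is a speedup of more than 5x. The summary does not say whether the 3.5% gain over the baseline is absolute or relative. The `implied_baseline` helper is therefore a hypothetical reading that treats it as absolute percentage points.

```python
def accuracy_gap(teacher_acc: float, student_acc: float) -> float:
    """Teacher-minus-student accuracy, in percentage points."""
    return teacher_acc - student_acc


def speedup_from_latency_reduction(reduction: float) -> float:
    """A fractional latency cut r leaves (1 - r) of the original latency,
    which corresponds to a 1 / (1 - r) speedup."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def implied_baseline(student_acc: float, gain_points: float) -> float:
    """ASSUMPTION: treats the reported gain as absolute percentage points."""
    return student_acc - gain_points


if __name__ == "__main__":
    print(f"gap to Swin-L: {accuracy_gap(95.9, 92.4):.1f} points")
    print(f"speedup at an 80% latency cut: {speedup_from_latency_reduction(0.80):.1f}x")
    print(f"baseline if gain is absolute: {implied_baseline(92.4, 3.5):.1f}%")
```

Running it reports a 3.5-point gap to Swin-L and a 5.0x speedup at exactly 80%. It also reports an implied baseline of 88.9%, which holds only under the absolute-points reading.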
Abstract
Integrating deep learning into agricultural IoT systems requires balancing the high accuracy of Vision Transformers (ViTs) against the efficiency demands of resource-constrained edge devices. Large transformer models such as the Swin Transformer excel at plant disease classification by capturing both global and local dependencies. However, their computational cost (34.1 GFLOPs) makes them impractical for real-time on-device inference. Lightweight models such as MobileNetV3 and TinyML models are well suited to on-device inference but lack the spatial reasoning needed for fine-grained disease detection. To bridge this gap, we propose a hybrid knowledge distillation framework that jointly transfers logit and attention knowledge from a Swin Transformer teacher to a MobileNetV3 student model. Our method introduces…
Taxonomy
Topics: Smart Agriculture and AI
