# ATMS-KD: Adaptive Temperature and Mixed Sample Knowledge Distillation for a Lightweight Residual CNN in Agricultural Embedded Systems

**Authors:** Mohamed Ohamouddou, Said Ohamouddou, Abdellatif El Afia, Rafik Lasri

arXiv: 2508.20232 · 2025-08-29

## TL;DR

This paper introduces ATMS-KD, a knowledge distillation framework that combines adaptive temperature scheduling with mixed-sample augmentation to train lightweight residual CNNs for agricultural image classification. Its compact student (1.3 M parameters) reaches 97.11% accuracy with an inference latency of 72.19 ms.

## Contribution

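The study presents ATMS-KD, a framework that transfers knowledge from a MobileNetV3 Large teacher to lightweight residual CNN students for resource-constrained agricultural settings. It is reported to outperform eleven established knowledge distillation methods on a Damask rose maturity classification dataset.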

## Key findings

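- All three student models (Compact, Standard, Enhanced) achieved validation accuracies above 96.7% with ATMS-KD, compared to 95–96% with direct training.
- The compact model reached 97.11% accuracy, 1.60 percentage points above the second-best of eleven established knowledge distillation methods, with the lowest inference latency (72.19 ms).
- Knowledge retention exceeded 99% for all three student configurations.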
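
This summary does not define "knowledge retention". A common convention, assumed here and not confirmed by the source, is the student's accuracy expressed as a percentage of the teacher's accuracy. The sketch below uses that assumed definition; the function name and the sample values are hypothetical and are not results from the paper.

```python
def knowledge_retention(student_acc: float, teacher_acc: float) -> float:
    """Student accuracy as a percentage of teacher accuracy.

    ASSUMPTION: this ratio is a common convention; the paper's exact
    definition is not given in this summary.
    """
    if teacher_acc <= 0:
        raise ValueError("teacher_acc must be positive")
    return 100.0 * student_acc / teacher_acc


# Hypothetical accuracies, chosen only to illustrate the arithmetic.
retention = knowledge_retention(student_acc=45.0, teacher_acc=50.0)
print(f"retention = {retention:.1f}%")
```

Under this definition, a value above 100% would mean the student outperforms the teacher on the evaluation set.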

## Abstract

This study proposes ATMS-KD (Adaptive Temperature and Mixed-Sample Knowledge Distillation), a novel framework for developing lightweight CNN models suitable for resource-constrained agricultural environments. The framework combines adaptive temperature scheduling with mixed-sample augmentation to transfer knowledge from a MobileNetV3 Large teacher model (5.7 M parameters) to lightweight residual CNN students. Three student configurations were evaluated: Compact (1.3 M parameters), Standard (2.4 M parameters), and Enhanced (3.8 M parameters). The dataset used in this study consists of images of *Rosa damascena* (Damask rose) collected from agricultural fields in the Dades Oasis, southeastern Morocco, providing a realistic benchmark for agricultural computer vision applications under diverse environmental conditions. Experimental evaluation on the Damascena rose maturity classification dataset demonstrated significant improvements over direct training methods. All student models achieved validation accuracies exceeding 96.7% with ATMS-KD compared to 95–96% with direct training. The framework outperformed eleven established knowledge distillation methods, achieving 97.11% accuracy with the compact model, a 1.60 percentage point improvement over the second-best approach, while maintaining the lowest inference latency of 72.19 ms. Knowledge retention rates exceeded 99% for all configurations, demonstrating effective knowledge transfer regardless of student model capacity.
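
To make the two ingredients named in the abstract concrete, here is a minimal pure-Python sketch of a distillation loss that uses a temperature schedule and mixup-style sample mixing. It is not the authors' implementation. The linear schedule in `temperature_at`, the `t_start`/`t_end` defaults, the weighting `alpha`, and every function name are assumptions made for illustration. The soft-target term follows the standard temperature-scaled KL formulation of knowledge distillation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_at(epoch, total_epochs, t_start=4.0, t_end=1.0):
    """ASSUMED linear schedule from t_start to t_end; the paper's
    adaptive rule is not described in this summary."""
    if total_epochs <= 1:
        return t_end
    frac = epoch / (total_epochs - 1)
    return t_start + (t_end - t_start) * frac


def mixup(x_a, x_b, lam):
    """Convex combination of two flattened inputs (mixup-style mixing)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def kd_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


def cross_entropy(logits, label):
    """Hard-label cross-entropy at temperature 1."""
    return -math.log(softmax(logits)[label])


def mixed_sample_kd_loss(student_logits, teacher_logits,
                         label_a, label_b, lam, temperature, alpha=0.5):
    """Weighted sum of the soft-target loss and the mixed hard-label loss.

    Both logit vectors are assumed to come from the same mixed input;
    alpha is a hypothetical weighting, not a value from the paper.
    """
    hard = (lam * cross_entropy(student_logits, label_a)
            + (1.0 - lam) * cross_entropy(student_logits, label_b))
    soft = kd_loss(student_logits, teacher_logits, temperature)
    return alpha * soft + (1.0 - alpha) * hard


# Usage with made-up logits for a 3-class problem.
T = temperature_at(epoch=0, total_epochs=10)
loss = mixed_sample_kd_loss([1.0, 0.5, -0.2], [2.0, 0.1, -1.0],
                            label_a=0, label_b=1, lam=0.7, temperature=T)
print(f"T = {T:.2f}, loss = {loss:.4f}")
```

A higher temperature flattens the teacher's distribution so the student also learns from the relative probabilities of non-target classes, which is why a schedule over temperature changes what is transferred over the course of training.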

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20232/full.md

## Figures

22 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20232/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/2508.20232/full.md

---
Source: https://tomesphere.com/paper/2508.20232