# Knowledge Distillation Meets Reinforcement Learning: A Cluster-Driven Approach to Image Processing

**Authors:** Titinunt Kitrungrotsakul, Yingying Xu, Preeyanuch Srichola

PMC · DOI: 10.3390/s26010209 · Sensors (Basel, Switzerland) · 2025-12-28

## TL;DR

This paper introduces KDRL, a two-stage framework that combines knowledge distillation with reinforcement learning to improve lightweight models on complex image tasks such as remote sensing and medical imaging.

## Contribution

KDRL integrates knowledge distillation with reinforcement learning, adding auxiliary layers in the student encoder that reward alignment of student features with the teacher's cluster centers to promote robust feature learning.
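
The cluster alignment idea can be made concrete with a small sketch. This summary does not give the paper's reward formula, so everything below is an assumption for illustration: the function names, the use of cosine similarity, and the margin form (similarity to the target center minus the best similarity to any other center) are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_alignment_reward(student_feature, teacher_centers, target_cluster):
    """Hypothetical reward, not the paper's formula.

    Returns the similarity to the target teacher center minus the highest
    similarity to any other center. A positive value means the student
    feature sits closer to its own cluster than to any competing one.
    """
    sims = [cosine_similarity(student_feature, c) for c in teacher_centers]
    own = sims[target_cluster]
    others = [s for i, s in enumerate(sims) if i != target_cluster]
    return own - max(others) if others else own


# Toy data (made up): two teacher cluster centers in a 2-D feature space.
centers = [[1.0, 0.0], [0.0, 1.0]]
feature = [0.9, 0.1]
reward_correct = cluster_alignment_reward(feature, centers, target_cluster=0)
reward_wrong = cluster_alignment_reward(feature, centers, target_cluster=1)
```

In this toy case the feature lies near the first center, so the reward is positive when cluster 0 is the target and negative when cluster 1 is. In the paper, the features being scored would presumably come from the auxiliary layers in the student encoder.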

## Key findings

- KDRL raises the lightweight CLIP/ViT-B-32 student to 69.51% zero-shot accuracy on AID and 80.08% on RESISC45.
- It achieves state-of-the-art cross-modal retrieval on RSITMD, with 67.44% (I→T) and 74.76% (T→I) at R@10.
- The framework improves DIOR-RSVG visual-grounding precision to 64.21% at Pr@0.9.
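
For readers unfamiliar with the metrics quoted above, here is a minimal sketch of how R@K and Pr@IoU are commonly computed. It assumes the usual conventions: a retrieval query counts as a hit if any correct item appears in the top K, and a grounding prediction counts if its box IoU with the ground truth meets the threshold. The paper's exact evaluation protocol may differ (for example, strict versus non-strict thresholding), and all names and toy data below are hypothetical.

```python
def recall_at_k(similarity, ground_truth, k):
    """Fraction of queries with at least one correct item in the top-k.

    similarity[i][j]: score between query i and candidate j (higher is better).
    ground_truth[i]: set of candidate indices that are correct for query i.
    """
    hits = 0
    for scores, relevant in zip(similarity, ground_truth):
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if any(j in relevant for j in ranked[:k]):
            hits += 1
    return hits / len(similarity)


def box_iou(a, b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_at_iou(predicted_boxes, true_boxes, threshold):
    """Fraction of predictions whose IoU with the ground truth is >= threshold."""
    hits = sum(
        1 for p, t in zip(predicted_boxes, true_boxes) if box_iou(p, t) >= threshold
    )
    return hits / len(true_boxes)


# Toy data (made up): 3 queries scored against 3 candidates.
toy_similarity = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.1],
]
toy_ground_truth = [{0}, {1}, {0}]
r_at_1 = recall_at_k(toy_similarity, toy_ground_truth, k=1)
r_at_2 = recall_at_k(toy_similarity, toy_ground_truth, k=2)
```

On the toy data only the first query ranks its correct item first, while all three queries have it in the top two, so recall rises from one third at K=1 to 1.0 at K=2.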

## Abstract

Knowledge distillation (KD) enables the training of lightweight yet effective models, particularly in the visual domain. Meanwhile, reinforcement learning (RL) facilitates adaptive learning through environment-driven interactions, addressing the limitations of KD in handling dynamic and complex tasks. We propose a novel two-stage framework integrating Knowledge Distillation with Reinforcement Learning (KDRL) to enhance model adaptability to complex data distributions, such as remote sensing and medical imaging. In the first stage, supervised fine-tuning guides the student model using logit and feature-based distillation. The second stage refines the model via RL, leveraging confidence-based and cluster alignment rewards while dynamically reducing reliance on task loss. By combining the strengths of supervised knowledge distillation and reinforcement learning, KDRL provides a comprehensive approach to address the dual challenges of model efficiency and domain heterogeneity. A key innovation is the introduction of auxiliary layers within the student encoder to evaluate and reward the alignment of the student's features with the teacher’s cluster centers, promoting robust feature learning. Our framework demonstrates superior performance and computational efficiency across diverse tasks, establishing a scalable design for efficient model training. Across remote sensing benchmarks, KDRL boosts the lightweight CLIP/ViT-B-32 student to 69.51% zero-shot accuracy on AID and 80.08% on RESISC45; achieves state-of-the-art cross-modal retrieval on RSITMD with 67.44% (I→T) and 74.76% (T→I) at R@10; and improves DIOR-RSVG visual-grounding precision to 64.21% at Pr@0.9. These gains matter in real deployments by reducing missed targets and speeding analyst search on resource-constrained platforms.
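
The two stages described in the abstract can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' implementation. The temperature-scaled KL term and the mean-squared feature term are standard distillation losses. The linear task-weight schedule, the form of the confidence reward, and the way rewards are subtracted from the weighted task loss are hypothetical stand-ins, because this summary does not specify the paper's RL update rule.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def feature_distillation_loss(student_feature, teacher_feature):
    """Mean squared error between student and teacher feature vectors."""
    diffs = [(s - t) ** 2 for s, t in zip(student_feature, teacher_feature)]
    return sum(diffs) / len(diffs)


def task_loss_weight(step, total_steps, start=1.0, end=0.0):
    """Hypothetical linear schedule that lowers the task-loss weight in stage two."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def confidence_reward(student_logits, label):
    """Hypothetical reward: probability the student assigns to the correct label."""
    return softmax(student_logits)[label]


def stage_two_objective(task_loss, rewards, step, total_steps):
    """Simplified quantity to minimize: weighted task loss minus summed rewards.

    Treating rewards as a direct bonus is an assumption made for brevity;
    an actual RL stage would typically use a policy-gradient style update.
    """
    return task_loss_weight(step, total_steps) * task_loss - sum(rewards)
```

With this schedule, the task loss dominates early in stage two and the reward terms take over as `step` approaches `total_steps`, which is one simple way to read "dynamically reducing reliance on task loss".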

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12788330/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12788330/full.md

## References

55 references — full list in the complete paper: https://tomesphere.com/paper/PMC12788330/full.md

---
Source: https://tomesphere.com/paper/PMC12788330