Agentic Knowledge Distillation: Autonomous Training of Small Language Models for SMS Threat Detection

Adel ElZemity; Joshua Sylvester; Budi Arief; Rogério De Lemos

arXiv:2602.10869·cs.CR·February 12, 2026

TL;DR

This paper introduces Agentic Knowledge Distillation, a method in which a large language model autonomously generates synthetic data and iteratively fine-tunes a smaller model for SMS threat detection, significantly improving on previous approaches.

Contribution

It presents a novel autonomous training framework using LLMs as teachers to improve small language models for security tasks without human intervention.

Findings

01. The best teacher-student configuration achieved 94.31% accuracy and 96.25% recall.

02. Agentic Knowledge Distillation significantly outperforms baseline methods.

03. Performance depends heavily on the choice of the teacher LLM.
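
For readers less familiar with the two metrics reported above, the following is a minimal sketch of how accuracy and recall are conventionally computed for a binary spam/smishing classifier. The function name `accuracy_and_recall` and the label encoding (1 = spam/smishing, 0 = legitimate) are assumptions made for illustration; the paper's own evaluation code is not shown on this page.

```python
from typing import Sequence, Tuple


def accuracy_and_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, recall) for binary labels.

    Assumed encoding (not from the paper): 1 = spam/smishing, 0 = legitimate.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Accuracy: share of all messages classified correctly.
    accuracy = (true_pos + true_neg) / len(pairs)
    # Recall: share of actual threats that the detector caught.
    positives = true_pos + false_neg
    recall = true_pos / positives if positives else 0.0
    return accuracy, recall
```

Recall is the more safety-relevant of the two in this setting, since a missed smishing message (a false negative) reaches the user, whereas accuracy also rewards correctly passing legitimate messages.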

Abstract

SMS-based phishing (smishing) attacks have surged, yet training effective on-device detectors requires labelled threat data that quickly becomes outdated. To address this issue, we present Agentic Knowledge Distillation, in which a powerful LLM acts as an autonomous teacher that fine-tunes a smaller student SLM, deployable for security tasks without human intervention. The teacher LLM autonomously generates synthetic data and iteratively refines a smaller on-device student model until performance plateaus. We compare four LLMs in this teacher role (Claude Opus 4.5, GPT 5.2 Codex, Gemini 3 Pro, and DeepSeek V3.2) on SMS spam/smishing detection with two student SLMs (Qwen2.5-0.5B and SmolLM2-135M). Our results show that performance varies substantially depending on the teacher LLM, with the best configuration achieving 94.31% accuracy and 96.25% recall. We also compare against…
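
The generate-then-refine loop described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the callbacks `generate_batch` (standing in for the teacher LLM producing synthetic labelled messages) and `train_and_evaluate` (standing in for fine-tuning the student and scoring it on held-out data) are hypothetical, as are the `patience` and `min_delta` parameters used here to decide when performance has plateaued.

```python
from typing import Callable, List, Tuple

# (sms_text, label); assumed encoding: 1 = spam/smishing, 0 = legitimate.
Example = Tuple[str, int]


def distill_until_plateau(
    generate_batch: Callable[[int, float], List[Example]],
    train_and_evaluate: Callable[[List[Example]], float],
    max_rounds: int = 10,
    patience: int = 2,
    min_delta: float = 0.001,
) -> Tuple[float, int]:
    """Run teacher-driven refinement rounds until the score stops improving.

    generate_batch(round_index, last_score): hypothetical teacher call that
        returns new synthetic examples, optionally conditioned on the last score.
    train_and_evaluate(data): hypothetical call that fine-tunes the student on
        all data so far and returns a validation score.
    Returns (best_score, rounds_run).
    """
    data: List[Example] = []
    best_score = float("-inf")
    last_score = 0.0
    rounds_without_gain = 0
    rounds_run = 0

    for round_index in range(max_rounds):
        # Teacher step: extend the synthetic training set.
        data.extend(generate_batch(round_index, last_score))
        # Student step: fine-tune and measure.
        last_score = train_and_evaluate(data)
        rounds_run = round_index + 1

        if last_score > best_score + min_delta:
            best_score = last_score
            rounds_without_gain = 0
        else:
            rounds_without_gain += 1
            if rounds_without_gain >= patience:
                break  # Plateau: no meaningful gain for `patience` rounds.

    return best_score, rounds_run
```

Passing the teacher and student as callbacks keeps the sketch independent of any particular LLM API or training library; in practice the stopping rule and the feedback given to the teacher would follow whatever the paper specifies.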

Taxonomy

Topics: Spam and Phishing Detection · Advanced Malware Detection Techniques · User Authentication and Security Systems