Agentic Knowledge Distillation: Autonomous Training of Small Language Models for SMS Threat Detection
Adel ElZemity, Joshua Sylvester, Budi Arief, Rogério De Lemos

TL;DR
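This paper introduces Agentic Knowledge Distillation, a method in which a large language model autonomously generates synthetic data and iteratively fine-tunes a smaller model for SMS threat detection. The best teacher-student configuration reaches 94.31% accuracy and 96.25% recall, and the method is reported to outperform baseline methods.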
Contribution
It presents a novel autonomous training framework using LLMs as teachers to improve small language models for security tasks without human intervention.
Findings
The best teacher-student configuration achieved 94.31% accuracy and 96.25% recall.
Agentic Knowledge Distillation outperforms baseline methods significantly.
Performance depends heavily on the choice of the teacher LLM.
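For readers less familiar with the two reported metrics, the following is a minimal sketch of how accuracy and recall are conventionally computed for a binary detector. It assumes label 1 marks a threat (spam or smishing) and label 0 a benign message; the source does not state which class is treated as positive, and the function name `accuracy_and_recall` is illustrative, not taken from the paper.

```python
from typing import Sequence, Tuple


def accuracy_and_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, recall) for binary labels.

    Assumption: 1 = threat (positive class), 0 = benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")

    # Count the confusion-matrix cells needed for the two metrics.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Recall is the share of real threats that were caught; defined as 0.0
    # here when the data contains no threats at all.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return accuracy, recall
```

High recall matters for this task because a missed smishing message (a false negative) reaches the user unflagged, whereas accuracy also counts benign messages that were correctly left alone.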
Abstract
SMS-based phishing (smishing) attacks have surged, yet training effective on-device detectors requires labelled threat data that quickly becomes outdated. To deal with this issue, we present Agentic Knowledge Distillation, in which a powerful LLM acts as an autonomous teacher that fine-tunes a smaller student SLM, deployable for security tasks, without human intervention. The teacher LLM autonomously generates synthetic data and iteratively refines a smaller on-device student model until performance plateaus. We compare four LLMs in this teacher role (Claude Opus 4.5, GPT 5.2 Codex, Gemini 3 Pro, and DeepSeek V3.2) on SMS spam/smishing detection with two student SLMs (Qwen2.5-0.5B and SmolLM2-135M). Our results show that performance varies substantially depending on the teacher LLM, with the best configuration achieving 94.31% accuracy and 96.25% recall. We also compare against…
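The generate, fine-tune, evaluate loop described in the abstract can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the callables `generate_batch`, `fine_tune`, and `evaluate` are hypothetical placeholders for the teacher LLM, the student training step, and a held-out evaluation, and the plateau rule (`patience`, `min_delta`, `max_rounds`) is one common stopping criterion chosen here for illustration, since the abstract does not say how a plateau is detected.

```python
from typing import Callable, List, Tuple

# (sms_text, label); assumption: 1 = threat, 0 = benign.
Example = Tuple[str, int]


def distill_until_plateau(
    generate_batch: Callable[[int, List[float]], List[Example]],
    fine_tune: Callable[[List[Example]], None],
    evaluate: Callable[[], float],
    patience: int = 2,
    min_delta: float = 0.001,
    max_rounds: int = 20,
) -> List[float]:
    """Run teacher-driven rounds until the student's score stops improving.

    All three callables are hypothetical stand-ins supplied by the caller.
    Returns the evaluation score recorded after each round.
    """
    history: List[float] = []
    best = float("-inf")
    stale_rounds = 0

    for round_idx in range(max_rounds):
        # Teacher produces synthetic data, optionally informed by past scores.
        batch = generate_batch(round_idx, history)
        # Student is fine-tuned on the new synthetic batch.
        fine_tune(batch)
        # Student is scored on held-out data.
        score = evaluate()
        history.append(score)

        if score > best + min_delta:
            best = score
            stale_rounds = 0
        else:
            stale_rounds += 1
            # Stop once the score has failed to improve for `patience` rounds.
            if stale_rounds >= patience:
                break

    return history
```

Passing the score history back into `generate_batch` is a design choice made for this sketch so that a teacher could, in principle, adapt the data it generates to earlier results; whether and how the paper's teachers do this is not stated in the abstract.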
Taxonomy
Topics: Spam and Phishing Detection · Advanced Malware Detection Techniques · User Authentication and Security Systems
