SkillClone: Multi-Modal Clone Detection and Clone Propagation Analysis in the Agent Skill Ecosystem
Jiaying Zhu, Lyuye Zhang, Wenbo Guo, Yang Liu

TL;DR
SkillClone is a multi-modal clone detection system for agent skills. It identifies clone relationships across YAML metadata, natural language instructions, and embedded code, targeting the systemic risk that a vulnerability in a widely copied skill persists unnoticed across its derivatives.
Contribution
The paper presents the first multi-modal clone detection approach for agent skills, which fuses flat TF-IDF similarity with per-channel decomposition (YAML, NL, code) through logistic regression, and it contributes a benchmark dataset, SkillClone-Bench.
Findings
Achieves an F1 score of 0.939 on SkillClone-Bench, outperforming baseline methods.
Detects 258K clone pairs among 20K skills, involving 75% of skills.
Reveals the ecosystem is inflated 3.5x due to duplicated and superseded skills.
Abstract
Agent skills are modular instruction packages that combine YAML metadata, natural language instructions, and embedded code, and they have reached 196K publicly available instances, yet no mechanism exists to detect clone relationships among them. This gap creates systemic risks: a vulnerability in a widely copied skill silently persists across derivatives with no alert to maintainers. Existing clone detectors, designed for single-modality source code, cannot handle the multi-modal structure of skills, where clone evidence is distributed across three interleaved content channels. We present SkillClone, the first multi-modal clone detection approach for agent skills. SkillClone fuses flat TF-IDF similarity with per-channel decomposition (YAML, NL, code) through logistic regression, combining strong detection with interpretable type classification. We construct SkillClone-Bench, a balanced…
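The fusion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tokenizer, the weights, and the bias are hypothetical, and the IDF is computed over only the two skills being compared, which is a simplification. In the paper the logistic regression coefficients are learned; here they are hand-picked placeholders so the example runs with the standard library alone.

```python
import math
import re
from collections import Counter

CHANNELS = ("yaml", "nl", "code")

def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric/underscore runs.
    return re.findall(r"[a-z0-9_]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document, using smoothed IDF."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter()
    for c in counts:
        df.update(c.keys())
    idf = {t: math.log((1 + n) / (1 + k)) + 1.0 for t, k in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def pair_features(skill_a, skill_b):
    """Feature vector: [flat similarity, yaml sim, nl sim, code sim]."""
    flat = [" ".join(s[c] for c in CHANNELS) for s in (skill_a, skill_b)]
    feats = [cosine(*tfidf_vectors(flat))]
    for c in CHANNELS:
        feats.append(cosine(*tfidf_vectors([skill_a[c], skill_b[c]])))
    return feats

# ASSUMPTION: placeholder coefficients, not values from the paper.
HYPOTHETICAL_WEIGHTS = [2.0, 1.0, 1.0, 1.0]
HYPOTHETICAL_BIAS = -2.5

def clone_probability(features, weights=HYPOTHETICAL_WEIGHTS,
                      bias=HYPOTHETICAL_BIAS):
    """Logistic fusion of the flat and per-channel similarities."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

if __name__ == "__main__":
    original = {"yaml": "name: pdf-reader",
                "nl": "Extract text from PDF files",
                "code": "import pypdf"}
    fork = dict(original, nl="Extract text and tables from PDF files")
    feats = pair_features(original, fork)
    print(feats, clone_probability(feats))
```

Keeping the per-channel similarities as separate features is what makes the result interpretable in this sketch: a pair with high YAML and code similarity but lower NL similarity points to edited instructions rather than changed code.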
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
