"Elementary, My Dear Watson." Detecting Malicious Skills via Neuro-Symbolic Reasoning across Heterogeneous Artifacts
Shenao Wang, Junjie He, Yanjie Zhao, Yayi Wang, Kan Yu, Haoyu Wang

TL;DR
MalSkills is a neuro-symbolic framework designed to detect malicious skills in large language model agent ecosystems by analyzing heterogeneous artifacts and reasoning about suspicious workflows.
Contribution
The paper introduces MalSkills, a novel neuro-symbolic approach that combines symbolic parsing and LLM-assisted analysis to detect malicious skills across diverse artifacts.
Findings
MalSkills achieves 93% F1 score on a real-world skills benchmark.
It outperforms existing methods by 5-87 percentage points.
Discovered 76 previously unknown malicious skills in public registries.
Abstract
Skills are increasingly used to extend LLM agents by packaging prompts, code, and configurations into reusable modules. As public registries and marketplaces expand, they form an emerging agentic supply chain, but also introduce a new attack surface for malicious skills. Detecting malicious skills is challenging because relevant evidence is often distributed across heterogeneous artifacts and must be reasoned in context. Existing static, LLM-based, and dynamic approaches each capture only part of this problem, making them insufficient for robust real-world detection. In this paper, we present MalSkills, a neuro-symbolic framework for malicious skills detection. MalSkills first extracts security-sensitive operations from heterogeneous artifacts through a combination of symbolic parsing and LLM-assisted semantic analysis. It then constructs the skill dependency graph that links artifacts,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
