"Elementary, My Dear Watson." Detecting Malicious Skills via Neuro-Symbolic Reasoning across Heterogeneous Artifacts

Shenao Wang; Junjie He; Yanjie Zhao; Yayi Wang; Kan Yu; Haoyu Wang

arXiv:2603.27204·cs.CR·March 31, 2026

"Elementary, My Dear Watson." Detecting Malicious Skills via Neuro-Symbolic Reasoning across Heterogeneous Artifacts

Shenao Wang, Junjie He, Yanjie Zhao, Yayi Wang, Kan Yu, Haoyu Wang

PDF

TL;DR

MalSkills is a neuro-symbolic framework designed to detect malicious skills in large language model agent ecosystems by analyzing heterogeneous artifacts and reasoning about suspicious workflows.

Contribution

The paper introduces MalSkills, a novel neuro-symbolic approach that combines symbolic parsing and LLM-assisted analysis to detect malicious skills across diverse artifacts.

Findings

01

MalSkills achieves 93% F1 score on a real-world skills benchmark.

02

It outperforms existing methods by 5-87 percentage points.

03

Discovered 76 previously unknown malicious skills in public registries.

Abstract

Skills are increasingly used to extend LLM agents by packaging prompts, code, and configurations into reusable modules. As public registries and marketplaces expand, they form an emerging agentic supply chain, but also introduce a new attack surface for malicious skills. Detecting malicious skills is challenging because relevant evidence is often distributed across heterogeneous artifacts and must be reasoned in context. Existing static, LLM-based, and dynamic approaches each capture only part of this problem, making them insufficient for robust real-world detection. In this paper, we present MalSkills, a neuro-symbolic framework for malicious skills detection. MalSkills first extracts security-sensitive operations from heterogeneous artifacts through a combination of symbolic parsing and LLM-assisted semantic analysis. It then constructs the skill dependency graph that links artifacts,…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.