MalGuard: Towards Real-Time, Accurate, and Actionable Detection of Malicious Packages in PyPI Ecosystem
Xingan Gao, Xiaobing Sun, Sicong Cao, Kaifeng Huang, Di Wu, Xiaolei Liu, Xingwei Lin, Yang Xiang

TL;DR
MalGuard is a novel detection system that combines graph centrality, LIME, and traditional ML to efficiently identify malicious packages in PyPI with high accuracy and explainability.
Contribution
The paper introduces MalGuard, a new approach that automates feature extraction and provides explainability for malicious package detection in PyPI.
Findings
MalGuard improves detection precision by up to 33.2%.
It successfully identified 113 new malicious packages in five weeks.
MalGuard's approach outperforms six state-of-the-art baselines.
Abstract
Malicious package detection has become a critical task in ensuring the security and stability of PyPI. Existing detection approaches have focused on advancing model selection, evolving from traditional machine learning (ML) models to large language models (LLMs). However, as model complexity increases, so does time consumption, which raises the question of whether a lightweight model can achieve effective detection. Through empirical research, we demonstrate that collecting a sufficiently comprehensive feature set enables even traditional ML models to achieve outstanding performance. However, with the continuous emergence of new malicious packages, considerable human and material resources are required for feature analysis. In addition, traditional ML model-based approaches lack explainability for malicious packages. Therefore, we propose a novel approach MalGuard based…
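To make the idea of centrality-based feature selection concrete, here is a minimal sketch. It is not the authors' implementation: this summary does not say how MalGuard builds its graph or which centrality measure it uses, so the co-occurrence graph, the choice of degree centrality, the function names, and the toy package data below are all assumptions for illustration. The LIME explanation step and the ML classifier are left out.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(packages):
    """Undirected graph: nodes are API names; an edge links two APIs
    that appear together in at least one package (assumed construction)."""
    graph = defaultdict(set)
    for apis in packages.values():
        unique = sorted(set(apis))
        for api in unique:
            graph[api]  # touch the key so isolated APIs still become nodes
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Degree centrality: number of neighbours divided by (n - 1)."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def top_k_apis(centrality, k):
    """Return the k most central APIs; ties are broken alphabetically."""
    ranked = sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0]))
    return [api for api, _ in ranked[:k]]


def to_feature_vector(apis, selected):
    """Binary vector: 1 if the package uses the selected API, else 0."""
    present = set(apis)
    return [1 if api in present else 0 for api in selected]


# Hypothetical toy data: API calls seen in three made-up malicious packages.
malicious_packages = {
    "pkg_a": ["os.system", "base64.b64decode", "requests.get"],
    "pkg_b": ["os.system", "socket.connect", "base64.b64decode"],
    "pkg_c": ["os.system", "requests.get"],
}

graph = build_cooccurrence_graph(malicious_packages)
centrality = degree_centrality(graph)
selected = top_k_apis(centrality, k=3)

# Encode an unseen package against the selected API features.
vector = to_feature_vector(["os.system", "json.loads"], selected)
print(selected, vector)
```

The resulting binary vectors could then be passed to any traditional classifier, and a tool such as LIME could attribute a prediction back to individual API features, which is the kind of explainability the abstract refers to.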
Taxonomy
Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Digital and Cyber Forensics
