An Analysis of Malicious Packages in Open-Source Software in the Wild
Xiaoyan Zhou, Ying Zhang, Wenjia Niu, Jiqiang Liu, Haining Wang, Qiang Li

TL;DR
This paper constructs a large dataset of malicious open-source packages, analyzes malware diversity and reuse, and highlights the importance of diverse data sources and security reports for understanding OSS malware threats.
Contribution
It introduces the largest malware dataset for OSS, proposes a knowledge graph for malware analysis, and provides insights into malware reuse, dependency hiding, and data source importance.
Findings
Low malware diversity due to code reuse
Dependency-hidden malware has shorter active periods
Security reports are crucial for malware context understanding
Abstract
The open-source software (OSS) ecosystem suffers from security threats caused by malware. However, OSS malware research has three limitations: a lack of high-quality datasets, a lack of malware diversity, and a lack of attack campaign contexts. In this paper, we first build the largest dataset of 24,356 malicious packages from online sources, then propose a knowledge graph to represent the OSS malware corpus and conduct malware analysis in the wild. Our main findings include (1) it is essential to collect malicious packages from various online sources because their data overlapping degrees are small; (2) despite the sheer volume of malicious packages, many reuse similar code, leading to a low diversity of malware; (3) only 28 malicious packages were repeatedly hidden via dependency libraries of 1,354 malicious packages, and dependency-hidden malware has a shorter active time; (4) security…
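Finding (1) rests on measuring how much the package lists from different online sources overlap. The abstract does not say which overlap measure the authors use, so the sketch below assumes the Jaccard index (intersection size over union size) as one common choice. The source names, the package tuples, and the helper functions `jaccard` and `pairwise_overlap` are all hypothetical and exist only for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index |a & b| / |a | b|; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(sources: dict) -> dict:
    """Map each unordered pair of source names to the Jaccard index of their package sets."""
    return {
        (x, y): jaccard(sources[x], sources[y])
        for x, y in combinations(sorted(sources), 2)
    }


# Hypothetical toy data: each package is identified by (registry, name, version).
# These are made-up identifiers, not entries from the paper's dataset.
sources = {
    "source_a": {
        ("npm", "pkg-one", "1.0.0"),
        ("npm", "pkg-two", "0.2.1"),
        ("pypi", "pkg-three", "3.1"),
    },
    "source_b": {
        ("pypi", "pkg-three", "3.1"),
        ("pypi", "pkg-four", "0.0.1"),
    },
    "source_c": {
        ("rubygems", "pkg-five", "2.0"),
    },
}

overlap = pairwise_overlap(sources)
for pair, score in overlap.items():
    print(pair, round(score, 2))
```

A low score for every pair would mean that each source contributes packages the others miss, which is the situation the abstract describes as the reason to collect from many sources.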
Taxonomy
Topics: Advanced Malware Detection Techniques · Digital and Cyber Forensics · Network Security and Intrusion Detection
