OSPtrack: A Labeled Dataset Targeting Simulated Execution of Open-Source Software
Zhuoran Tan, Christos Anagnostopoulos, Jeremy Singer

TL;DR
This paper introduces OSPtrack, a labeled dataset that captures static and dynamic runtime features of open-source software across five ecosystems, aimed at improving malicious package detection and supply chain security.
Contribution
The creation of a large, multi-ecosystem dataset with detailed runtime and static features, including labels for attack types, to support detection and analysis of malicious open-source software.
Findings
Dataset includes 9,461 package reports with 1,962 malicious instances.
Combines static and dynamic features such as files, sockets, commands, and DNS records.
Enables runtime detection and comparative analysis across ecosystems.
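The two counts above imply a noticeable class imbalance, which matters for anyone training a detector on the data. The arithmetic can be sketched as follows, assuming (this is not stated in the source) that every report not identified as malicious counts as non-malicious; the function and key names are illustrative only.

```python
# Counts quoted in the findings above.
TOTAL_REPORTS = 9_461
MALICIOUS_REPORTS = 1_962


def class_balance(total: int, malicious: int) -> dict:
    """Summarize the label split implied by the two reported counts.

    Assumption: every report that is not malicious is treated as
    non-malicious; the source does not describe any other label.
    """
    non_malicious = total - malicious
    return {
        "malicious": malicious,
        "non_malicious": non_malicious,
        "malicious_fraction": malicious / total,
        "imbalance_ratio": non_malicious / malicious,
    }


balance = class_balance(TOTAL_REPORTS, MALICIOUS_REPORTS)
print(f"malicious share: {balance['malicious_fraction']:.1%}")
print(f"non-malicious per malicious report: {balance['imbalance_ratio']:.2f}")
```

Under that assumption, roughly one report in five is malicious, so a stratified train/test split or class weighting is a reasonable default when evaluating models on this data.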
Abstract
Open-source software serves as a foundation for the internet and the cyber supply chain, but its exploitation is becoming increasingly prevalent. While advances in vulnerability detection for OSS have been significant, prior research has largely focused on static code analysis, often neglecting runtime indicators. To address this shortfall, we created a comprehensive dataset spanning five ecosystems, capturing features generated during the execution of packages and libraries in isolated environments. The dataset includes 9,461 package reports, of which 1,962 are identified as malicious, and encompasses both static and dynamic features such as files, sockets, commands, and DNS records. Each report is labeled with verified information and detailed sub-labels for attack types, facilitating the identification of malicious indicators when source code is unavailable. This dataset supports…
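To make the feature categories named in the abstract concrete, here is a minimal sketch of how a single package report might be represented and reduced to per-category counts. The class name, field names, and sample values are all hypothetical and are not taken from the dataset's actual schema, which this page does not describe.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PackageReport:
    """Hypothetical record layout; not the dataset's real schema."""
    ecosystem: str                      # one of the five ecosystems (names not given here)
    name: str
    malicious: bool                     # top-level label
    attack_type: Optional[str] = None   # sub-label, only meaningful when malicious
    files: List[str] = field(default_factory=list)
    sockets: List[str] = field(default_factory=list)
    commands: List[str] = field(default_factory=list)
    dns_records: List[str] = field(default_factory=list)


def feature_counts(report: PackageReport) -> Dict[str, int]:
    """Count observed events per feature category for one report."""
    return {
        "files": len(report.files),
        "sockets": len(report.sockets),
        "commands": len(report.commands),
        "dns_records": len(report.dns_records),
    }


# Placeholder values for illustration only; none come from the dataset.
example = PackageReport(
    ecosystem="example-ecosystem",
    name="example-package",
    malicious=True,
    attack_type="hypothetical-attack-type",
    files=["/tmp/payload.sh"],
    sockets=["203.0.113.5:443"],
    commands=["sh /tmp/payload.sh"],
    dns_records=["example.invalid"],
)
print(feature_counts(example))
```

Simple counts like these are only one possible starting point; the raw file paths, socket endpoints, commands, and DNS names carry more signal than their totals.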
Taxonomy
Topics: Scientific Computing and Data Management
