QUT-DV25: A Dataset for Dynamic Analysis of Next-Gen Software Supply Chain Attacks
Sk Tanzir Mehedi, Raja Jurdak, Chadni Islam, Gowri Ramachandran

TL;DR
QUT-DV25 is a dynamic dataset capturing real-time behaviors of Python packages, including malicious ones, to improve detection of sophisticated supply chain attacks in software ecosystems.
Contribution
The paper introduces QUT-DV25, a novel dynamic analysis dataset that captures install and post-install behaviors of Python packages, including malicious ones, to enhance supply chain attack detection.
Findings
Identified four malicious packages with covert remote access and multi-phase payloads.
Demonstrated the dataset's effectiveness in detecting malicious packages that had previously been labeled benign.
Outperformed static and metadata-based datasets in threat detection accuracy.
Abstract
Securing software supply chains is a growing challenge due to the inadequacy of existing datasets in capturing the complexity of next-gen attacks, such as multiphase malware execution, remote access activation, and dynamic payload generation. Existing datasets, which rely on metadata inspection and static code analysis, are inadequate for detecting such attacks. This creates a critical gap because these datasets do not capture what happens during and after a package is installed. To address this gap, we present QUT-DV25, a dynamic analysis dataset specifically designed to support and advance research on detecting and mitigating supply chain attacks within the Python Package Index (PyPI) ecosystem. This dataset captures install and post-install-time traces from 14,271 Python packages, of which 7,127 are malicious. The packages are executed in an isolated sandbox environment using an…
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software System Performance and Reliability
