A Benchmark Comparison of Python Malware Detection Approaches
Duc-Ly Vu, Zachary Newman, and John Speed Meyers

TL;DR
This paper evaluates existing Python malware detection tools within the PyPI ecosystem, highlighting their high false positive rates and the importance of collaboration between researchers and repository administrators for improved security.
Contribution
The study provides a benchmark dataset and comparative analysis of malware detection tools in the Python package repository context, revealing their limitations and proposing collaborative strategies.
Findings
Detection tools have false positive rates between 15% and 97%.
Increasing detection thresholds reduces false positives but also lowers true positive rates.
External researchers effectively identify malware, supplementing automated tools.
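The threshold trade-off in the findings above can be made concrete with a small sketch. It assumes a scanner that assigns each package a numeric score, for example a count or weight of triggered rules, and flags every package at or above a threshold. The function name `rates_at_threshold` and the scores and labels are hypothetical illustrations, not data or code from the paper.

```python
from typing import List, Tuple


def rates_at_threshold(
    scores: List[float], labels: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (false positive rate, true positive rate) when every package
    whose score is >= threshold is flagged as malicious."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, is_mal in zip(flagged, labels) if f and is_mal)
    fp = sum(1 for f, is_mal in zip(flagged, labels) if f and not is_mal)
    positives = sum(labels)               # number of malicious packages
    negatives = len(labels) - positives   # number of benign packages
    return fp / negatives, tp / positives


# Hypothetical scores from a hypothetical scanner (True = malicious package).
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
labels = [True, True, True, False, False, False, False, False]

for threshold in (0.25, 0.5, 0.75, 0.85):
    fpr, tpr = rates_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

On this toy data, raising the threshold from 0.25 to 0.75 lowers the false positive rate from 0.60 to 0.00, but the true positive rate also falls from 1.00 to 0.67, because the malicious package scored 0.4 is no longer flagged.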
Abstract
While attackers often distribute malware to victims via open-source, community-driven package repositories, these repositories do not currently run automated malware detection systems. In this work, we explore the security goals of the repository administrators and the requirements for deployments of such malware scanners via a case study of the Python ecosystem and PyPI repository, which includes interviews with administrators and maintainers. Further, we evaluate existing malware detection techniques for deployment in this setting by creating a benchmark dataset and comparing several existing tools, including the malware checks implemented in PyPI, Bandit4Mal, and OSSGadget's OSS Detect Backdoor. We find that repository administrators have exacting technical demands for such malware detection tools. Specifically, they consider a false positive rate of even 0.01% to be unacceptably…
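One way to see why even a 0.01% false positive rate can matter is to look at alert precision when malicious uploads are rare. The sketch below uses hypothetical volumes (10,000 benign uploads, 1 malicious upload, and a scanner that catches every malicious one). These are illustrative assumptions, not figures from the paper. The helper `alert_precision` is likewise hypothetical.

```python
def alert_precision(
    false_positive_rate: float,
    true_positive_rate: float,
    benign_uploads: int,
    malicious_uploads: int,
) -> float:
    """Fraction of raised alerts that point at real malware."""
    false_alarms = false_positive_rate * benign_uploads
    true_alarms = true_positive_rate * malicious_uploads
    return true_alarms / (true_alarms + false_alarms)


# Hypothetical volumes, not figures from the paper.
BENIGN, MALICIOUS = 10_000, 1
for fpr in (0.0001, 0.15, 0.97):  # 0.01%, plus the 15% and 97% ends of the reported range
    false_alarms = fpr * BENIGN
    precision = alert_precision(fpr, 1.0, BENIGN, MALICIOUS)
    print(f"FPR={fpr:.2%}  false alarms={false_alarms:.0f}  precision={precision:.4f}")
```

Under these assumptions, a 0.01% rate already makes half of all alerts false. At 15%, roughly 1,500 false alarms accompany a single true one, and every one of them would need human review.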
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Network Security and Intrusion Detection
