One Detector Fits All: Robust and Adaptive Detection of Malicious Packages from PyPI to Enterprises
Biagio Montaruli, Luca Compagna, Serena Elisa Ponta, Davide Balzarotti

TL;DR
This paper presents a robust, adaptable detector for malicious Python packages. The detector is hardened by adversarial training on packages generated with fine-grained code obfuscation, and it is effective in both public repositories such as PyPI and enterprise environments.
Contribution
The paper introduces a novel method for generating adversarial packages and shows, through real-world case studies, that the detector can be adapted to different false positive rate requirements.
Findings
Robustness increased by 2.5x with adversarial training
Detected 10% more obfuscated malicious packages
Achieved low false positive rates in real-world deployments
Abstract
The rise of supply chain attacks via malicious Python packages demands robust detection solutions. Current approaches, however, overlook two critical challenges: robustness against adversarial source code transformations and adaptability to the varying false positive rate (FPR) requirements of different actors, from repository maintainers (requiring low FPR) to enterprise security teams (higher FPR tolerance). We introduce a robust detector capable of seamless integration into both public repositories like PyPI and enterprise ecosystems. To ensure robustness, we propose a novel methodology for generating adversarial packages using fine-grained code obfuscation. Combining these with adversarial training (AT) enhances detector robustness by 2.5x. We comprehensively evaluate AT effectiveness by testing our detector against 122,398 packages collected daily from PyPI over 80 days, showing…
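The abstract's point about differing FPR requirements can be illustrated with a generic threshold-calibration sketch. This is not the paper's method, which the excerpt above does not describe. It assumes a detector that outputs a maliciousness score per package and a held-out set of benign scores; the function names `threshold_for_fpr` and `empirical_fpr` and the sample scores are hypothetical.

```python
import math


def threshold_for_fpr(benign_scores, target_fpr):
    """Pick a score threshold so that at most `target_fpr` of the
    benign packages are flagged (a package is flagged iff score > threshold).

    Assumption: higher score means "more likely malicious".
    """
    if not benign_scores:
        raise ValueError("need at least one benign score")
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must be in [0, 1]")

    ranked = sorted(benign_scores, reverse=True)
    # Maximum number of benign packages we are allowed to flag.
    allowed = math.floor(target_fpr * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")  # every package may be flagged
    # Only scores strictly above ranked[allowed] are flagged,
    # and there are at most `allowed` of those.
    return ranked[allowed]


def empirical_fpr(benign_scores, threshold):
    """Fraction of benign packages flagged at the given threshold."""
    flagged = sum(1 for s in benign_scores if s > threshold)
    return flagged / len(benign_scores)


if __name__ == "__main__":
    # Hypothetical benign validation scores, for illustration only.
    benign = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.70, 0.85, 0.95]

    # A repository maintainer might demand a very low FPR ...
    strict = threshold_for_fpr(benign, 0.0)
    # ... while an enterprise security team might tolerate a higher one.
    lenient = threshold_for_fpr(benign, 0.2)

    print("strict threshold:", strict, "FPR:", empirical_fpr(benign, strict))
    print("lenient threshold:", lenient, "FPR:", empirical_fpr(benign, lenient))
```

Under these assumptions, the same trained model serves both actors: only the decision threshold changes, trading missed detections against analyst workload.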
Taxonomy
Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security
