TL;DR
This paper introduces Amalfi, a machine-learning-based system that automatically detects malicious npm packages by combining classifiers, source code verification, and clone detection. It identifies new malware with a manageable number of false positives.
Contribution
Amalfi is a lightweight, multi-technique approach that improves malware detection accuracy in npm packages compared to existing methods.
Findings
Identified 95 previously unknown malicious packages in one week.
Requires only a few seconds per package for analysis.
Achieved high detection accuracy with a manageable number of false positives.
Abstract
The npm registry is one of the pillars of the JavaScript and TypeScript ecosystems, hosting over 1.7 million packages ranging from simple utility libraries to complex frameworks and entire applications. Due to the overwhelming popularity of npm, it has become a prime target for malicious actors, who publish new packages or compromise existing packages to introduce malware that tampers with or exfiltrates sensitive data from users who install either these packages or any package that (transitively) depends on them. Defending against such attacks is essential to maintaining the integrity of the software supply chain, but the sheer volume of package updates makes comprehensive manual review infeasible. We present Amalfi, a machine-learning-based approach for automatically detecting potentially malicious packages that comprises three complementary techniques. We start with classifiers…
