A Survey on Common Threats in npm and PyPi Registries
Berkay Kaplan, Jingyu Qian

TL;DR
This survey reviews common security threats in the npm and PyPi open-source package registries, highlighting attack types and their risks, and proposes machine learning (ML)-based countermeasures as novel solutions.
Contribution
The paper provides the first comprehensive survey of threats in npm and PyPi and proposes ML-driven countermeasures for detecting malicious activity.
Findings
Identifies attack types like typosquatting and dependency-based vulnerabilities.
Highlights the prevalence of outdated packages and technical lag.
Proposes ML techniques as potential detection tools.
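Typosquatting relies on package names that are a small edit away from a popular one (for example `reqeusts` instead of `requests`). The sketch below flags such near-misses with a plain Levenshtein edit distance. This is a simple string-similarity heuristic, not the ML-based method the survey proposes. The `normalize` helper follows the PEP 503 name-normalization rule used by PyPI, and applying it to npm names is an assumption. `POPULAR`, `possible_typosquats`, and the threshold of 2 edits are all hypothetical choices made for illustration.

```python
import re


def normalize(name: str) -> str:
    """PEP 503-style normalization: lowercase and collapse runs of '-', '_' and '.' into '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def possible_typosquats(candidate: str, popular: list[str], max_distance: int = 2) -> list[str]:
    """Return popular names that `candidate` is suspiciously close to, excluding an exact match."""
    name = normalize(candidate)
    hits = []
    for target in popular:
        d = levenshtein(name, normalize(target))
        if 0 < d <= max_distance:
            hits.append(target)
    return hits


# Hypothetical allow-list of popular names; a real check might rank packages by downloads.
POPULAR = ["requests", "numpy", "lodash", "express"]

print(possible_typosquats("reqeusts", POPULAR))  # → ['requests']
print(possible_typosquats("requests", POPULAR))  # → []
```

A fixed edit-distance threshold is easy to reason about, but it misses other tricks such as added suffixes or scope confusion. Those gaps are part of why learned detectors are attractive.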
Abstract
Software engineers regularly use JavaScript and Python for both front-end and back-end automation tasks. Both languages are supported by tooling that facilitates these tasks further, including the Node Package Manager (npm) and the Python Package Index (PyPi), which host open-source (OS) package libraries. The public registries that npm and PyPi use to host packages allow any user with a verified email address to publish code. Because no comprehensive scanning tool is applied at publication time, this openness creates security concerns. Users can report malicious code on the registry; however, attackers can still cause damage until the malicious package is removed from the platform. Furthermore, many packages depend on each other, which makes them vulnerable to a single bad package anywhere in the dependency tree. This heavy code reuse creates security artifacts developers have to consider, such as…
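The abstract's point that one bad package can affect everything that depends on it can be made concrete as graph reachability. The sketch below assumes a dependency graph given as a plain dict mapping each package to its direct dependencies. It inverts the edges and walks them breadth-first to find every package that depends on a compromised one, directly or transitively. The package names in `DEPS` and the helper `affected_packages` are hypothetical. Real npm or PyPI metadata would also need version constraints, which this sketch ignores.

```python
from collections import defaultdict, deque


def affected_packages(deps: dict[str, list[str]], compromised: str) -> set[str]:
    """Return every other package that depends on `compromised`, directly or transitively."""
    # Invert the edges: dependency -> packages that require it.
    dependents: defaultdict[str, list[str]] = defaultdict(list)
    for pkg, reqs in deps.items():
        for req in reqs:
            dependents[req].append(pkg)

    # Breadth-first search upward through the dependents.
    affected: set[str] = set()
    queue = deque([compromised])
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)

    # In a dependency cycle the compromised package can reach itself; report only others.
    affected.discard(compromised)
    return affected


# Hypothetical dependency graph: package -> direct dependencies.
DEPS = {
    "web-app": ["http-client", "logger"],
    "cli-tool": ["logger"],
    "http-client": ["url-parse"],
    "logger": [],
    "url-parse": [],
}

print(sorted(affected_packages(DEPS, "url-parse")))  # → ['http-client', 'web-app']
```

Here `web-app` never lists `url-parse` itself, yet it is still exposed through `http-client`. This is the transitive risk the abstract describes.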
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Software Testing and Debugging Techniques
