Many Hands Make Light Work: An LLM-based Multi-Agent System for Detecting Malicious PyPI Packages

Muhammad Umar Zeshan; Motunrayo Ibiyo; Claudio Di Sipio; Phuong T. Nguyen; Davide Di Ruscio

arXiv:2601.12148 · cs.SE · January 27, 2026

TL;DR

This paper introduces LAMPS, a multi-agent system in which collaborating large language models detect malicious PyPI packages. It reports 97.7% accuracy on a balanced dataset (D1) and 99.5% accuracy on a realistic dataset (D2).

Contribution

The paper presents a novel multi-agent framework using LLMs for malicious code detection, combining role-specific agents and a modular architecture for improved interpretability and performance.

Findings

01. LAMPS achieves 97.7% accuracy on balanced dataset D1.

02. LAMPS reaches 99.5% accuracy on realistic dataset D2.

03. Distributed LLM reasoning enhances malicious code detection effectiveness.

Abstract

Malicious code in open-source repositories such as PyPI poses a growing threat to software supply chains. Traditional rule-based tools often overlook the semantic patterns in source code that are crucial for identifying adversarial components. Large language models (LLMs) show promise for software analysis, yet their use in interpretable and modular security pipelines remains limited. This paper presents LAMPS, a multi-agent system that employs collaborative LLMs to detect malicious PyPI packages. The system consists of four role-specific agents for package retrieval, file extraction, classification, and verdict aggregation, coordinated through the CrewAI framework. A prototype combines a fine-tuned CodeBERT model for classification with LLaMA-3 agents for contextual reasoning. LAMPS has been evaluated on two complementary datasets: D1, a balanced collection of 6,000 setup.py files, and…
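The four roles named in the abstract (package retrieval, file extraction, classification, verdict aggregation) can be pictured as a linear pipeline. The sketch below is a minimal illustration in plain Python and is not the authors' implementation: it does not use CrewAI, CodeBERT, or LLaMA-3, and every name in it (`retrieve_package`, `extract_files`, `keyword_classifier`, `aggregate`, `DEMO_INDEX`) is hypothetical. A keyword scorer stands in for the classifier, an in-memory dictionary stands in for PyPI, and the rule "a package is malicious if any file is flagged" is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for a package index: name -> {path: source}.
DEMO_INDEX: Dict[str, Dict[str, str]] = {
    "evil_pkg": {
        "setup.py": "import base64\nexec(base64.b64decode('cHJpbnQoMSk='))",
    },
    "good_pkg": {
        "setup.py": "from setuptools import setup\nsetup(name='good_pkg')",
        "README.md": "Docs that mention eval( but are not Python code.",
    },
}

# Toy indicator strings; a real system would use a trained model instead.
SUSPICIOUS_TOKENS = ("exec(", "eval(", "base64.b64decode(", "os.system(")


@dataclass
class FileVerdict:
    path: str
    score: float
    malicious: bool


def retrieve_package(name: str, index: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Role 1 (retrieval): fetch the package contents by name."""
    return index[name]


def extract_files(package: Dict[str, str]) -> Dict[str, str]:
    """Role 2 (extraction): keep only Python source files."""
    return {path: src for path, src in package.items() if path.endswith(".py")}


def keyword_classifier(source: str) -> float:
    """Toy stand-in for a learned classifier: score in [0, 1] from token hits."""
    hits = sum(token in source for token in SUSPICIOUS_TOKENS)
    return min(1.0, hits / 2.0)


def classify_files(
    files: Dict[str, str],
    classifier: Callable[[str], float],
    threshold: float = 0.5,
) -> List[FileVerdict]:
    """Role 3 (classification): score each file and flag it at the threshold."""
    verdicts = []
    for path, source in sorted(files.items()):
        score = classifier(source)
        verdicts.append(FileVerdict(path, score, score >= threshold))
    return verdicts


def aggregate(verdicts: List[FileVerdict]) -> str:
    """Role 4 (aggregation): assumed rule, any flagged file marks the package."""
    return "malicious" if any(v.malicious for v in verdicts) else "benign"


def analyze(name: str, index: Dict[str, Dict[str, str]]) -> str:
    """Run the four roles in sequence and return the package-level verdict."""
    package = retrieve_package(name, index)
    files = extract_files(package)
    verdicts = classify_files(files, keyword_classifier)
    return aggregate(verdicts)


if __name__ == "__main__":
    for package_name in DEMO_INDEX:
        print(package_name, analyze(package_name, DEMO_INDEX))
```

Keeping each role as a separate function mirrors the modularity the abstract emphasizes: the classifier can be swapped for a learned model, or the aggregation rule changed, without touching the other stages.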

Taxonomy

Topics: Advanced Malware Detection Techniques · Software Engineering Research · Spam and Phishing Detection