We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
Joseph Spracklen, Raveen Wijewickrama, A H M Nazmus Sakib, Anindya Maiti, Bimal Viswanath, Murtuza Jadliwala

TL;DR
This paper investigates the prevalence and causes of package hallucinations in code-generating LLMs, revealing high rates of erroneous package suggestions and proposing mitigation strategies to improve code generation reliability.
Contribution
The paper provides a comprehensive evaluation of package hallucinations across multiple models and languages, quantifies their severity, and introduces mitigation techniques to reduce these errors.
Findings
On average, at least 5.2% of packages suggested by commercial models are hallucinated
On average, 21.7% of packages suggested by open-source models are hallucinated
Over 200,000 unique hallucinated package names were identified
Abstract
The reliance of popular programming languages such as Python and JavaScript on centralized package repositories and open-source software, combined with the emergence of code-generating Large Language Models (LLMs), has created a new type of threat to the software supply chain: package hallucinations. These hallucinations, which arise from fact-conflicting errors when generating code using LLMs, represent a novel form of package confusion attack that poses a critical threat to the integrity of the software supply chain. This paper conducts a rigorous and comprehensive evaluation of package hallucinations across different programming languages, settings, and parameters, exploring how a diverse set of models and configurations affect the likelihood of generating erroneous package recommendations and identifying the root causes of this phenomenon. Using 16 popular LLMs for code generation…
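To make the idea concrete, here is a minimal sketch of how a package hallucination might be flagged: a name suggested in generated code is compared against a list of packages known to exist in the repository. This is an illustration only, not the authors' detection method, which this excerpt does not describe. The function names, the `known` list, and the example package `fastjsonx-utils` are all hypothetical; a real check would use a full snapshot of the registry index rather than a three-item list.

```python
import re


def normalize(name: str) -> str:
    # Normalize a Python package name as PEP 503 does, so that
    # "Flask", "flask" and "FLASK" compare as equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def find_hallucinated(suggested, known_packages):
    # Return the suggested names that are absent from the known list.
    known = {normalize(n) for n in known_packages}
    return [n for n in suggested if normalize(n) not in known]


def hallucination_rate(suggested, known_packages):
    # Fraction of suggested packages that do not exist in the known list.
    if not suggested:
        return 0.0
    return len(find_hallucinated(suggested, known_packages)) / len(suggested)


# Hypothetical data: a tiny stand-in for a registry snapshot, and a set of
# package names extracted from model output.
known = ["requests", "numpy", "Flask"]
suggested = ["requests", "flask", "fastjsonx-utils", "numpy"]

print(find_hallucinated(suggested, known))   # ['fastjsonx-utils']
print(hallucination_rate(suggested, known))  # 0.25
```

The security concern follows from the flagged name: if a model repeatedly suggests a package that does not exist, an attacker could publish a malicious package under that name, and a developer who installs it without checking would pull in the attacker's code.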
Taxonomy
Topics: Neurological disorders and treatments · Plant-based Medicinal Research · Physical Unclonable Functions (PUFs) and Hardware Security
Methods: Sparse Evolutionary Training
