Importing Phantoms: Measuring LLM Package Hallucination Vulnerabilities
Arjun Krishna, Erick Galinkin, Leon Derczynski, Jeffrey Martin

TL;DR
This paper investigates how large language models hallucinate package dependencies, analyzing vulnerabilities and proposing strategies to mitigate supply chain attacks in AI-assisted software development.
Contribution
It provides a comprehensive analysis of package hallucination behaviors across models and languages, highlighting factors influencing hallucination rates and suggesting defensive strategies.
Findings
Package hallucination rate varies with model, language, size, and task specificity.
The Pareto boundary between code generation performance and package hallucination is sparsely populated, suggesting that models are not being optimized for security.
Hallucination rate is inversely correlated with HumanEval score, so HumanEval score can serve as a heuristic for a model's hallucination propensity.
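To make "package hallucination rate" concrete, here is a minimal Python sketch of one plausible way to measure it: extract the top-level imports from each generated sample and flag any sample that imports a name missing from a known-package list. This is an illustration under stated assumptions, not the authors' measurement pipeline. The function names `imported_packages` and `hallucination_rate`, the `known` set, and the per-sample definition of the rate are all hypothetical.

```python
import ast


def imported_packages(source: str) -> set[str]:
    """Return the top-level module names imported by a Python source string."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Assumption: samples that do not parse contribute no imports.
        return set()
    names: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they refer to local code.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


def hallucination_rate(samples: list[str], known: set[str]) -> float:
    """Fraction of samples importing at least one name not in `known`.

    `known` is a hypothetical allow-list (for example, standard-library
    modules plus names taken from a package-index snapshot).
    """
    if not samples:
        return 0.0
    flagged = sum(1 for s in samples if imported_packages(s) - known)
    return flagged / len(samples)
```

Note that import names and registry names can differ (for example, `yaml` is provided by the `PyYAML` distribution), so a real check would need a mapping between the two rather than a plain name comparison.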
Abstract
Large Language Models (LLMs) have become an essential tool in the programmer's toolkit, but their tendency to hallucinate code can be used by malicious actors to introduce vulnerabilities to broad swathes of the software supply chain. In this work, we analyze package hallucination behaviour in LLMs across popular programming languages examining both existing package references and fictional dependencies. By analyzing this package hallucination behaviour we find potential attacks and suggest defensive strategies to defend against these attacks. We discover that package hallucination rate is predicated not only on model choice, but also programming language, model size, and specificity of the coding task request. The Pareto optimality boundary between code generation performance and package hallucination is sparsely populated, suggesting that coding models are not being optimized for…
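The Pareto boundary mentioned in the abstract can be illustrated with a short Python sketch. A model is on the boundary when no other model is at least as good on both axes (higher code generation score, lower hallucination rate) and strictly better on one. The function name `pareto_front` and the tuple layout are assumptions made for illustration, and the sketch is not taken from the paper.

```python
def pareto_front(models: list[tuple[str, float, float]]) -> list[str]:
    """Return names of models not dominated by any other model.

    Each entry is (name, code_score, hallucination_rate); higher code_score
    is better and lower hallucination_rate is better.
    """
    front = []
    for name, score, halluc in models:
        dominated = any(
            s >= score and h <= halluc and (s > score or h < halluc)
            for _, s, h in models
        )
        if not dominated:
            front.append(name)
    return front
```

A sparsely populated boundary, in these terms, means that `pareto_front` returns only a few of the evaluated models.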
Taxonomy
Topics: Integrated Circuits and Semiconductor Failure Analysis · Physical Unclonable Functions (PUFs) and Hardware Security · Electrostatic Discharge in Electronics
Methods: Balanced Selection
