Correct Code, Vulnerable Dependencies: A Large Scale Measurement Study of LLM-Specified Library Versions

Chengjie Wang; Jingzheng Wu; Xiang Ling; Tianyue Luo; Chen Zhao

arXiv:2605.06279·cs.SE·May 8, 2026

TL;DR

This study systematically analyzes the security and compatibility risks of third-party library versions specified by large language models in Python code, revealing systemic biases and vulnerabilities.

Contribution

It provides the first large-scale measurement of version-level risks in LLM-generated code, highlighting systemic biases and proposing mitigation strategies.

Findings

01

Among specified versions, 36.70%-55.70% of tasks contain at least one known CVE, and 62.75%-74.51% of them carry Critical or High severity ratings.

02

Models tend to select risky library versions, often before CVEs are publicly disclosed.

03

Externally constrained version specifications reduce vulnerabilities and failures.

Abstract

Large language models (LLMs) are now widely used in software development workflows, and the code they generate routinely includes third-party library (TPL) imports annotated with specific version identifiers. These version choices can carry security and compatibility risks, yet they have not been systematically studied. We present the first large-scale measurement study of version-level risk in LLM-generated Python code, evaluating 10 LLMs on PinTrace, a curated benchmark of 1,000 Stack Overflow programming tasks. When directly prompted, LLMs specify version identifiers at rates of 26.83%-95.18%; the rate drops to 6.45%-59.19% when they create a manifest file directly. Among the specified versions, 36.70%-55.70% of tasks contain at least one known CVE, and 62.75%-74.51% of them carry Critical or High severity ratings. In 72.27%-91.37% of cases, the associated CVEs were publicly disclosed…
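To make the idea of version-level risk concrete, here is a minimal Python sketch of the kind of check the abstract describes: extract exact version pins from a manifest, compute how many requirement lines are pinned, and flag pins that fall inside a known-affected range. This is an illustration under stated assumptions, not the paper's pipeline. The helper names (`parse_pins`, `pin_rate`, `affected`), the package `examplelib`, and the advisory `EXAMPLE-0001` are all hypothetical, and the advisory table is made-up data, not real CVE records.

```python
import re

# Hypothetical advisory table: package -> [(first_affected, first_fixed, advisory_id)].
# The entry below is invented for illustration and is NOT real CVE data.
ADVISORIES = {
    "examplelib": [((1, 0, 0), (1, 4, 2), "EXAMPLE-0001")],
}

# Matches only exact pins of the form `name==X.Y.Z` (simplified; real
# requirement syntax also allows extras, markers, and other operators).
PIN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([0-9]+(?:\.[0-9]+)*)\s*$")


def _requirement_lines(manifest_text):
    """Yield non-empty manifest lines with trailing comments removed."""
    for line in manifest_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            yield line


def parse_pins(manifest_text):
    """Return {package: (major, minor, patch)} for exactly pinned lines."""
    pins = {}
    for line in _requirement_lines(manifest_text):
        match = PIN.match(line)
        if match:
            parts = [int(p) for p in match.group(2).split(".")]
            parts += [0] * (3 - len(parts))  # pad "3.1" to (3, 1, 0)
            pins[match.group(1).lower()] = tuple(parts)
    return pins


def pin_rate(manifest_text):
    """Fraction of requirement lines that carry an exact version pin."""
    lines = list(_requirement_lines(manifest_text))
    if not lines:
        return 0.0
    return len(parse_pins(manifest_text)) / len(lines)


def affected(pins, advisories=ADVISORIES):
    """Return (package, advisory_id) pairs whose pin lies in an affected range."""
    hits = []
    for name, version in pins.items():
        for introduced, fixed, advisory_id in advisories.get(name, []):
            if introduced <= version < fixed:
                hits.append((name, advisory_id))
    return hits


if __name__ == "__main__":
    manifest = "examplelib==1.2.0\notherlib>=2.0\n# a comment\nsafe==3.1\n"
    pins = parse_pins(manifest)
    print(pins, pin_rate(manifest), affected(pins))
```

In practice the advisory table would come from a real vulnerability database and version comparison would follow the full Python version specification; the tuple comparison here is a deliberate simplification.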

Code & Models

Repositories

dw763j/PinTrace (GitHub)