GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities
Diganta Misra, Nizar Islah, Victor May, Brice Rauby, Zihan Wang, Justine Gehring, Antonio Orvieto, Muawiz Chaudhary, Eilif B. Muller, Irina Rish, Samira Ebrahimi Kahou, Massimo Caccia

TL;DR
GitChameleon 2.0 is a dataset and benchmark of 328 Python code completion problems, each tied to a specific library version. It evaluates AI code generation by executing unit tests rather than by inspecting the generated text, and it exposes the limits of current systems.
Contribution
The paper presents GitChameleon 2.0, a curated dataset with executable tests for version-specific Python code, enabling more realistic evaluation of AI code generation models.
Findings
State-of-the-art models achieve success rates of only 48-51% on version-conditioned code generation.
Execution-based evaluation reveals significant challenges for current systems.
The benchmark is intended to guide future development of more reliable, version-aware AI code generators.
Abstract
The rapid evolution of software libraries poses a considerable hurdle for code generation, necessitating continuous adaptation to frequent version updates while preserving backward compatibility. While existing code evolution benchmarks provide valuable insights, they typically lack execution-based evaluation for generating code compliant with specific library versions. To address this, we introduce GitChameleon 2.0, a novel, meticulously curated dataset comprising 328 Python code completion problems, each conditioned on specific library versions and accompanied by executable unit tests. GitChameleon 2.0 rigorously evaluates the capacity of contemporary large language models (LLMs), LLM-powered agents, code assistants, and RAG systems to perform version-conditioned code generation that demonstrates functional accuracy through execution. Our extensive evaluations indicate that…
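To make "version-conditioned" and "execution-based" concrete, here is a minimal sketch of how such a check could work. It is not the benchmark's actual harness: `toylib`, `make_toy_library`, `Problem`, and `passes` are all hypothetical names invented for illustration, and the toy library stands in for a real package installed at a pinned version. The sketch assumes a problem names a target version, and a candidate solution counts as correct only if it runs and passes the problem's unit test against that version.

```python
from dataclasses import dataclass
import types


@dataclass
class Problem:
    library: str    # hypothetical library name
    version: str    # version the solution must target
    prompt: str     # task description given to the model
    test_code: str  # assertions that exercise the candidate


def make_toy_library(version: str) -> types.SimpleNamespace:
    """Stand-in for installing a pinned library version (hypothetical API).

    In 1.x the function is called `total`; from 2.0 on it is `sum_all`.
    """
    def _sum(xs):
        return sum(xs)

    major = int(version.split(".")[0])
    if major >= 2:
        return types.SimpleNamespace(__version__=version, sum_all=_sum)
    return types.SimpleNamespace(__version__=version, total=_sum)


def passes(problem: Problem, candidate_code: str) -> bool:
    """Run the candidate, then the tests; any exception counts as a failure."""
    namespace = {"toylib": make_toy_library(problem.version)}
    try:
        exec(candidate_code, namespace)
        exec(problem.test_code, namespace)
    except Exception:
        return False
    return True


problem = Problem(
    library="toylib",
    version="1.0",
    prompt="Write add_up(xs) that returns the sum of xs using toylib.",
    test_code="assert add_up([1, 2, 3]) == 6",
)

# Two candidates that differ only in which API version they assume.
old_api = "def add_up(xs):\n    return toylib.total(xs)\n"
new_api = "def add_up(xs):\n    return toylib.sum_all(xs)\n"

results = {
    "old_api": passes(problem, old_api),
    "new_api": passes(problem, new_api),
}
print(results)
```

Both candidates look plausible as text, but only the one written against the 1.0 API passes when the problem targets version 1.0. That gap between plausible-looking and actually-running code is what an execution-based evaluation is meant to catch.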
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management
