Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models
Sanjay Vishwakarma, Francis Harkins, Siddharth Golecha, Vishal Sharathchandra Bajpe, Nicolas Dupuis, Luca Buratti, David Kremer, Ismael Faro, Ruchir Puri, Juan Cruz-Benito

TL;DR
This paper introduces the Qiskit HumanEval dataset to evaluate large language models' ability to generate executable quantum code, providing a new benchmark for quantum programming with AI.
Contribution
The paper presents a curated dataset of quantum tasks and systematically assesses LLMs' performance in generating quantum code using Qiskit, establishing a new evaluation benchmark.
Findings
LLMs can generate executable quantum code with varying accuracy.
The dataset enables systematic benchmarking of quantum code generation.
Results encourage further development of AI tools for quantum programming.
Abstract
Quantum programs are typically developed using quantum Software Development Kits (SDKs). The rapid advancement of quantum computing necessitates new tools to streamline this development process, and one such tool could be Generative Artificial Intelligence (GenAI). In this study, we introduce and use the Qiskit HumanEval dataset, a hand-curated collection of tasks designed to benchmark the ability of Large Language Models (LLMs) to produce quantum code using Qiskit, a quantum SDK. This dataset consists of more than 100 quantum computing tasks, each accompanied by a prompt, a canonical solution, a comprehensive test case, and a difficulty scale to evaluate the correctness of the generated solutions. We systematically assess the performance of a set of LLMs against the Qiskit HumanEval dataset's tasks and focus on the models' ability to produce executable quantum code. Our findings not…
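To make the task structure described in the abstract concrete (prompt, canonical solution, test case, difficulty), here is a minimal Python sketch of how one HumanEval-style record might be represented and checked. The field names, the `passes` helper, and the toy task are all hypothetical and are not taken from the dataset itself; the toy task uses plain Python rather than Qiskit so the sketch runs without extra dependencies.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Field names are assumptions modelled on HumanEval-style benchmarks,
    # not the dataset's confirmed schema.
    task_id: str
    prompt: str              # function signature and docstring shown to the model
    canonical_solution: str  # reference function body
    test: str                # source code defining check(candidate)
    entry_point: str         # name of the function under test
    difficulty: str          # label on the difficulty scale


def passes(task: Task, completion: str) -> bool:
    """Return True if prompt + completion passes the task's test case.

    Caution: exec runs arbitrary code. A real harness would isolate
    model-generated code in a sandbox with a timeout.
    """
    namespace: dict = {}
    try:
        exec(task.prompt + completion, namespace)  # define the candidate function
        exec(task.test, namespace)                 # define check(candidate)
        namespace["check"](namespace[task.entry_point])
    except Exception:
        # Syntax errors, runtime errors, and failed assertions all count as failures.
        return False
    return True


# Hypothetical toy task, for illustration only.
toy = Task(
    task_id="toy/0",
    prompt=(
        "def num_basis_states(n_qubits: int) -> int:\n"
        '    """Return the number of computational basis states of n_qubits qubits."""\n'
    ),
    canonical_solution="    return 2 ** n_qubits\n",
    test=(
        "def check(candidate):\n"
        "    assert candidate(1) == 2\n"
        "    assert candidate(3) == 8\n"
    ),
    entry_point="num_basis_states",
    difficulty="basic",
)
```

Calling `passes(toy, toy.canonical_solution)` checks the reference body, and calling it with a model's completion scores that completion the same way. Counting a completion as correct only if it executes and passes the test matches the abstract's focus on executable code.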
Taxonomy
Topics: Advanced Text Analysis Techniques
Methods: Sparse Evolutionary Training · Focus
