Qiskit HumanEval: An Evaluation Benchmark For Quantum Code Generative Models

Sanjay Vishwakarma; Francis Harkins; Siddharth Golecha; Vishal Sharathchandra Bajpe; Nicolas Dupuis; Luca Buratti; David Kremer; Ismael Faro; Ruchir Puri; Juan Cruz-Benito

arXiv:2406.14712 · quant-ph · June 24, 2024

TL;DR

This paper introduces the Qiskit HumanEval dataset to evaluate large language models' ability to generate executable quantum code, providing a new benchmark for quantum programming with AI.

Contribution

The paper presents a curated dataset of quantum tasks and systematically assesses LLMs' performance in generating quantum code using Qiskit, establishing a new evaluation benchmark.

Findings

1. LLMs can generate executable quantum code with varying accuracy.
2. The dataset enables systematic benchmarking of quantum code generation.
3. Results encourage further development of AI tools for quantum programming.

Abstract

Quantum programs are typically developed using quantum Software Development Kits (SDKs). The rapid advancement of quantum computing necessitates new tools to streamline this development process, and one such tool could be Generative Artificial Intelligence (GenAI). In this study, we introduce and use the Qiskit HumanEval dataset, a hand-curated collection of tasks designed to benchmark the ability of Large Language Models (LLMs) to produce quantum code using Qiskit, a quantum SDK. This dataset consists of more than 100 quantum computing tasks, each accompanied by a prompt, a canonical solution, a comprehensive test case, and a difficulty scale to evaluate the correctness of the generated solutions. We systematically assess the performance of a set of LLMs against the Qiskit HumanEval dataset's tasks and focus on the models' ability to produce executable quantum code. Our findings not…
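The task structure described in the abstract (prompt, canonical solution, test case, difficulty) can be pictured with a minimal sketch. This is not the dataset's actual schema or the authors' harness: the `Task` class, its field names, the `passes` helper, the difficulty value, and the toy task are all hypothetical, and the toy task is written in plain Python so the sketch runs without Qiskit installed.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """One benchmark entry. Field names are hypothetical, not the dataset's schema."""
    task_id: str
    prompt: str              # function signature plus docstring shown to the model
    canonical_solution: str  # reference function body
    test: str                # defines check(candidate), which raises on failure
    entry_point: str         # name of the function under test
    difficulty: str          # placeholder label, e.g. "basic"


def passes(task: Task, completion: str) -> bool:
    """Run prompt + completion, then the task's test, in a fresh namespace."""
    namespace: dict = {}
    try:
        exec(task.prompt + completion, namespace)   # define the candidate function
        exec(task.test, namespace)                  # define check(candidate)
        namespace["check"](namespace[task.entry_point])
    except Exception:
        # Syntax errors, runtime errors, and failed assertions all count as a fail.
        return False
    return True


# Toy task (assumption): plain Python standing in for a real Qiskit task.
toy = Task(
    task_id="toy/0",
    prompt=(
        "def num_basis_states(n_qubits: int) -> int:\n"
        '    """Return the number of computational basis states of n_qubits qubits."""\n'
    ),
    canonical_solution="    return 2 ** n_qubits\n",
    test=(
        "def check(candidate):\n"
        "    assert candidate(0) == 1\n"
        "    assert candidate(3) == 8\n"
    ),
    entry_point="num_basis_states",
    difficulty="basic",
)

print(passes(toy, toy.canonical_solution))      # the reference body is expected to pass
print(passes(toy, "    return 2 * n_qubits\n"))  # a wrong completion is expected to fail
```

A real harness would run model-generated code in a sandboxed subprocess with a timeout rather than calling `exec` in the evaluator's own process.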

Taxonomy

Topics: Advanced Text Analysis Techniques

Methods: Sparse Evolutionary Training · Focus