QHackBench: Benchmarking Large Language Models for Quantum Code Generation Using PennyLane Hackathon Challenges

Abdul Basit; Minghao Shao; Muhammad Haider Asif; Nouhaila Innan; Muhammad Kashif; Alberto Marchisio; Muhammad Shafique

arXiv:2506.20008·cs.AI·September 1, 2025

Open Access

TL;DR

This paper introduces QHackBench, a benchmark dataset for evaluating large language models in quantum code generation using PennyLane, and proposes methods to improve model performance on quantum challenges.

Contribution

It presents QHackBench, a novel benchmark dataset, and evaluates LLMs with new prompting and multi-agent refinement techniques for quantum programming.

Findings

01. RAG models perform comparably to vanilla prompting in complex tasks.

02. Multi-agent refinement improves execution success rates.

03. Public release of QHackBench facilitates future research.

Abstract

Recent advances in Large Language Models (LLMs) have demonstrated strong potential in code generation, yet their effectiveness in quantum computing remains underexplored. This paper benchmarks LLMs for PennyLane-based quantum code generation using real-world challenges from the Quantum Hackathon (QHack). We introduce QHackBench, a novel benchmark dataset derived from QHack competitions, and evaluate model performance under vanilla prompting and Retrieval-Augmented Generation (RAG). Our structured evaluation framework assesses functional correctness, syntactic validity, and execution success across varying challenge difficulties. Results indicate that RAG-enhanced models, supplemented with an augmented PennyLane dataset, produce results roughly comparable to those of standard prompting, particularly on complex quantum algorithms. Additionally, we introduce a multi-agent evaluation…
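
The three criteria named in the abstract (syntactic validity, execution success, functional correctness) can be read as successive gates on a generated solution. The following is a minimal sketch of that reading, not the paper's actual harness: the function `evaluate_candidate`, the entry-point name `run`, the tolerance, and the sample candidates are all hypothetical, and plain Python stands in for PennyLane so the example stays self-contained. The reference value uses the known fact that the Pauli-Z expectation after RX(theta) on |0> is cos(theta).

```python
import ast
import math


def evaluate_candidate(source, func_name, test_input, expected, tol=1e-6):
    """Score one generated solution against three successive gates.

    Hypothetical helper for illustration; not taken from QHackBench.
    """
    result = {
        "syntactic_validity": False,
        "execution_success": False,
        "functional_correctness": False,
    }

    # Gate 1: syntactic validity. Does the text parse as Python at all?
    try:
        ast.parse(source)
    except SyntaxError:
        return result
    result["syntactic_validity"] = True

    # Gate 2: execution success. Does the code define the entry point and
    # run on the test input without raising?
    # NOTE: exec() on untrusted model output is unsafe; a real harness
    # would isolate this step in a sandboxed subprocess with a timeout.
    namespace = {}
    try:
        exec(source, namespace)
        output = namespace[func_name](test_input)
    except Exception:
        return result
    result["execution_success"] = True

    # Gate 3: functional correctness. Does the output match the reference
    # value within a numerical tolerance?
    try:
        result["functional_correctness"] = math.isclose(
            float(output), float(expected), abs_tol=tol
        )
    except (TypeError, ValueError):
        pass
    return result


# Hypothetical candidate solutions for "<Z> after RX(theta) on |0>".
CANDIDATES = {
    "correct": "import math\ndef run(theta):\n    return math.cos(theta)\n",
    "wrong_answer": "import math\ndef run(theta):\n    return math.sin(theta)\n",
    "runtime_error": "def run(theta):\n    return undefined_name\n",
    "syntax_error": "def run(theta) return 1\n",
}

if __name__ == "__main__":
    theta = 0.5
    for label, src in CANDIDATES.items():
        print(label, evaluate_candidate(src, "run", theta, math.cos(theta)))
```

Ordering the gates this way means a candidate that fails an earlier check is never credited for a later one, which keeps the three rates comparable across models and difficulty levels.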

Taxonomy

Topics: Quantum Computing Algorithms and Architecture · Misinformation and Its Impacts · Quantum many-body systems