QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation

Ali Slim; Haydar Hamieh; Jawad Kotaich; Yehya Ghosn; Mahdi Chehimi; Ammar Mohanna; Hasan Abed Al Kader Hammoud; Bernard Ghanem

arXiv:2604.08570 · cs.LG · April 23, 2026

TL;DR

QuanBench+ is a benchmark for evaluating large language models on quantum code generation across Qiskit, PennyLane, and Cirq. Its results show clear progress, but even the best one-shot scores stay below 60%.

Contribution

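It provides a unified benchmark of 42 tasks aligned across Qiskit, PennyLane, and Cirq. Models are evaluated with executable functional tests, Pass@1 and Pass@5, KL-divergence-based acceptance for probabilistic outputs, and Pass@1 after feedback-based repair. The results reveal framework-specific strengths and weaknesses.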
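The benchmark reports Pass@1 and Pass@5. A common way to compute Pass@k from n sampled completions per task, c of which pass the functional tests, is the unbiased estimator 1 - C(n-c, k) / C(n, k), averaged over tasks. The sketch below assumes QuanBench+ uses this estimator. The page does not say so, and the function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single task (assumed estimator).

    n: number of completions sampled for the task
    c: number of those completions that pass all functional tests
    k: attempt budget, e.g. 1 for Pass@1 or 5 for Pass@5
    """
    if k > n:
        raise ValueError("k must not exceed the number of samples n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one passing completion.
        return 1.0
    # 1 minus the probability that k samples drawn without replacement all fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; results holds one (n, c) pair per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-task counts: 5 samples each, with 0, 1 and 5 passing.
results = [(5, 0), (5, 1), (5, 5)]
print(benchmark_pass_at_k(results, 1))  # ≈ 0.4
print(benchmark_pass_at_k(results, 5))  # ≈ 0.667
```

With a single sample per task (n = 1), Pass@1 reduces to the fraction of tasks solved.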

Findings

01. Strongest one-shot scores: Qiskit 59.5%, Cirq 54.8%, PennyLane 42.9%.

02. Feedback-based repair raises the best scores to 83.3% (Qiskit), 76.2% (Cirq), and 66.7% (PennyLane).

03. Quantum code generation remains dependent on framework-specific knowledge.
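Feedback-based repair is a generate-test-revise loop. The model writes code, the code runs against functional tests, and after a runtime error or wrong answer the model receives that feedback and may revise. The minimal sketch below assumes a callable model and test harness. `generate`, `run_tests`, `Outcome`, and the `max_rounds` budget are hypothetical and not taken from QuanBench+. A toy model stands in so the example runs without any quantum framework.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Outcome:
    passed: bool
    feedback: str = ""  # runtime error text or wrong-answer description


def solve_with_repair(
    generate: Callable[[str, Optional[str], Optional[str]], str],
    run_tests: Callable[[str], Outcome],
    prompt: str,
    max_rounds: int = 3,  # hypothetical repair budget
) -> bool:
    """Return True if the task passes within 1 + max_rounds attempts."""
    code = generate(prompt, None, None)  # one-shot attempt, no feedback yet
    outcome = run_tests(code)
    rounds = 0
    while not outcome.passed and rounds < max_rounds:
        # Revise using the previous code and the feedback from its failure.
        code = generate(prompt, code, outcome.feedback)
        outcome = run_tests(code)
        rounds += 1
    return outcome.passed


# Toy stand-ins: the "model" only gets the answer right after seeing feedback.
def toy_generate(prompt, prev_code, feedback):
    return "answer = 42" if feedback else "answer = 41"


def toy_run(code):
    env = {}
    exec(code, env)  # a real harness would sandbox and time-limit execution
    if env["answer"] == 42:
        return Outcome(True)
    return Outcome(False, "wrong answer: expected 42")


print(solve_with_repair(toy_generate, toy_run, "compute 42", max_rounds=0))  # False
print(solve_with_repair(toy_generate, toy_run, "compute 42"))  # True
```

With `max_rounds=0` the loop reduces to one-shot scoring. Any positive budget gives a repaired Pass@1 in which a task counts as solved if some revision passes.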

Abstract

Large Language Models (LLMs) are increasingly used for code generation, yet quantum code generation is still evaluated mostly within single frameworks, making it difficult to separate quantum reasoning from framework familiarity. We introduce QuanBench+, a unified benchmark spanning Qiskit, PennyLane, and Cirq, with 42 aligned tasks covering quantum algorithms, gate decomposition, and state preparation. We evaluate models with executable functional tests, report Pass@1 and Pass@5, and use KL-divergence-based acceptance for probabilistic outputs. We additionally study Pass@1 after feedback-based repair, where a model may revise code after a runtime error or wrong answer. Across frameworks, the strongest one-shot scores reach 59.5% in Qiskit, 54.8% in Cirq, and 42.9% in PennyLane; with feedback-based repair, the best scores rise to 83.3%, 76.2%, and 66.7%, respectively. These results…
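KL-divergence-based acceptance applies to tasks whose output is a measurement distribution rather than a single value. The minimal sketch below assumes that the submission's shot counts are normalized and compared against a reference distribution with D_KL(P_observed || P_reference), and that the submission is accepted below a threshold. The direction of the divergence, the smoothing constant `eps`, and the 0.05 threshold are illustrative assumptions, not values from the paper.

```python
import math


def counts_to_probs(counts: dict[str, int]) -> dict[str, float]:
    """Normalize a mapping of measured bitstrings to shot counts."""
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}


def kl_divergence(p: dict[str, float], q: dict[str, float], eps: float = 1e-9) -> float:
    """D_KL(P || Q) over the union of outcomes; eps avoids division by zero."""
    total = 0.0
    for outcome in set(p) | set(q):
        pk = p.get(outcome, 0.0)
        if pk == 0.0:
            continue  # terms with p = 0 contribute nothing
        qk = q.get(outcome, 0.0) + eps
        total += pk * math.log(pk / qk)
    return total


def accept(counts: dict[str, int], reference: dict[str, float], threshold: float = 0.05) -> bool:
    """Accept if the observed distribution is close to the reference (assumed rule)."""
    return kl_divergence(counts_to_probs(counts), reference) <= threshold


bell = {"00": 0.5, "11": 0.5}  # ideal Bell-state distribution
print(accept({"00": 510, "11": 490}, bell))  # True: only shot noise
print(accept({"00": 500, "01": 500}, bell))  # False: "01" should never occur
```

Measuring divergence from the observed distribution penalizes outcomes the reference forbids (such as `01` for a Bell state) heavily. Finite-shot noise on allowed outcomes costs little.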

Code & Models

Repository: jawadkotaichh/quanbench-plus (GitHub)