QSpark: Towards Reliable Qiskit Code Generation
Kiana Kheiri, Aamna Aamir, Andriy Miranskyy, Chen Ding

TL;DR
This paper introduces QSpark, a method that improves the reliability of LLM-generated Qiskit code by fine-tuning Qwen2.5-Coder-32B with GRPO and ORPO, reaching 56.29% (ORPO) and 49% (GRPO) Pass@1 on Qiskit HumanEval.
Contribution
The paper presents a novel fine-tuning approach using RL methods (GRPO and ORPO) to improve LLM-generated quantum code, achieving state-of-the-art results on Qiskit HumanEval.
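The core quantities behind the two methods can be sketched in a few lines. This is a minimal illustration based on the commonly published formulations of GRPO (rewards normalized within a group of sampled completions) and ORPO (a log-odds-ratio penalty between a preferred and a rejected completion). It is not the authors' training code. The function names, the use of the population standard deviation, and the `eps` constant are assumptions made for this example.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward against its own group.

    `rewards` holds one scalar reward per completion sampled for the same prompt.
    Assumption: population standard deviation; implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def log_odds(avg_logprob):
    """log(p / (1 - p)) where p = exp(avg_logprob) is a length-normalized likelihood."""
    p = math.exp(avg_logprob)  # requires avg_logprob < 0 so that p < 1
    return avg_logprob - math.log1p(-p)


def orpo_odds_ratio_loss(chosen_avg_logprob, rejected_avg_logprob):
    """ORPO-style penalty: -log sigmoid(log-odds(chosen) - log-odds(rejected))."""
    z = log_odds(chosen_avg_logprob) - log_odds(rejected_avg_logprob)
    return math.log1p(math.exp(-z))  # equals -log(sigmoid(z))


# Four completions for one prompt, rewarded 1 if the unit tests pass, else 0.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
# The penalty shrinks as the preferred completion becomes more likely than the rejected one.
print(orpo_odds_ratio_loss(-0.2, -1.5))
```

In the usual ORPO formulation this penalty is added, with a weighting coefficient, to the ordinary supervised loss on the preferred completion. GRPO instead feeds the group-relative advantages into a clipped policy-gradient objective.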
Findings
ORPO achieves 56.29% Pass@1 on Qiskit HumanEval, outperforming Granite-8B-QK and all general-purpose baselines.
GRPO reaches 49% Pass@1, also surpassing all general-purpose baselines.
Both methods show substantial improvements, but neither solves any of the five advanced tasks.
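For readers unfamiliar with the metric, Pass@k is commonly computed with the unbiased estimator introduced alongside the original HumanEval benchmark: given n generated samples for a task, of which c pass the unit tests, it estimates the probability that at least one of k randomly drawn samples passes. The sketch below assumes that standard definition. Whether the paper samples several completions per task or uses a single greedy completion is not stated here.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased Pass@k estimate for one task.

    n: total samples generated, c: samples that passed the tests, k: draw budget.
    """
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average the per-task estimate over (n, c) pairs, one pair per task."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# With one sample per task, Pass@1 reduces to the fraction of tasks solved.
print(benchmark_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 0)], k=1))  # 0.5
```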
Abstract
Quantum circuits must be error-resilient, yet LLMs like Granite-20B-Code and StarCoder often output flawed Qiskit code. We fine-tuned the Qwen2.5-Coder-32B model with two RL methods, Group Relative Policy Optimization (GRPO) and Odds-Ratio Preference Optimization (ORPO), using a richly annotated synthetic dataset. On the Qiskit HumanEval benchmark, ORPO reaches 56.29% Pass@1 ( pp over Granite-8B-QK) and GRPO hits 49%, both beating all general-purpose baselines; on the original HumanEval they score 65.90% and 63.00%. GRPO performs well on basic tasks (44/78) and excels on intermediate ones (41/68), but neither GRPO nor ORPO solves any of the five advanced tasks, highlighting clear gains yet room for progress in AI-assisted quantum programming.
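As a quick arithmetic check, the per-tier counts quoted in the abstract can be tallied into an overall score. The sketch assumes one scored completion per task and that the three tiers (78 basic, 68 intermediate, 5 advanced) make up the whole benchmark. The dictionary and function names are illustrative.

```python
# (solved, total) per difficulty tier, as quoted in the abstract.
TIER_COUNTS = {"basic": (44, 78), "intermediate": (41, 68), "advanced": (0, 5)}


def overall_pass_at_1(tier_counts):
    """Return (solved, total, percentage) aggregated over all tiers."""
    solved = sum(s for s, _ in tier_counts.values())
    total = sum(t for _, t in tier_counts.values())
    return solved, total, 100.0 * solved / total


solved, total, pct = overall_pass_at_1(TIER_COUNTS)
print(f"{solved}/{total} = {pct:.2f}%")  # 85/151 = 56.29%
```

Taken at face value, these counts sum to 85 of 151 tasks, which matches the 56.29% headline figure rather than the 49% one, so the attribution of the tier counts to a specific model is worth checking against the full paper.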
Taxonomy
Topics: QR Code Applications and Technologies
