QCoder Benchmark: Bridging Language Generation and Quantum Hardware through Simulator-Based Feedback
Taku Mikuriya, Tatsuya Ishigaki, Masayuki Kawarada, Shunya Minami, Tadashi Kadowaki, Yohichi Suzuki, Soshun Naito, Shunya Takata, Takumi Kato, Tamotsu Basseda, Reo Yamada, Hiroya Takamura

TL;DR
QCoder Benchmark is a framework for evaluating language models on quantum programming tasks using feedback from simulated hardware. Its results highlight both the challenges and the potential of LLMs in quantum code generation.
Contribution
The paper presents a benchmark for quantum programming that combines feedback from a hardware simulator with comparison against human-written code, supporting finer-grained evaluation of LLMs in this domain.
Findings
GPT-4o achieves 18.97% accuracy on the benchmark.
Reasoning-based models reach up to 78% accuracy.
Human-written code submissions have an average success rate of 39.98%.
Abstract
Large language models (LLMs) have increasingly been applied to automatic programming code generation. This task can be viewed as a language generation task that bridges natural language, human knowledge, and programming logic. However, it remains underexplored in domains that require interaction with hardware devices, such as quantum programming, where human coders write Python code that is executed on a quantum computer. To address this gap, we introduce QCoder Benchmark, an evaluation framework that assesses LLMs on quantum programming with feedback from simulated hardware devices. Our benchmark offers two key features. First, it supports evaluation using a quantum simulator environment beyond conventional Python execution, allowing feedback of domain-specific metrics such as circuit depth, execution time, and error classification, which can be used to guide better generation. Second,…
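The kind of simulator feedback the abstract describes (circuit depth, execution time, error classification) can be illustrated with a minimal sketch. This is not the benchmark's actual harness: the gate-list circuit format, the `build()` entry point, the function names `circuit_depth` and `evaluate_submission`, and the status labels are all hypothetical, chosen only to show how such metrics might be gathered and returned to a model.

```python
import time


def circuit_depth(gates, num_qubits):
    """Return the number of layers when each gate is placed as early as possible."""
    layer = [0] * num_qubits  # layer[q] = depth reached so far on qubit q
    for _name, qubits in gates:
        start = max(layer[q] for q in qubits)
        for q in qubits:
            layer[q] = start + 1
    return max(layer, default=0)


def evaluate_submission(source, num_qubits, max_depth):
    """Execute candidate code and return simulator-style feedback.

    Assumption: the candidate defines build(), which returns a list of
    (gate_name, qubit_indices) tuples. Status labels are illustrative only.
    """
    feedback = {"status": None, "depth": None, "seconds": None}
    started = time.perf_counter()
    try:
        namespace = {}
        exec(source, namespace)  # toy only: real harnesses should sandbox this
        gates = namespace["build"]()
        for name, qubits in gates:
            if any(q < 0 or q >= num_qubits for q in qubits):
                raise IndexError(f"gate {name!r} uses a qubit outside the register")
        feedback["depth"] = circuit_depth(gates, num_qubits)
    except SyntaxError:
        feedback["status"] = "syntax_error"
    except Exception:
        feedback["status"] = "runtime_error"
    else:
        if feedback["depth"] > max_depth:
            feedback["status"] = "depth_limit_exceeded"
        else:
            feedback["status"] = "accepted"
    finally:
        feedback["seconds"] = time.perf_counter() - started
    return feedback


candidate = "def build():\n    return [('h', (0,)), ('cx', (0, 1)), ('h', (1,))]\n"
feedback = evaluate_submission(candidate, num_qubits=2, max_depth=5)
```

In a generate-and-refine loop, a dictionary like `feedback` could be serialized into the next prompt so the model sees why a submission failed (for example, a depth limit) and not only that it failed.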
Taxonomy
Topics: Quantum Computing Algorithms and Architecture · Machine Learning in Materials Science · Quantum many-body systems
