QCoder Benchmark: Bridging Language Generation and Quantum Hardware through Simulator-Based Feedback

Taku Mikuriya; Tatsuya Ishigaki; Masayuki Kawarada; Shunya Minami; Tadashi Kadowaki; Yohichi Suzuki; Soshun Naito; Shunya Takata; Takumi Kato; Tamotsu Basseda; Reo Yamada; Hiroya Takamura

arXiv:2510.26101 · cs.CL · November 4, 2025

Open Access

TL;DR

QCoder Benchmark introduces a framework for evaluating language models on quantum programming tasks using simulated hardware feedback, highlighting the challenges and potential of LLMs in quantum code generation.

Contribution

The paper presents a benchmark for quantum programming that combines feedback from a hardware simulator with comparison against human-written code, enabling finer-grained evaluation of LLMs in this domain.

Findings

01. GPT-4o achieves 18.97% accuracy on the benchmark.

02. Reasoning-based models reach up to 78% accuracy.

03. Human-written code submissions average a 39.98% success rate.

Abstract

Large language models (LLMs) have increasingly been applied to automatic programming code generation. This task can be viewed as a language generation task that bridges natural language, human knowledge, and programming logic. However, it remains underexplored in domains that require interaction with hardware devices, such as quantum programming, where human coders write Python code that is executed on a quantum computer. To address this gap, we introduce QCoder Benchmark, an evaluation framework that assesses LLMs on quantum programming with feedback from simulated hardware devices. Our benchmark offers two key features. First, it supports evaluation using a quantum simulator environment beyond conventional Python execution, allowing feedback of domain-specific metrics such as circuit depth, execution time, and error classification, which can be used to guide better generation. Second,…
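To make the idea of simulator-style feedback concrete, the following is a minimal sketch of how a candidate circuit could be scored on depth, execution time, and an error class. It is not the QCoder implementation: the gate-list representation, the `Feedback` record, the `evaluate` function, and the error class names (`runtime_error`, `depth_limit_exceeded`) are all assumptions made up for illustration, and no real quantum simulator is involved.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical circuit representation: (gate name, qubits the gate acts on).
Gate = Tuple[str, Tuple[int, ...]]


def circuit_depth(gates: List[Gate], num_qubits: int) -> int:
    """Number of layers when every gate is scheduled as early as possible."""
    layer = [0] * num_qubits  # last occupied layer for each qubit
    for name, qubits in gates:
        if any(q < 0 or q >= num_qubits for q in qubits):
            raise IndexError(f"gate {name!r} acts on a qubit outside 0..{num_qubits - 1}")
        start = max(layer[q] for q in qubits) + 1
        for q in qubits:
            layer[q] = start
    return max(layer, default=0)


@dataclass
class Feedback:
    status: str            # "ok" or a hypothetical error class
    depth: Optional[int]   # None when the circuit could not be built
    seconds: float         # wall-clock time spent building and analysing
    message: str = ""


def evaluate(build: Callable[[], List[Gate]], num_qubits: int, max_depth: int) -> Feedback:
    """Run a candidate circuit builder and return structured feedback."""
    start = time.perf_counter()
    try:
        gates = build()
        depth = circuit_depth(gates, num_qubits)
    except Exception as exc:  # any failure in the candidate code
        return Feedback("runtime_error", None, time.perf_counter() - start, repr(exc))
    elapsed = time.perf_counter() - start
    if depth > max_depth:
        return Feedback("depth_limit_exceeded", depth, elapsed, f"depth {depth} > {max_depth}")
    return Feedback("ok", depth, elapsed)


# Usage: a Bell-pair circuit (Hadamard, then CNOT) has depth 2.
bell = [("h", (0,)), ("cx", (0, 1))]
result = evaluate(lambda: bell, num_qubits=2, max_depth=5)
print(result.status, result.depth)
```

A structured record like this, rather than a bare pass/fail, is the kind of signal that could be fed back to a model as a hint for its next attempt.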

Taxonomy

Topics: Quantum Computing Algorithms and Architecture · Machine Learning in Materials Science · Quantum many-body systems