Can LLMs Solve Science or Just Write Code? Evaluating Quantum Solver Generation
Luciano Baresi, Domenico Bianculli, Maryse Ernzer, Livia Lestingi, Fabrizio Pastore, Seung Yeob Shin

TL;DR
This paper introduces Q-SAGE, an iterative methodology for evaluating how well LLMs generate numerically accurate quantum solvers. Iterative refinement improves success rates, but current models still show limitations.
Contribution
The paper presents Q-SAGE, a novel iterative approach to evaluate and improve LLM-generated quantum solvers for scientific problems.
Findings
Iterative refinement significantly increases success rates.
With stronger models, failure modes shift from execution errors to numerical inaccuracies.
Refinement introduces substantial computational overhead.
Abstract
Large Language Models (LLMs) show strong capabilities in code generation, motivating their use in automated quantum solver development. However, in quantum computing, successful execution of generated code is not sufficient: correctness depends on numerically accurate results, which are sensitive to non-trivial mappings, hybrid quantum-classical workflows, and algorithm-specific approximations. This work introduces Q-SAGE, an iterative methodology to evaluate LLMs' capability in generating quantum solvers for scientific problems. The methodology executes the script generated by the LLM, compares its result with that of a classical solver, and refines the script until the two results match within a tolerance threshold. We empirically evaluated the methodology with five families of scientific problems of different complexities and five LLMs, both…
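The execute, compare, refine loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `refine_until_match`, `generate`, `run`, and `RefinementResult`, the absolute-error check, the feedback strings, and the iteration cap are all assumptions introduced here. The stub functions stand in for an LLM call and for script execution, and the numbers are arbitrary placeholders, not results from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RefinementResult:
    success: bool           # True if the result matched the reference within tolerance
    iterations: int         # number of generate/run rounds performed
    value: Optional[float]  # last numerical result obtained, if any


def refine_until_match(
    generate: Callable[[Optional[str]], str],  # hypothetical LLM call: feedback -> script
    run: Callable[[str], float],               # hypothetical executor: script -> numeric result
    reference: float,                          # result from the classical solver
    tolerance: float = 1e-3,                   # assumed absolute tolerance
    max_iterations: int = 5,                   # assumed cap on refinement rounds
) -> RefinementResult:
    feedback: Optional[str] = None
    value: Optional[float] = None
    for iteration in range(1, max_iterations + 1):
        script = generate(feedback)
        try:
            value = run(script)
        except Exception as exc:
            # Execution failure: report the error back and try again.
            feedback = f"execution error: {exc}"
            value = None
            continue
        if abs(value - reference) <= tolerance:
            return RefinementResult(True, iteration, value)
        # The script ran but the result is numerically off.
        feedback = f"numerical mismatch: got {value}, expected {reference}"
    return RefinementResult(False, max_iterations, value)


# Stubs standing in for the LLM and the execution environment (illustrative only).
def fake_generate(feedback: Optional[str]) -> str:
    if feedback is None:
        return "broken"
    if feedback.startswith("execution error"):
        return "inaccurate"
    return "accurate"


def fake_run(script: str) -> float:
    if script == "broken":
        raise RuntimeError("simulated crash")
    return {"inaccurate": -1.0, "accurate": -1.1371}[script]


result = refine_until_match(fake_generate, fake_run, reference=-1.137)
print(result)
```

In this stubbed run the first script fails to execute, the second runs but misses the reference, and the third falls within tolerance, which mirrors the two failure categories named in the findings (execution errors and numerical inaccuracies). Each extra round costs another generation and another execution, which is where the refinement overhead comes from.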
