Accelerating the drive towards energy-efficient generative AI with quantum computing algorithms
Frederik F. Flöther, Jan Mikolon, Maria Longobardi

TL;DR
This paper explores how quantum computing algorithms could improve energy efficiency in the development and deployment of large language models, addressing sustainability concerns in AI.
Contribution
It provides a detailed analysis of potential quantum algorithm applications across the lifecycle of large language models to enhance energy efficiency and sustainability.
Findings
Quantum algorithms may reduce energy consumption in AI training and inference.
Identification of open research problems in quantum-enhanced AI.
Discussion of industry applications for energy-efficient AI.
Abstract
Research and usage of artificial intelligence, particularly generative and large language models, have rapidly progressed over the last years. This has, however, given rise to issues due to high energy consumption. While quantum computing is not (yet) mainstream, its intersection with machine learning is especially promising, and the technology could alleviate some of these energy challenges. In this perspective article, we break down the lifecycle stages of large language models and discuss relevant enhancements based on quantum algorithms that may aid energy efficiency and sustainability, including industry application examples and open research problems.
| LLM lifecycle stage | Classical approach | Possible quantum enhancement | Sustainability rationale | Time scale, potential impact |
|---|---|---|---|---|
| Data collection and curation | Massive web scraping, distributed data deduplication and filtering | Quantum-assisted clustering/deduplication (via advanced sampling) [24] | Reduced redundant data lowers overall data processing/storage costs | Medium-term, low |
| Preprocessing and encoding | Text tokenization (byte-pair encoding, WordPiece) | Compact data-loading circuits (e.g. QRAM [25], amplitude encoding [26]) | Potentially fewer large-scale CPU/GPU cycles used in repeated data transformations | Long-term, low |
| Model initialization and architecture | Random weight initialization (Xavier, Kaiming), billion+ parameter models | Quantum hyperparameter search [27], hybrid quantum neural network layers [28] | Smaller, more expressive models can lower energy consumption | Medium-term, high |
| Training (core loop) | Stochastic gradient descent, Adam, large-scale distributed training, mixed-precision training | Quantum gradient methods [29], quantum natural gradient [30], quantum approximate optimization algorithm (QAOA) [31] | Fewer iterations/epochs lead to lower energy usage in high-performance computing (HPC) clusters | Medium-term, medium |
| Training (fine-tuning and distillation) | Domain-specific fine-tuning, knowledge distillation, pruning | Quantum-assisted low-rank approximation [32], quantum-based distillation and quantum reinforcement learning [33] for fine-tuning and knowledge distillation | Smaller distilled models reduce energy usage for both training and inference, yet can still achieve, or even surpass, the performance levels of larger counterparts | Near-term, high |
| Inference and deployment | Quantization/model compression (e.g., INT8/FP16) for faster, lower-memory inference | Quantum approximate optimization algorithm (QAOA) and quantum annealing [34] to identify which filters, neurons, or blocks in a network contribute least to accuracy and then prune them | Faster inference and lower hardware requirements while providing the same performance levels as much larger models | Near-term, medium |
| Maintenance and monitoring | Continuous monitoring, drift detection, logging of billions of requests | Quantum-accelerated anomaly detection [35], drift monitoring | Proactive retraining (done only when needed) lowers energy consumption | Medium-term, medium |
(1) QuantumBasel, Schorenweg 44b, 4144 Arlesheim, Switzerland
(2) Center for Quantum Computing and Quantum Coherence (QC2), University of Basel, Petersplatz 1, 4001 Basel, Switzerland
(3) NCCR SPIN, University of Basel, Klingelbergstrasse 82, 4056 Basel, Switzerland
1 The need for quantum-enhanced efficient generative AI
Recent years have seen an explosion in the development and adoption of artificial intelligence (AI) models and algorithms. In particular, generative AI techniques [1] and large language models (LLMs) [2] have shown astonishing improvements. At the same time, this new generation of models carries a number of risks [3, 4]. Energy consumption and environmental impact [5], at both the training and inference stages [6], are among the most important challenges. For instance, the OpenAI o3 and DeepSeek-R1 models consume over 33 Wh per long prompt [7].
Quantum computing is at an earlier stage than AI; it has not yet experienced commercial breakthroughs. The technology represents a fundamentally different approach to processing information, using quantum bits (qubits) instead of bits. Through the clever exploitation of quantum mechanical effects such as entanglement, interference, and superposition, novel algorithms become possible which allow significant, in some cases exponential, improvements compared with classical techniques [8]. It is, in fact, the only known computational model which enables such exponential speedups [9]. The potential benefits go beyond speed; quantum computers might also allow AI models to be developed and calculations to be performed with greater accuracy, higher energy efficiency, and lower input data requirements (in terms of quality and quantity). In fact, there is also a symbiosis as AI, including LLMs, is being explored to accelerate the development of quantum technology [10, 11, 12] while quantum computing is being researched for a myriad of machine learning tasks [13].
Despite quantum computing’s considerable potential, its energy efficiency is still being debated [14]. The controlled laboratory environment required by leading quantum computing architectures also generates significant power demands, which, depending on the exact technology, may be on the order of tens of kW [15]. Comparing that with the MW-scale power that classical supercomputers often require suggests that quantum computers can be competitive in terms of energy consumption and may offer energy-related quantum advantages.
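As a back-of-the-envelope illustration of this comparison, energy is power multiplied by runtime, so a lower-power machine only saves energy if its runtime does not grow disproportionately. The sketch below uses purely hypothetical figures (25 kW for a quantum system and 2 MW for a classical cluster, chosen only to match the orders of magnitude mentioned in the text); they are assumptions, not measurements.

```python
def energy_kwh(power_kw: float, hours: float) -> float:
    """Energy in kWh for a device drawing a constant power_kw for `hours`."""
    return power_kw * hours


def break_even_hours(quantum_kw: float, classical_kw: float,
                     classical_hours: float) -> float:
    """Longest quantum runtime that uses no more energy than the classical run."""
    return energy_kwh(classical_kw, classical_hours) / quantum_kw


# Hypothetical power draws (assumptions for illustration only).
QUANTUM_KW = 25.0      # "tens of kW"
CLASSICAL_KW = 2000.0  # "MW scale"

if __name__ == "__main__":
    classical_energy = energy_kwh(CLASSICAL_KW, 1.0)
    limit = break_even_hours(QUANTUM_KW, CLASSICAL_KW, 1.0)
    print(f"Classical run, 1 h: {classical_energy:.0f} kWh")
    print(f"Quantum run breaks even at up to {limit:.0f} h")
```

Under these assumed numbers, the quantum system could run up to 80 times longer than the classical job before consuming the same energy; with other power figures the ratio changes accordingly.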
One of the challenges of applying quantum algorithms to problems involving classical data has been the “input problem”, which refers to the fact that efficiently loading large volumes of classical data into today’s quantum computers remains difficult [16]. While research is trying to address this issue, involving, for instance, quantum random access memories (QRAMs) [17] and coresets [18, 19], loading larger volumes of classical data today is not feasible or might actually erase any quantum advantages.
One reason why modern generative AI requires so much energy is the size of the models. Cutting-edge models typically have billions of parameters, which enable them to achieve state-of-the-art performance [20]. There are research efforts underway to make AI models smaller [21]. Another promising avenue that has emerged with regard to creating more efficient architectures is based on a class of brain-inspired continuous-time recurrent neural networks called liquid neural networks [22]. As AI models become more compact, they become increasingly amenable to integration with quantum algorithms, because less classical data has to be loaded and the aforementioned “input problem” is eased, opening up the possibility of further training efficiency gains.
Given the proliferation and increasing energy consumption of AI, the potential of quantum computing to drive efficiency gains in this space is of great interest. The present discussion focuses on quantum computing algorithms that can significantly increase the efficiency of certain key tasks in the lifecycle of LLMs. Clearly, there are other ways in which quantum computing techniques may address AI energy consumption challenges, which could be in the inference stage or an entirely different application space, such as the optimization of renewable energy integration in AI-driven data centers [23].
2 Applications of quantum algorithms in generative AI development
Table 1 summarizes a selection of promising quantum algorithms and techniques which lend themselves to near-term and long-term enhancements of classical AI model training/inference efficiency.
The rightmost column gives a very rough assessment of the expected time scale when the quantum enhancements could generate value as well as the degree of their expected impact. For example, the lack of maturity of QRAM technology, and the relative ease with which classical techniques can handle preprocessing and encoding steps, suggests that this is an area where quantum computing may only generate value in the long term and the impact is likely low. Similarly, for the training (core loop) stage, the general need for big (classical) data processing makes it difficult for quantum computers to add value here in the near future. On the other hand, the training (fine-tuning and distillation) stage could see impactful quantum enhancements sooner, given that one can already conduct significant fine-tuning with on the order of 10–100 samples in certain cases.
Although the rapid advancements in quantum-assisted AI are very promising, real-world industry tests have only just begun. Still, they demonstrate the potential of quantum algorithms to enhance AI efficiency. The following application examples, aligned with Table 1, provide practical evidence of how quantum computing may contribute to various stages of AI model development.
In the data collection and curation stage, quantum-enhanced clustering algorithms may support efficient data streamlining and storage. E.ON and Technical University of Munich demonstrated how a quantum k-means algorithm can be adapted to real quantum hardware, clustering high-dimensional real-world German electricity grid data [36].
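To make the clustering idea more concrete, the following sketch classically simulates a swap test, a building block that is often proposed for estimating state overlaps in quantum k-means variants. Whether the work in [36] uses exactly this subroutine is not stated here, so this is an illustrative assumption; the function names and the shot count are hypothetical. For normalized states, the swap test returns outcome 0 with probability 1/2 + |⟨a|b⟩|²/2, so the overlap can be estimated from repeated measurements.

```python
import math
import random


def normalize(v):
    """Scale a real vector to unit length so it can represent a quantum state."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return [x / norm for x in v]


def swap_test_p0(a, b):
    """Exact probability of measuring 0 on the ancilla in a swap test."""
    overlap = sum(x * y for x, y in zip(normalize(a), normalize(b)))
    return 0.5 + 0.5 * overlap ** 2


def estimate_fidelity(a, b, shots, rng):
    """Estimate |<a|b>|^2 from simulated swap-test measurement shots."""
    p0 = swap_test_p0(a, b)
    zeros = sum(1 for _ in range(shots) if rng.random() < p0)
    return max(0.0, 2.0 * zeros / shots - 1.0)


def assign_cluster(point, centroids, shots=2000, seed=0):
    """Index of the centroid with the highest estimated overlap with `point`."""
    rng = random.Random(seed)
    fidelities = [estimate_fidelity(point, c, shots, rng) for c in centroids]
    return max(range(len(centroids)), key=fidelities.__getitem__)


if __name__ == "__main__":
    centroids = [[1.0, 0.0], [0.0, 1.0]]
    print(assign_cluster([1.0, 0.1], centroids))
```

Note that the overlap only captures the direction of the vectors; a full distance estimate would also need their norms, which this sketch leaves out.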
In the preprocessing and encoding stage, it is possible to leverage sophisticated data encoding in quantum algorithms. For instance, a group including Medical University of Vienna, Johannes Kepler University Linz, and Software Competence Center Hagenberg leveraged linear time quantum data encoding in the classification of clinical data [37].
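As a minimal sketch of the amplitude encoding listed in Table 1 (not the specific linear-time encoding of [37]), the snippet below shows the classical preparation step: a feature vector with N entries is zero-padded to the next power of two and L2-normalized, so that it can serve as the amplitude vector of roughly log2(N) qubits. The function name is hypothetical, and the state-preparation circuit itself, which is the hard part in practice, is not shown.

```python
import math


def amplitude_encode(values):
    """Return (amplitudes, n_qubits) for a real-valued feature vector.

    The vector is zero-padded to length 2**n_qubits and L2-normalized,
    so the squared amplitudes sum to 1.
    """
    values = [float(v) for v in values]
    if not values or all(v == 0.0 for v in values):
        raise ValueError("need at least one non-zero feature")
    # Smallest n with 2**n >= len(values); use at least one qubit.
    n_qubits = max(1, (len(values) - 1).bit_length())
    padded = values + [0.0] * (2 ** n_qubits - len(values))
    norm = math.sqrt(sum(v * v for v in padded))
    return [v / norm for v in padded], n_qubits


if __name__ == "__main__":
    amps, n = amplitude_encode([3.0, 4.0, 0.0, 12.0, 5.0])
    print(n, [round(a, 3) for a in amps])
```

The compactness is the appeal: eight qubits would suffice for 256 features. The cost is shifted into preparing the state, which is one face of the “input problem” discussed in Section 1.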
In the model initialization and architecture stage, quantum algorithms may enable efficiency improvements through enhanced hyperparameter optimization. A consortium encompassing Lighthouse Disruptive Innovation Group, Vueling Airlines, University Ramon Llull, and MIT Media Lab employed a Fourier series method to represent the search space [38]. The data included flight no-show information as the dependent variable and features such as origin, destination, and time of flight. Variational quantum circuits were trained to select suitable hyperparameters. Furthermore, when novel architectures such as quantum neural networks are employed, both quantum [39] and classical [40] frameworks are being studied for tasks such as model training and compression.
In the training (core loop) stage, quantum algorithms may help tune the parameters of a classical neural network and overcome problems associated with backpropagation and gradient descent methods. Researchers from Politehnica University of Timișoara adapted Grover’s algorithm in order to improve classical neural network weight optimization techniques, using the Digits dataset from scikit-learn [41].
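For orientation, here is a small classical state-vector simulation of textbook Grover search. It is not the adapted algorithm of [41]; it only illustrates the underlying amplitude-amplification mechanism, with the marked index standing in (as an assumption) for the best entry in a discretized list of candidate weight configurations.

```python
import math


def grover_search(n_qubits: int, marked: int):
    """Simulate Grover's algorithm for a single marked index.

    Returns (probabilities, iterations), where probabilities[i] is the
    chance of measuring basis state i after the final iteration.
    """
    dim = 2 ** n_qubits
    if not 0 <= marked < dim:
        raise ValueError("marked index out of range")
    # Start in the uniform superposition over all basis states.
    state = [1.0 / math.sqrt(dim)] * dim
    iterations = math.floor(math.pi / 4 * math.sqrt(dim))
    for _ in range(iterations):
        # Oracle: flip the sign of the marked amplitude.
        state[marked] = -state[marked]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(state) / dim
        state = [2.0 * mean - a for a in state]
    return [a * a for a in state], iterations


if __name__ == "__main__":
    probs, steps = grover_search(n_qubits=4, marked=11)
    best = max(range(len(probs)), key=probs.__getitem__)
    print(f"{steps} iterations, most likely index {best}, p = {probs[best]:.3f}")
```

The number of oracle calls grows with the square root of the search-space size, compared with a linear number of evaluations for unstructured classical search; this quadratic reduction is what makes Grover-style methods attractive for expensive evaluations.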
In the training (fine-tuning and distillation) stage, researchers from IonQ introduced a hybrid quantum-classical deep learning architecture with the goal of improving token prediction for fine-tuned LLMs [42]. A group from Beijing University of Posts and Telecommunications explored a quantum knowledge distillation model for LLMs based on variational quantum circuits, using this approach to extract emotional information from text, detecting concealed sensitive information within media, and uncovering implicit themes in text [43].
In the inference and deployment stage, quantum algorithms could help further compress classical AI models, thus improving inference times. Large deep neural networks may often be significantly pruned while maintaining performance; the “lottery ticket hypothesis” suggests that for certain architectures specific subnetworks may be found that reach comparable accuracies in a similar number of iterations as the original network [44]. A Colorado State University researcher proposed several schemes, including variational quantum algorithms and quantum optimization methods, to enable quantum neuron selection with the goal of making this subnetwork discovery more efficient [45].
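One way to picture neuron selection as an optimization problem suited to QAOA or quantum annealing is a quadratic unconstrained binary optimization (QUBO). The sketch below is an assumed toy formulation, not the scheme of [45]: each binary variable marks whether a neuron is kept, the importance scores and redundancy penalties are hypothetical inputs, and an exhaustive search stands in for the quantum solver.

```python
from itertools import product


def qubo_energy(x, importance, redundancy, keep, penalty):
    """Energy of a keep/prune mask x (1 = keep neuron, 0 = prune).

    Lower is better: reward important neurons, penalize keeping
    redundant pairs, and penalize deviating from the `keep` budget.
    """
    gain = sum(imp * xi for imp, xi in zip(importance, x))
    overlap = sum(r * x[i] * x[j] for (i, j), r in redundancy.items())
    budget = penalty * (sum(x) - keep) ** 2
    return -gain + overlap + budget


def best_mask(importance, redundancy, keep, penalty=10.0):
    """Brute-force minimizer; a QAOA or annealing run would replace this."""
    n = len(importance)
    return min(
        product((0, 1), repeat=n),
        key=lambda x: qubo_energy(x, importance, redundancy, keep, penalty),
    )


if __name__ == "__main__":
    importance = [0.9, 0.8, 0.1, 0.7]   # hypothetical per-neuron scores
    redundancy = {(0, 1): 0.75}         # neurons 0 and 1 are largely redundant
    print(best_mask(importance, redundancy, keep=2))
```

In this toy instance, neurons 0 and 1 have the highest individual scores, but their redundancy penalty makes keeping neurons 0 and 3 the lower-energy choice. The brute-force search scales as 2^n, which is the cost a quantum optimizer would aim to reduce.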
Finally, in the maintenance and monitoring stage, quantum algorithms could improve the discovery of anomalies in order to better detect and prevent drift. A consortium including European Organization for Nuclear Research (CERN), IBM, and IRIS Analytics considered ensemble approaches where classical algorithms were combined with quantum feature selection in the context of fraud detection [46].
These industry efforts suggest that quantum computing could enhance energy efficiency throughout generative AI pipelines and help address other issues plaguing today’s generative AI, such as hallucinations and low robustness. In fact, there are already early efforts that explore the implementation of transformer architectures and LLMs on quantum computers [47, 48, 49].
3 Outlook and open research challenges
Although quantum computing holds considerable promise, most machine learning tasks will not see drastic benefits in the short term. Core processes like data preprocessing, feature engineering, and gradient-based optimization are already highly efficient on classical hardware involving central processing units (CPUs) / graphics processing units (GPUs) / tensor processing units (TPUs). Training deep learning models, especially transformers, relies on numerical methods that sometimes lack clear quantum speedups. Quantum computing is more likely to provide advantages in areas like quantum-enhanced kernel methods [50] or reinforcement learning. In reinforcement learning applications, quantum algorithms could enhance policy optimization, accelerate exploration through quantum sampling, and improve decision-making in complex environments. Quantum-inspired methods, as well as quantum annealing for solving multi-agent or high-dimensional problems, may offer further advantages over classical approaches. While still in its early stages, quantum-enhanced reinforcement learning could become another key area where quantum algorithms provide significant benefits.
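To indicate what a quantum kernel computes, the sketch below evaluates a fidelity kernel for a simple angle-encoding feature map, in which each feature x_i is loaded into one qubit as cos(x_i/2)|0⟩ + sin(x_i/2)|1⟩. For this product state the kernel has the closed form ∏ cos²((x_i − y_i)/2). This feature map is an assumption chosen for illustration and is not the construction of [50]; because it is easy to simulate classically, it shows the structure of the method rather than a quantum advantage.

```python
import math


def angle_kernel(x, y):
    """Fidelity kernel |<phi(x)|phi(y)>|^2 for single-qubit angle encoding."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have equal length")
    k = 1.0
    for xi, yi in zip(x, y):
        # The overlap of the two single-qubit states is cos((xi - yi) / 2).
        k *= math.cos((xi - yi) / 2.0) ** 2
    return k


def gram_matrix(points):
    """Kernel matrix that a classical SVM or similar method could consume."""
    return [[angle_kernel(a, b) for b in points] for a in points]


if __name__ == "__main__":
    data = [[0.1, 0.5], [0.2, 0.4], [2.5, 3.0]]
    for row in gram_matrix(data):
        print([round(v, 3) for v in row])
```

On quantum hardware, each kernel entry would be estimated from repeated measurements instead of a formula, and the resulting matrix would then be passed to a classical learner.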
Moreover, for quantum computing to realize its potential in enhancing AI efficiency, several fundamental challenges in hardware and architecture must be addressed. The realization of practical quantum advantages for AI requires significant improvements in areas such as the number of qubits, noise, memory efficiency, and system architecture, all of which currently limit the applicability of quantum algorithms in machine learning workflows. The limited number of qubits available in current systems is an important bottleneck in the adoption of quantum computing for AI, restricting, for instance, the number of features. In addition, tailored quantum hardware architectures for AI applications will be critical for unlocking the full potential of quantum-enhanced machine learning. For example, in the same way that GPUs and other classical hardware are being optimized for AI training and inference tasks, quantum hardware and integrated quantum-classical architectures will need to be further fine-tuned to achieve optimal performance and energy efficiency. Early signs of that happening are already visible, given the increasing number of application-specific quantum algorithm tests as well as research around connecting multiple types of (different-qubit-modality) quantum systems together [51].
4 Acknowledgments
This work was supported as part of NCCR SPIN, a National Centre of Competence in Research, funded by the Swiss National Science Foundation (grant number 225153).
References
- [1] Sengar, S.S., Hasan, A.B., Kumar, S., Carroll, F.: Generative artificial intelligence: a systematic review and applications. Multimedia Tools and Applications, 1–40 (2024)
- [2] Naveed, H., Khan, A.U., Qiu, S., Saqib, M., Anwar, S., Usman, M., Akhtar, N., Barnes, N., Mian, A.: A comprehensive overview of large language models. arXiv preprint arXiv:2307.06435 (2023)
- [3] Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E., et al.: On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021)
- [4] Wach, K., Duong, C.D., Ejdys, J., Kazlauskaitė, R., Korzynski, P., Mazurek, G., Paliszkiewicz, J., Ziemba, E.: The dark side of generative artificial intelligence: A critical analysis of controversies and risks of ChatGPT. Entrepreneurial Business and Economics Review 11(2), 7–30 (2023)
- [5] George, A.S., George, A.H., Martin, A.G.: The environmental impact of AI: a case study of water consumption by ChatGPT. Partners Universal International Innovation Journal 1(2), 97–104 (2023)
- [6] Desislavov, R., Martínez-Plumed, F., Hernández-Orallo, J.: Trends in AI inference energy consumption: Beyond the performance-vs-parameter laws of deep learning. Sustainable Computing: Informatics and Systems 38, 100857 (2023)
- [7] Jegham, N., Abdelatti, M., Elmoubarki, L., Hendawi, A.: How hungry is AI? Benchmarking energy, water, and carbon footprint of LLM inference. arXiv preprint arXiv:2505.09598 (2025)
