Quantum-enhanced Large Language Models on Quantum Hardware via Cayley Unitary Adapters
Borja Aizpurua, Sukhbinder Singh, Augustine Kshetrimayum, Saeed S. Jahromi, Roman Orus

TL;DR
This paper shows that quantum circuit blocks inserted into pre-trained large language models and executed on a 156-qubit quantum processor can improve language-modeling perplexity and recover most of the degradation caused by classical compression.
Contribution
The paper introduces Cayley-parameterised unitary adapters for LLMs and reports perplexity improvements on real quantum hardware with only 6,000 additional parameters for Llama 3.1 8B.
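To make the term "Cayley-parameterised" concrete, the following is a minimal NumPy sketch of the standard Cayley transform, which maps a skew-Hermitian matrix A to a unitary U = (I + A)^{-1}(I - A). It is an illustration of the general technique only: the function name `cayley_unitary`, the 4x4 size, and the way the skew-Hermitian matrix is built from raw parameters are assumptions, and the summary does not say how the authors compile such a unitary into a circuit or where exactly it sits in the projection layer.

```python
import numpy as np


def cayley_unitary(params: np.ndarray) -> np.ndarray:
    """Map an unconstrained square matrix to a unitary via the Cayley transform.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    # Keep only the skew-Hermitian part, so that A^dagger = -A.
    a = 0.5 * (params - params.conj().T)
    identity = np.eye(a.shape[0], dtype=a.dtype)
    # U = (I + A)^{-1} (I - A). I + A is always invertible because the
    # eigenvalues of a skew-Hermitian matrix are purely imaginary.
    return np.linalg.solve(identity + a, identity - a)


# Assumed toy size: a 4x4 unitary (two qubits' worth of state space).
rng = np.random.default_rng(0)
raw = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
u = cayley_unitary(raw)

# Unitarity check: U^dagger U should equal the identity.
unitarity_error = np.linalg.norm(u.conj().T @ u - np.eye(4))
```

One property worth noting under these assumptions: all-zero parameters give the identity matrix, so an adapter initialised this way would leave the frozen model's output unchanged before any training.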
Findings
Perplexity of Llama 3.1 8B improved by 1.4% on quantum hardware.
Achieved 83% recovery of compression-induced degradation.
Identified a noise-expressivity phase transition at larger qubit scales.
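The percentages above can be read through the usual definitions, sketched below. Perplexity is the exponential of the mean negative log-likelihood per token. The `recovery_fraction` formula is an assumption about how "recovery of compression-induced degradation" is measured (the share of the perplexity gap between the original and compressed model that the adapter closes); the summary does not state the exact definition. All numbers in the example are hypothetical and are not results from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_improvement(ppl_base, ppl_new):
    """Fractional perplexity reduction relative to the baseline."""
    return (ppl_base - ppl_new) / ppl_base


def recovery_fraction(ppl_original, ppl_compressed, ppl_adapted):
    """Assumed definition: share of the compression-induced gap that is closed."""
    return (ppl_compressed - ppl_adapted) / (ppl_compressed - ppl_original)


# Hypothetical inputs, chosen only to exercise the formulas.
uniform_ppl = perplexity([math.log(0.5)] * 4)        # each token has p = 0.5
improvement = relative_improvement(100.0, 98.6)      # a 1.4% reduction
recovered = recovery_fraction(10.0, 20.0, 11.7)      # 83% of the gap closed
```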
Abstract
Large language models (LLMs) have transformed artificial intelligence, yet classical architectures impose a fundamental constraint: every trainable parameter demands classical memory that scales unfavourably with model size. Quantum computing offers a qualitatively different pathway, but practical demonstrations on real hardware have remained elusive for models of practical relevance. Here we show that Cayley-parameterised unitary adapters -- quantum circuit blocks inserted into the frozen projection layers of pre-trained LLMs and executed on a 156-qubit IBM Quantum System Two superconducting processor -- improve the perplexity of Llama 3.1 8B, an 8-billion-parameter model in widespread use, by 1.4% with only 6,000 additional parameters and end-to-end inference validated on a real quantum processing unit (QPU). A systematic study on SmolLM2 (135M parameters), chosen for its tractability,…
