Variational Quantum Eigensolvers in the Era of Distributed Quantum Computers

Ilia Khait; Edwin Tham; Dvira Segal; Aharon Brodutch

arXiv:2302.14067·quant-ph·March 1, 2023


TL;DR

This paper demonstrates that distributed quantum computing architectures with limited inter-module communication can effectively solve quantum problems, offering a promising approach for near-term modular quantum processors.

Contribution

It introduces a variational quantum eigensolver tailored for a two-module architecture, showing that limited inter-module operations significantly enhance performance.

Findings

01. Three inter-module operations outperform no inter-module communication.
02. Distributed architectures can match monolithic performance with limited communication.
03. Near-term modular quantum processors are viable alternatives to large monolithic systems.

Abstract

The computational power of a quantum computer is limited by the number of qubits available for information processing. Increasing this number within a single device is difficult; it is widely accepted that distributed modular architectures are the solution to large-scale quantum computing. The major challenge in implementing such architectures is the need to exchange quantum information between modules. In this work, we show that a distributed quantum computing architecture with limited capacity to exchange information between modules can accurately solve quantum computational problems. Using the example of a variational quantum eigensolver with an ansatz designed for a two-module (dual-core) architecture, we show that three inter-module operations provide a significant advantage over no inter-module (or serially executed) operations. These results provide a strong indication that…

Equations

$$|\psi_{\rm GS}\rangle\to|\psi_{\rm GS}^{(8)}\rangle=\sum_{i=1}^{8}\lambda_{i}|\phi_{1}^{(i)}\rangle|\phi_{2}^{(i)}\rangle.$$

$$H_{\rm TFIM}=-J\sum_{i=1}^{N-1}\sigma_{i}^{z}\sigma_{i+1}^{z}-h_{x}\sum_{i=1}^{N}\sigma_{i}^{x},$$

$$H_{\rm XYZ}=\dots+h_{x}\sum_{i=1}^{N}S_{i}^{x},$$

$$H_{\rm Heis}=\dots\;\to\;\dots+J_{\rm FM}\sum_{\alpha,\,i=1}^{N}\sigma_{2i-1}^{\alpha}\sigma_{2i}^{\alpha}.$$

$$|\psi_{\rm GS}\rangle\to\sum_{i=1}^{d}\lambda_{i}|\phi_{1}^{(i)}\rangle|\phi_{2}^{(i)}\rangle,$$

$$|\psi_{\rm targ}\rangle=\frac{\lambda_{1}|\phi_{1}^{(1)}\rangle|\phi_{2}^{(1)}\rangle+\lambda_{2}|\phi_{1}^{(2)}\rangle|\phi_{2}^{(2)}\rangle}{\sqrt{\lambda_{1}^{2}+\lambda_{2}^{2}}},$$


Taxonomy

Topics: Quantum Computing Algorithms and Architecture · Quantum and electron transport phenomena · Quantum Information and Cryptography

Full text

Variational Quantum Eigensolvers in the Era of Distributed Quantum Computers

Ilia Khait

Entangled Networks Ltd., Toronto, Ontario, M4R 2E4, Canada

Department of Physics and Centre for Quantum Information and Quantum Control, University of Toronto, Toronto, Ontario, Canada M5S 1A7


Edwin Tham

Entangled Networks Ltd., Toronto, Ontario, M4R 2E4, Canada

Dvira Segal

Department of Chemistry, University of Toronto, 80 Saint George St., Toronto, Ontario, M5S 3H6, Canada

Department of Physics and Centre for Quantum Information and Quantum Control, University of Toronto, Toronto, Ontario, Canada M5S 1A7

Aharon Brodutch

Entangled Networks Ltd., Toronto, Ontario, M4R 2E4, Canada

Abstract

The computational power of a quantum computer is limited by the number of qubits available for information processing. Increasing this number within a single device is difficult; it is widely accepted that distributed modular architectures are the solution to large-scale quantum computing. The major challenge in implementing such architectures is the need to exchange quantum information between modules. In this work, we show that a distributed quantum computing architecture with limited capacity to exchange information between modules can accurately solve quantum computational problems. Using the example of a variational quantum eigensolver with an ansatz designed for a two-module (dual-core) architecture, we show that three inter-module operations provide a significant advantage over no inter-module (or serially executed) operations. These results provide a strong indication that near-term modular quantum processors can be an effective alternative to their monolithic counterparts.

Introduction.– Quantum computers promise significant speed-up for a diverse set of problems [1, 2, 3, 4]. However, the quantum advantage over classical computation only becomes appreciable when the problem size (i.e., the number of qubits required to solve the problem) is sufficiently large. Yet in practice, increasing the number of useful qubits on a quantum processing unit (QPU) is challenging: Generally, there is a trade-off between qubit count and qubit quality [5, 6, 7, 8]. Modular architectures, where small high quality QPUs are interconnected, offer a more sustainable solution to the scaling problem than a monolithic approach [9, 10, 11, 12, 13, 14]. In small devices, high-fidelity qubit operations are easier to engineer, and corresponding verification and validation are more tractable. Modular approaches, however, require transmission of quantum information between QPUs. This information exchange can be used to create effective interactions between qubits residing on different QPUs. In general, information transfer between different modules is significantly slower and less reliable than between qubits assigned to the same module. We call this the quantum interconnect bottleneck (QIB).

An increasingly salient architectural question for quantum computers concerns trade-offs in using an interconnected multi-module quantum device: Do the overheads associated with the QIB outweigh the benefits of adding qubits to a monolithic device?

Suppose one aims to run a circuit that requires $N$ qubits but only has access to $M$ -qubit devices with $M<N$ . Assuming these devices can exchange quantum information using a quantum interconnect, it is possible to recompile the circuit [15] such that it uses the interconnect $n_{i}$ times. To quantify the benefits of a quantum interconnect we compare a dual-core solution to a naïve approach with comparable running time – solving different parts of the problem on separate QPUs, and relying solely on classical communication; we refer to this as the separable (or $n_{i}=0$ ) solution. The dual core solution consists of two interconnected QPUs with $N\over 2$ qubits each, while assuming that each QPU individually has an all-to-all connectivity map, i.e., within each module, qubits can interact directly with every other qubit.

If one allows $\mathcal{O}(N)$ interconnect uses, the aforementioned architecture becomes equivalent to an all-to-all $N$ -qubit device. However, the QIB combined with practical considerations, such as decoherence, requires limiting $n_{i}$ . As described below, $n_{i}=3$ is not only sufficient for the problems we consider, but it also shows a significant improvement over the separable solution. Specifically, we show that for a dual-core architecture, the estimation error arising from the expressibility of a limited-connectivity ansatz is exponentially suppressed with $n_{i}$ .

In Fig. 1 (a) we show our variational ansatz, which is composed of single-qubit operations along with the $ZZ$ -gate, $ZZ\left(\phi\right)=\exp\left(i\frac{\phi}{2}\sigma_{z}\otimes\sigma_{z}\right)$ , a common entangling operation in trapped-ion devices [16, 17]. We treat $ZZ$ gates that straddle two clusters of $N/2$ qubits as interconnect mediated remote gates. We compare performance between the separable and modular architectures for a common algorithm, the variational quantum eigensolver (VQE) [18, 19], which estimates the ground state energy of a Hamiltonian. To make a simple comparison between the separable solution and the interconnected one, we only consider the precision of the result on the final circuit, and avoid issues related to the performance of the optimization stage.

Interconnect advantage.– Decomposing a state into its principal components (via SVD or, as used here, the Schmidt decomposition) is a core technique in many numerical recipes such as the DMRG [21], where a truncation is performed based on the diminishing return on fidelity of storing more basis components (at a high cost). Such a rationale is used for understanding the power of interconnects. Suppose one has two QPUs, each capable of preparing any state in its $M$ -qubit Hilbert space. Without remote operations, the state of the dual-QPU system is a product state $|\psi\rangle=|\psi_{1}\rangle|\psi_{2}\rangle$ , where $|\psi_{i}\rangle$ is the state of the $i$ th QPU. Every application of a remote operation between QPUs can increase inter-QPU entanglement, as expressed by the rank ( $d$ ) of the Schmidt decomposition of $|\psi\rangle$ , cut along the two QPUs: $|\psi^{(d)}\rangle=\sum_{i=1}^{d}c_{i}|\psi_{1}^{(i)}\rangle|\psi_{2}^{(i)}\rangle$ . Note that with a sufficiently expressive intra-QPU ansatz ( $U(\vec{\theta})$ in Fig. 1) the entanglement rank $d$ can rise quickly, up to exponential in the number of inter-QPU operations $n_{i}$ (i.e., $d\leq 2^{n_{i}}$ ).
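The rank argument can be checked numerically on the smallest possible case. The following is a minimal NumPy sketch, not the authors' code: two single-qubit "modules" are prepared in a product state, and one remote $ZZ(\phi)$ gate (using the definition of the $ZZ$ gate given earlier) is applied across the cut. The helper names `schmidt_coefficients`, `schmidt_rank`, and `zz_gate` are hypothetical and introduced only for illustration.

```python
import numpy as np

def schmidt_coefficients(state, n_left, n_right):
    """Schmidt coefficients of a pure state across the left/right cut."""
    matrix = np.asarray(state, dtype=complex).reshape(2**n_left, 2**n_right)
    return np.linalg.svd(matrix, compute_uv=False)

def schmidt_rank(state, n_left, n_right, tol=1e-10):
    """Number of Schmidt coefficients above a numerical tolerance."""
    return int(np.sum(schmidt_coefficients(state, n_left, n_right) > tol))

def zz_gate(phi):
    """ZZ(phi) = exp(i * phi/2 * Z (x) Z), diagonal in the computational basis."""
    zz_diagonal = np.array([1.0, -1.0, -1.0, 1.0])
    return np.diag(np.exp(0.5j * phi * zz_diagonal))

# Each one-qubit module starts in |+>, so the joint state is a product state.
plus = np.array([1.0, 1.0]) / np.sqrt(2)
product_state = np.kron(plus, plus)

# A single remote gate across the cut (n_i = 1).
entangled_state = zz_gate(np.pi / 2) @ product_state

rank_before = schmidt_rank(product_state, 1, 1)
rank_after = schmidt_rank(entangled_state, 1, 1)
print(rank_before, rank_after)
```

In this toy case one remote gate raises the Schmidt rank from 1 to 2, consistent with $d\leq 2^{n_{i}}$ for $n_{i}=1$. The same helpers apply to larger modules by setting `n_left` and `n_right` to the number of qubits per module.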

Procedure.– VQE is an iterative classical-quantum hybrid algorithm, which estimates the ground state energy of a given Hamiltonian [3, 22]. A quantum computer produces an approximation of the Hamiltonian’s ground-state based on a parameterized ansatz; in turn, a classical strategy for converting a Hamiltonian into a series of compactly-implementable observables estimates an energy-eigenvalue from that approximate ground-state [23, 24, 25]. This process is repeated, with varied ansatz parameters chosen by an optimization strategy until a sufficiently refined ground-state is reached [26]. In what follows, we demonstrate that the $n_{i}=3$ dual-core parameterized circuit suggested in Fig. 1 provides an excellent approximation to the exact ground state of interacting systems.

Apart from ansatz expressibility (how closely the ansatz can approximate an arbitrary quantum state), VQE results also depend on classical factors like the Hamiltonian-to-observable map and parameter optimization routines. Since we intend to test how expressibility is augmented by interconnects in a multi-QPU setup, we avoid confounding classical issues by maximizing the fidelity between the variational state and the exact ground state, instead of minimizing the expectation value of the Hamiltonian, $E_{\rm var}$ . The following summarizes our procedure; for details see Ref. [27]. First, we diagonalize the Hamiltonian obtaining the exact ground state, $|\psi_{\rm GS}\rangle$ . We perform an SVD (where the system is divided into two units), and retain the eight most significant contributions,

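$$|\psi_{\rm GS}\rangle\to|\psi_{\rm GS}^{(8)}\rangle=\sum_{i=1}^{8}\lambda_{i}|\phi_{1}^{(i)}\rangle|\phi_{2}^{(i)}\rangle.\qquad(1)$$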

Here, $\lambda_{i}$ , $|\phi_{1}^{(i)}\rangle|\phi_{2}^{(i)}\rangle$ are the $i$ th Schmidt eigenvalue and eigenvector, respectively. We iteratively build the variational solution (see Fig. 1): We start by optimizing the fidelity towards a product state, our crudest approximation to the ground state, $|\psi_{\rm GS}^{(1)}\rangle$ , defined by the largest Schmidt eigenvalue $\lambda_{1}$ . We construct a variational approximation of this target state using the first set of unitaries $U(\vec{\theta}_{1,1})$ and $U(\vec{\theta}_{1,2})$ , by applying them on an all-polarized state $|0\dots 0\rangle_{1}|0\dots 0\rangle_{2}$ . Next, we add another Schmidt coefficient and target the state $|\psi_{\rm GS}^{(2)}\rangle$ by enlarging the set of variational parameters: A first remote operation $ZZ(\phi_{1})$ and another set of unitaries, $U(\vec{\theta}_{2,1})$ and $U(\vec{\theta}_{2,2})$ . After optimization, we add another Schmidt coefficient, with another interconnected-remote operation, and repeat until $|\psi_{\rm GS}^{(8)}\rangle$ is reached. (Note that this is not a viable VQE optimization method in “production” use, since knowledge of the exact ground state cannot be assumed; we employ it here only to test expressibility.)
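To make the construction of the truncated targets concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions rather than the procedure's actual implementation: a seeded random 4-qubit state stands in for the exact ground state, the cut is 2|2 qubits, and the truncated state is renormalized before computing fidelities. The function names `truncated_schmidt_state` and `fidelity` are hypothetical.

```python
import numpy as np

def truncated_schmidt_state(state, n_left, n_right, n_terms):
    """Keep the n_terms largest Schmidt terms across the cut, then renormalize."""
    matrix = np.asarray(state, dtype=complex).reshape(2**n_left, 2**n_right)
    u, s, vh = np.linalg.svd(matrix, full_matrices=False)
    # Singular values are returned in descending order, so slicing keeps the largest.
    truncated = (u[:, :n_terms] * s[:n_terms]) @ vh[:n_terms, :]
    truncated = truncated.reshape(-1)
    return truncated / np.linalg.norm(truncated)

def fidelity(target, state):
    """Squared overlap |<target|state>|^2 of two normalized state vectors."""
    return float(abs(np.vdot(target, state)) ** 2)

# Assumption: a random state replaces the exact ground state for illustration.
rng = np.random.default_rng(seed=0)
amplitudes = rng.normal(size=16) + 1j * rng.normal(size=16)
reference = amplitudes / np.linalg.norm(amplitudes)

# Targets with 1, 2, and 4 Schmidt terms (4 is the full rank for a 2|2 cut).
fidelities = [
    fidelity(truncated_schmidt_state(reference, 2, 2, k), reference)
    for k in (1, 2, 4)
]
print(fidelities)
```

With a renormalized target, the fidelity to the reference equals the sum of the retained $\lambda_{i}^{2}$, so it grows monotonically with the number of retained terms and reaches one at full rank.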

Models.– We test three models to benchmark the interconnect-mediated ansatz: the transverse field Ising model (TFIM), the spin-half anisotropic Heisenberg model, and the spin-one Heisenberg model. The first two models are paradigmatic examples in benchmarking performances of novel methods [29, 30]; the spin-one Heisenberg model enables exploration of the impact of less local interactions (due to the casting onto spin-half operators) on the quality of the interconnected QPU solution.

The TFIM is defined as

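$$H_{\rm TFIM}=-J\sum_{i=1}^{N-1}\sigma_{i}^{z}\sigma_{i+1}^{z}-h_{x}\sum_{i=1}^{N}\sigma_{i}^{x},\qquad(2)$$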

where $\sigma^{\alpha},~{}\alpha=x,y,z$ are the Pauli matrices and $N$ is the number of spins (qubits). The phase diagram of the TFIM consists of a (anti-)ferromagnetic ordered product state for positive (negative) $J$ spin-spin interaction and vanishing transverse field $h_{x}$ , and a disordered state at strong transverse magnetic field. In the thermodynamic limit, a quantum phase transition to a gapless phase occurs at $h_{x}=J$ .
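For small $N$ the TFIM can be built and diagonalized directly. The sketch below is a dense-matrix NumPy illustration, assuming an open chain with nearest-neighbour $-J\sigma^{z}\sigma^{z}$ couplings and a uniform transverse field $-h_{x}\sigma^{x}$; the function names `site_operator`, `tfim_hamiltonian`, and `ground_state` are hypothetical and are not taken from the paper.

```python
from functools import reduce

import numpy as np

PAULI_X = np.array([[0.0, 1.0], [1.0, 0.0]])
PAULI_Z = np.array([[1.0, 0.0], [0.0, -1.0]])
IDENTITY = np.eye(2)

def site_operator(op, site, n_qubits):
    """Embed a single-qubit operator at position `site` of an n-qubit register."""
    factors = [IDENTITY] * n_qubits
    factors[site] = op
    return reduce(np.kron, factors)

def tfim_hamiltonian(n_qubits, coupling, field):
    """Open-chain TFIM: -J sum_i Z_i Z_{i+1} - h_x sum_i X_i (dense matrix)."""
    dim = 2**n_qubits
    hamiltonian = np.zeros((dim, dim))
    for i in range(n_qubits - 1):
        z_i = site_operator(PAULI_Z, i, n_qubits)
        z_next = site_operator(PAULI_Z, i + 1, n_qubits)
        hamiltonian -= coupling * (z_i @ z_next)
    for i in range(n_qubits):
        hamiltonian -= field * site_operator(PAULI_X, i, n_qubits)
    return hamiltonian

def ground_state(hamiltonian):
    """Lowest eigenvalue and eigenvector of a Hermitian matrix."""
    energies, vectors = np.linalg.eigh(hamiltonian)
    return energies[0], vectors[:, 0]

energy_two_sites, _ = ground_state(tfim_hamiltonian(2, 1.0, 1.0))
energy_classical, _ = ground_state(tfim_hamiltonian(4, 1.0, 0.0))
print(energy_two_sites, energy_classical)
```

For $N=2$ the ground energy is $-\sqrt{J^{2}+4h_{x}^{2}}$, and at $h_{x}=0$ it is $-(N-1)J$, which gives two quick sanity checks. Dense matrices grow as $4^{N}$, so this approach is only meant for system sizes of the order studied here.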

The anisotropic Heisenberg model, which is used in studies of magnetic systems, is given by

$$H_{\rm XYZ}=\dots+h_{x}\sum_{i=1}^{N}S_{i}^{x},\qquad(3)$$

where $S^{\alpha},~{}\alpha=x,y,z$ are spin-half operators.

Lastly, the $S=1$ Heisenberg model resembles the AKLT model [31, 32], and it is of great interest due to its topological properties. Here, it allows us to probe and benchmark the interconnected ansatz for a Hamiltonian with fewer local operators,

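$$H_{\rm Heis}=\dots\;\to\;\dots+J_{\rm FM}\sum_{\alpha,\,i=1}^{N}\sigma_{2i-1}^{\alpha}\sigma_{2i}^{\alpha}.\qquad(4)$$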

Here $S^{\alpha},~{}\alpha=x,y,z$ are spin-one operators, and the mapping marked by the arrow splits every $S=1$ operator into a pair of spin-half operators. $J_{\rm FM}\gg J$ is chosen such that a triplet is selected for every other bond (originating from the $S=1$ operators), with the resulting interactions having a distance of $4$ units.

Results.– We study systems with up to $12$ qubits, and demonstrate next an immense advantage for $n_{i}=3$ over $n_{i}=0$ . In Fig. 2, we compare the ground state approximation for the TFIM given the two distinct architectures. We denote the VQE solution by $E_{\rm{var}}$ , and we compare it to the exact ground state energy $E_{\rm{GS}}$ . Specifically, throughout the paper we analyze the error measure $\epsilon=\frac{E_{\rm var}}{E_{\rm GS}}-1$ .

An indication of a phase transition in the thermodynamic limit is detectable even in this small system, as can be seen in Fig. 2 by examining the product state (red squares). While for extreme field values ( $h_{x}\rightarrow 0$ , and ${h_{x}\over J}\gg 1$ ) the product state solution well approximates the exact ground state, around ${h_{x}\over J}\approx 0.5$ this approximation completely fails. In contrast, the $n_{i}=3$ ansatz maintains a lower error throughout. For presentation purposes, we display the product state error scaled down by a factor of $10$ . Overall, the interconnected solution performs well, and its error does not exceed $0.07\%$ throughout the phase diagram, compared to an error of up to $8\%$ in the separable solution. This is comparable in magnitude to the exact (non-variational) wavefunction truncated to $8$ -Schmidt terms, where infidelity is $0.01\%$ .

In Fig. 3, we examine a portion of the phase diagram of the anisotropic Heisenberg model at the fixed value $J_{x}=1.0$ and magnetic field values $h_{x}\in\{0,0.5,1.0\}$ , thus including the symmetric Heisenberg point, $(J_{x},J_{y},J_{z},h_{x})=J(1,1,1,0)$ . We plot $\epsilon$ for a $12$ -qubit problem. Excluding extreme poorly performing data points, the energy convergence ratio stays well below $10^{-3}$ (see Ref. [27], Fig. S1 for the comparison with a separable solution). The worst performing data point, occurring in the vicinity of $(J_{x},J_{y},J_{z},h_{x})=(1,-1,0.5,1)$ along the $J_{z}$ direction, appears to be an outlier with an error $\epsilon\approx 1.4\%$ . We are unable to pinpoint the reason for this failure; neighboring data points in the $J_{y}$ direction show significantly better convergence.

In Figure 4, we study the $S=1$ -Heisenberg model at $J_{\rm FM}=10J$ , see Eq. (4). (In choosing the value of $J_{\rm FM}$ we tested convergence of the resulting ground state against DMRG calculations and found complete agreement for $J_{\rm FM}>6$ .) The main panel presents the log-infidelity of the result, $\ln{\{1-\left|\langle\psi_{\rm var}|\psi_{\rm GS}\rangle\right|^{2}\}}$ . The $n_{i}=3$ solution (blue circles) displays significantly better fidelities compared to the separable one ( $n_{i}=0$ , red squares), with three orders of magnitude decrease in infidelity at $N=12$ , and significantly better results for smaller systems. The inset shows the relative energy estimation error $\epsilon$ as a function of the number of qubits. Besides a single outstanding point ( $N=10$ ), the upward trend reflects the increasing complexity of the solution as the system size grows. The outlier at $N=10$ was further examined by introducing another layer, after the third remote operation (with the VQE ansatz including 13 layers in total). This brought the relative energy error to $0.2\%$ , consistent with the linear trend seen in Fig. 4. Introducing the same change to other values of $N$ showed no significant change. We attribute this deviation to the optimization process as elaborated on next.

Discussion.– Two issues limit convergence to the exact solution: (i) Classical optimization – As described in the Procedure section, we are using differentiable programming [35, 36] to find the optimal variational parameters for the VQE ansatz, a task of growing complexity when increasing the qubit count. The number of parameters increases depending on the ansatz structure (in our ansatz for $N=12$ we have $753$ variational parameters). (ii) Expressibility of the ansatz (for a deeper discussion of the expressibility of parameterized quantum circuits we refer the reader to Ref. [43]) – As discussed above, one can separate the effect of the interconnect (the remote gate) from the “local” layers (the unitaries $U(\vec{\theta}_{i,j})$ ). Each interconnect operation doubles the potential Schmidt rank of the state, and the role of subsequent layers is to facilitate quantum information spreading. Whether information spreads far enough depends on the number of layers and their inner structure (Fig. 1). To assess the expressibility of the ansatz without the effect of the interconnect, we have examined in Ref. [27] a single QPU architecture with all-to-all connected qubits. While an all-to-all architecture performs slightly better than the interconnected one, it comes at a greater cost, as increasing the number of qubits on a QPU is a non-trivial task, which a multi-QPU modular architecture aims to avoid.

The SVD eigenvalues of the TFIM decay faster than those of the $S=1$ -Heisenberg model due to the topological nature of the latter’s ground state [38]. The theoretical lower bound on the infidelity is the sum of the discarded SVD eigenvalues squared. Considering that only three remote operations were allowed here, the discarded weight in the TFIM model was found to be $4\cdot 10^{-10}$ . Hence, the resulting infidelity, $10^{-6}$ , is not limited by the interconnect. Similarly, in the $S=1$ Heisenberg model with $N=12$ the discarded weight is $2\cdot 10^{-3}$ , and the reported infidelity of $3\cdot 10^{-3}$ is close to this bound. In conclusion, the limiting factors in solving the VQE on interconnected hardware are classical optimization combined with the limited expressibility of the ansatz, rather than the introduction of remote operations. Interestingly, we note that comparing the TFIM to the $S=1$ -Heisenberg model, the TFIM converges much better than the latter, both in the interconnected case and in the all-to-all connected case [27]. This could be explained by the suitability of the ansatz to the specific model, though in-depth consideration of this aspect is outside the scope of this paper.
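The infidelity bound quoted above is simple arithmetic on the Schmidt spectrum. The following plain-Python sketch shows the calculation; the function name `discarded_weight` and the example coefficients are hypothetical and are not data from the paper.

```python
def discarded_weight(schmidt_coefficients, n_kept):
    """Sum of squared Schmidt coefficients beyond the n_kept largest.

    The spectrum is normalized first, so the result is the best achievable
    infidelity for any state of Schmidt rank at most n_kept.
    """
    ordered = sorted((abs(c) for c in schmidt_coefficients), reverse=True)
    total = sum(c**2 for c in ordered)
    return sum(c**2 for c in ordered[n_kept:]) / total

# Hypothetical spectrum, for illustration only.
example_spectrum = [0.9, 0.4, 0.15, 0.05, 0.02, 0.01]

# Rank allowed by n_i = 0, 1, 2 remote gates is 1, 2, 4.
bounds = [discarded_weight(example_spectrum, 2**n_i) for n_i in range(3)]
print(bounds)
```

An optimized ansatz whose infidelity sits well above this bound is limited by optimization or by the local layers, which is the reading applied to the TFIM numbers above.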

Conclusions.– In this work, we demonstrated that a distributed quantum architecture with only modest inter-QPU capacity provides a dramatic advantage in VQE computations over serial architectures with no interconnects, and is on par with an all-to-all connected QPU [27]. In all cases studied, we found that three judiciously placed inter-QPU gates were sufficient to produce a significantly better approximation to the ground state energy for Hamiltonians of interest compared to an architecture with no quantum interconnects. For the Hamiltonians studied here we find that increasing $n_{i}$ allows for an exponential improvement in the fidelity [27]. Our comparison is based on simulations, and it is therefore limited to a small number of qubits, which allowed us to overcome some aspects of classical optimization. The main conclusion from this work is that an exponential increase in the Schmidt rank w.r.t. $n_{i}$ (and subsequently in the dimension of the effective Hilbert space) manifests itself when solving a practical algorithm.

Methods such as entanglement forging and circuit knitting [39, 40, 41] attempt to overcome the absence of entanglement at the cost of running narrow circuits more times. If the QIB is ignored, an interconnected approach is favorable since it requires exponentially fewer shots. However, with QIB overheads, the operation of an interconnect may extend run-times and reduce result quality. These techniques scale exponentially worse with increased number of interconnect uses $n_{i}$ . As such, we expect that in a future work these methods could be combined with interconnects to increase the effective Hilbert space by using classical and quantum resources.

A number of important questions remain open, including the impact of noise and the imperfect nature of interconnects. Slow interconnects would increase run-time and make the computation more susceptible to decoherence; inter-QPU operations are generally expected to have lower fidelity [42, 14]; the use of fixed resource states creates overheads in gate-counts and the actual implementation of the interconnect would impact the other qubits. These limitations need to be weighed against the downside of increasing qubit count in a monolithic architecture, as well as artificially increasing qubit size using classical resources. Identifying algorithms where a limited number of interconnect uses can be proved advantageous will be an incentive for the implementation of multi-QPU architectures. We hope that our study will stimulate further work in this direction.

Acknowledgments.– The authors acknowledge fruitful discussions with Finn Lasse Buessen and Kevin Smith. The work of IK was supported by the Centre for Quantum Information and Quantum Control (CQIQC) at the University of Toronto. DS acknowledges support from an NSERC Discovery Grant and the Canada Research Chair program.

S1 Optimization method

As the goal of this work is to test the effects of distributed ansätze, we eschew execution on real hardware in favor of simulators. This is done for three reasons:

• The hardware we envision has not yet been demonstrated in full.
• We want to decouple the interconnect bottleneck from other performance issues (e.g., gate imperfections).
• We wish to avoid issues related to shot-noise.

While evidently not scalable to an arbitrary number of qubits, direct simulations of distributed ansätze give us access to exact statevectors and energy eigenvalues for problem instances of modest size. This helps us circumvent uncertainties in the objective function that are inherently problematic for variational algorithms. Even under ideal hardware, shot noise drops slowly with the number of shots $N_{\rm shots}$ (as $\mathcal{O}(1/\sqrt{N_{\rm shots}})$ ). Furthermore, in order to mitigate improper optimization of variational parameters as a confounding factor in our findings, we implement here a statevector simulator under the JAX auto-differentiation framework so that training can leverage a more efficient gradient-based optimizer like Adam [44, 45, 46].
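The slow decay of shot noise can be quantified with a short plain-Python sketch. It assumes a single observable with outcomes $\pm 1$ (for example a Pauli string) measured independently in every shot; the function name `shot_noise_std` is hypothetical.

```python
import math

def shot_noise_std(expectation, n_shots):
    """Standard error of the sample mean of a +/-1-valued measurement.

    A single shot has variance 1 - <O>^2, and averaging n_shots independent
    shots divides the variance by n_shots.
    """
    variance = 1.0 - expectation**2
    return math.sqrt(variance / n_shots)

# Quadrupling the number of shots only halves the statistical error.
errors = {n_shots: shot_noise_std(0.0, n_shots) for n_shots in (100, 400, 1600)}
print(errors)
```

Because each additional digit of precision costs a factor of 100 in shots, working with exact statevectors removes a large source of uncertainty from the comparison between architectures.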

On sufficiently small problem instances, direct diagonalization of Hamiltonians of interest remains tractable. We leverage this, along with the fact that truncation by Schmidt rank is possible after each use of an inter-QPU ZZ-interaction. Let us denote the exact ground eigenstate of our Hamiltonian, obtained through direct diagonalization, as $|\psi_{\rm GS}\rangle$ . Through application of SVD, we can write it as

$$|\psi_{\rm GS}\rangle\to\sum_{i=1}^{d}\lambda_{i}|\phi_{1}^{(i)}\rangle|\phi_{2}^{(i)}\rangle,$$

where $\lambda_{i}$ , $|\phi_{1}^{(i)}\rangle|\phi_{2}^{(i)}\rangle$ are the $i$ -th Schmidt eigenvalue and eigenvector respectively, ordered by decreasing eigenvalue magnitude (i.e., with $\lambda_{1}$ being the largest). Further, $|\phi_{1}\rangle$ and $|\phi_{2}\rangle$ each reside within the respective halves of the QPU, each comprising a cluster of $N/2$ qubits (see Fig. 1). If we truncate the ansatz shown in Fig. 1 before the first inter-QPU gate (denoted $ZZ(\phi_{1})$ ), then one should expect a product state consisting of exactly one Schmidt term. We therefore train such a truncated ansatz towards the leading Schmidt term of $|\psi_{\rm GS}\rangle$ , $|\phi_{1}^{(1)}\rangle|\phi_{2}^{(1)}\rangle$ . Upon adding more layers to the ansatz, with every additional inter-QPU operation we train towards a new target state augmented by additional Schmidt terms of $|\psi_{\rm GS}\rangle$ . Thus, we iteratively build the variational solution, one ansatz layer at a time, eventually capturing all non-trivial Schmidt terms of the ground state. The motivation for such an approach is reminiscent of other layer-by-layer training approaches to variational algorithms, where training errors in challenging cost-function landscapes are mitigated by a reduction in the number of independent variational parameters within each layer [47, 48, 49].

The cost-function we elected to use after each ansatz layer is the fidelity, $\cal{F}$ , between some target state $|\psi_{\rm targ}\rangle$ and the state produced by the variational ansatz $|\psi_{\rm var}\rangle$ . Note that this fidelity is available to us because we simulate the action of the ansatz as well as diagonalize the desired Hamiltonian directly. A more general “in-the-field” use of VQE will require more careful selection of a cost-function that is practically accessible.

For clarity, we provide next a step-by-step description of our training procedure. Starting with the ansatz (see Fig. 1) up to but excluding the first inter-QPU gate ( $ZZ(\phi_{1})$ ), we train the unitaries by setting $|\psi_{\rm targ}\rangle=|\psi_{\rm GS}^{(1)}\rangle=|\phi_{1}^{(1)}\rangle|\phi_{2}^{(1)}\rangle$ . We vary the variational parameters to maximize the fidelity: ${\cal F}=\left|\langle\psi_{\rm PS}|\psi_{\rm var}\rangle\right|^{2}=\left|\left\langle\phi_{1}^{(1)}\middle|U(\vec{\theta}_{(1,1)})\middle|0\dots 0\right\rangle_{1}\right|^{2}\cdot\left|\left\langle\phi_{2}^{(1)}\middle|U(\vec{\theta}_{(1,2)})\middle|0\dots 0\right\rangle_{2}\right|^{2}$ , with $|\psi_{\rm PS}\rangle=|\phi_{1}^{(1)}\rangle|\phi_{2}^{(1)}\rangle$ the product state defined above, and $|\psi_{\rm var}\rangle$ the variational state.
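The factorization of the first-stage fidelity into one factor per module can be verified numerically. In this NumPy sketch, seeded random states stand in for both the Schmidt vectors $|\phi_{j}^{(1)}\rangle$ and the module outputs $U(\vec{\theta}_{(1,j)})|0\dots 0\rangle_{j}$; this is an assumption made purely for illustration, and the helper names `fidelity` and `random_state` are hypothetical.

```python
import numpy as np

def fidelity(target, state):
    """Squared overlap |<target|state>|^2 of two normalized state vectors."""
    return float(abs(np.vdot(target, state)) ** 2)

def random_state(n_qubits, rng):
    """Normalized random complex state vector on n_qubits qubits."""
    amplitudes = rng.normal(size=2**n_qubits) + 1j * rng.normal(size=2**n_qubits)
    return amplitudes / np.linalg.norm(amplitudes)

rng = np.random.default_rng(seed=1)
phi_1, phi_2 = random_state(2, rng), random_state(2, rng)  # target factors
var_1, var_2 = random_state(2, rng), random_state(2, rng)  # stand-ins for U(theta)|0...0>

# Fidelity evaluated on the joint 4-qubit register ...
joint = fidelity(np.kron(phi_1, phi_2), np.kron(var_1, var_2))
# ... equals the product of the two per-module fidelities.
factored = fidelity(phi_1, var_1) * fidelity(phi_2, var_2)
print(joint, factored)
```

One consequence is that, before any inter-QPU gate is added, the two modules can be optimized independently, each against its own $N/2$-qubit target.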

Next, we extend the ansatz to include all layers and parameters up to but excluding the second inter-QPU gate $ZZ(\phi_{2})$ , and train the circuit towards

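$$|\psi_{\rm targ}\rangle=\frac{\lambda_{1}|\phi_{1}^{(1)}\rangle|\phi_{2}^{(1)}\rangle+\lambda_{2}|\phi_{1}^{(2)}\rangle|\phi_{2}^{(2)}\rangle}{\sqrt{\lambda_{1}^{2}+\lambda_{2}^{2}}},$$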

Existing variational parameters, i.e., $\vec{\theta}_{(1,1)}$ and $\vec{\theta}_{(1,2)}$ , are initialized to their pre-trained values from the previous iteration, and new parameters ( $\vec{\theta}_{(2,1)}$ and $\vec{\theta}_{(2,2)}$ ) are initialized randomly. This procedure is simply repeated, with the inclusion of additional Schmidt terms (up to $d$ terms) in $|\psi_{\rm targ}\rangle$ in lock-step with the expansion of the ansatz to contain additional inter-QPU operations, with $d=2^{n_{i}}$ . We build results using up to $d=8$ Schmidt terms, beyond which the remaining Schmidt eigenvalues become negligible for the models that we consider. Correspondingly, we use $n_{i}=3$ inter-QPU gates. At each optimization step, if new parameters are added, we use $200$ random initial sets of parameters out of which the optimal ones are selected.
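The lock-step growth of the ansatz and the target can be written out as a short schedule. This plain-Python sketch assumes the relation $d=2^{n_{i}}$ stated above holds at every stage; the function name `training_schedule` is hypothetical.

```python
def training_schedule(n_remote_gates):
    """List of (inter-QPU gates in the ansatz, Schmidt terms in the target)."""
    return [(n_i, 2**n_i) for n_i in range(n_remote_gates + 1)]

for n_i, n_terms in training_schedule(3):
    print(f"{n_i} inter-QPU gate(s): target keeps {n_terms} Schmidt term(s)")
```

Each stage reuses the parameters trained in the previous one and only the newly added layer starts from random values, so the number of freshly initialized parameters per stage stays fixed while the target becomes richer.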

S2 The interconnect advantage

Figure S1 shows the exponential benefit acquired with every interconnect use. As argued in the main text, each remote operation allows us to double the number of Schmidt terms. In Fig. S1 we display $\epsilon$ and the infidelities for the TFIM’s least converging point and that of the anisotropic Heisenberg model (both discussed in the main text), and the $S=1$ Heisenberg model, all with $N=12$ qubits. The data suggest an improvement in both metrics as the number of remote operations increases. Extrapolating these quantities, or increasing $n_{i}$ , might not continue that trend, as the added Schmidt weights are decreasing and hence their contribution to both quantities diminishes. In addition, in practice the classical optimization task increases in complexity and one can expect a higher deviation from the theoretical infidelity (or energy) bound. Nevertheless, the interconnect advantage is apparent.

S3 Ansatz expressibility

To remove the effect of the interconnect itself we examine here a single QPU architecture with all-to-all connected qubits. The QPU is large enough to run the entire VQE instance, and no interconnect is necessary. The accuracy of this approach depends only on the expressibility of the specific ansatz and the capability of the classical optimizer. Results are shown in Fig. S2. We examine the $S=1$ Heisenberg model at $J(1,-1,0.5,1)$ and the TFIM at $h=0.73J$ (which are the least converging data points for $N=12$ in Figs. 2, 3), and increase the number of layers up to $m=7$ . This amounts to $636$ variational parameters, as we wish to keep the number of parameters similar to the interconnected case. In Fig. S2 (a) we show the relative energy difference between the variational energy and the exact ground state energy, while panel (b) shows the infidelity. With an increasing number of layers, and consequently of variational parameters, we theoretically increase the expressibility of the circuit. However, we find that for many optimization attempts the resulting energy saturates around $m=5$ layers and then in fact grows as one continues to increase $m$ , probably due to limitations in the classical optimization. Nevertheless, comparing the resulting fidelities we find that the all-to-all architecture performs slightly better than the interconnected one, with a relative energy error of $0.08\%$ versus $0.09\%$ for the TFIM, and $0.3\%$ versus $1.4\%$ for the $S=1$ Heisenberg model.

S4 Comparing architectures: all-to-all (single-core) and interconnected

While the comparison throughout the text is to the separable solution, one could ask how the limited interconnected architecture performs compared to an all-to-all QPU of the same size.

Starting with the TFIM and focusing on the least converging point at $h=0.73$ , which was discussed in the main text (see Fig. 2): The all-to-all architecture has an infidelity of $6.8\cdot 10^{-4}$ , compared with $9.1\cdot 10^{-4}$ for the interconnected case. These fidelities correspond to $\epsilon_{\rm all-to-all}=0.09\%$ , and $\epsilon_{\rm interconnected}=0.08\%$ respectively. As for the $S=1$ Heisenberg model, the infidelities are: $3.2\cdot 10^{-3}$ for the all-to-all and $7.6\cdot 10^{-3}$ for the interconnected ansatz, and $\epsilon_{\rm all-to-all}=0.34\%$ , and $\epsilon_{\rm interconnected}=0.33\%$ respectively. As one could expect, the variational energies do not correspond to fidelities and vice versa, but merely serve as a proxy.

In conclusion, while the structure and expressibility of each of these ansätze are different, we can see only a slight advantage of the all-to-all architecture. Though it has greater expressibility, it comes with a classical optimization cost, as the number of variational parameters scales as the square of the number of participating qubits, $N^{2}$ . This is what might limit the current performance of the single QPU architecture. However, and more importantly, the real advantage of interconnected (multi-QPU) systems is the lower complexity and overheads related to, e.g., calibration, control, and noise reduction, challenges that are easier to handle in modular designs that utilize smaller processors.

Bibliography


[1] G. Brassard, P. Hoyer, M. Mosca, and A. Tapp, Contemporary Mathematics 305, 53 (2002).
[2] L. K. Grover, in Proc. 28th ACM Theory of computing (1996) pp. 212–219.
[3] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, Nature Communications 5, 1 (2014).
[4] E. Farhi, J. Goldstone, and S. Gutmann, arXiv:1411.4028 (2014).
[5] R. J. Hughes, D. F. V. James, E. H. Knill, R. Laflamme, and A. G. Petschek, Phys. Rev. Lett. 77, 3240 (1996).
[6] A. Steane, C. F. Roos, D. Stevens, A. Mundt, D. Leibfried, F. Schmidt-Kaler, and R. Blatt, Phys. Rev. A 62, 042305 (2000).
[7] C. Monroe and J. Kim, Science 339, 1164 (2013), https://www.science.org/doi/pdf/10.1126/science.1231298.
[8] P. Murali, D. M. Debroy, K. R. Brown, and M. Martonosi, in 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) (2020) pp. 529–542.