Reinforcement learning for semi-autonomous approximate quantum eigensolver

F. Albarrán-Arriagada; J. C. Retamal; E. Solano; L. Lamata

arXiv:1906.06702·quant-ph·February 6, 2020·Mach. Learn. Sci. Technol.


TL;DR

This paper introduces a reinforcement learning-based protocol to approximate eigenvectors of Hermitian quantum operators, achieving high fidelity with minimal iterations, useful for semi-autonomous quantum devices.

Contribution

It presents a novel reinforcement learning protocol for approximating eigenvectors of arbitrary Hermitian operators using measurement and feedback in a quantum setting.

Findings

01. Achieves over 90% average fidelity in fewer than 10 iterations for single-qubit operators.

02. Surpasses 98% average fidelity in fewer than 300 iterations for single-qubit operators.

03. Obtains the four eigenvectors with over 89% fidelity in 8000 iterations for random two-qubit operators.


Equations

$$|\bar{\phi}_{A,0}\rangle=\hat{U}_{E}|\phi_{A,0}\rangle.$$

$$\hat{\mathcal{U}}_{j}=e^{-i\varphi_{y}\hat{S}_{y,j}}e^{-i\varphi_{z}\hat{S}_{z,j}}e^{-i\varphi_{x}\hat{S}_{x,j}},$$

$\hat{S}_{x,j}$, $\hat{S}_{y,j}$, $\hat{S}_{z,j}$

$|v_{0}\rangle$, $|v_{1}\rangle$

$$|0\rangle=\begin{pmatrix}1\\0\end{pmatrix},\qquad|1\rangle=\begin{pmatrix}0\\1\end{pmatrix}.$$

$\hat{\mathcal{O}}_{E}$, $\hat{U}_{E}$

$$|\phi_{A,0}^{(k)}\rangle=\cos\left(\frac{\theta^{(k)}}{2}\right)|0\rangle+e^{i\phi^{(k)}}\sin\left(\frac{\theta^{(k)}}{2}\right)|1\rangle,$$

$$|\bar{\phi}_{A,0}^{(k)}\rangle=\cos\left(\frac{\bar{\theta}^{(k)}}{2}\right)|0\rangle+e^{i\bar{\phi}^{(k)}}\sin\left(\frac{\bar{\theta}^{(k)}}{2}\right)|1\rangle=\cos\left(\frac{\Delta_{\theta}^{(k)}}{2}\right)|\phi_{A,0}^{(k)}\rangle+e^{i\Delta_{\phi}^{(k)}}\sin\left(\frac{\Delta_{\theta}^{(k)}}{2}\right)|\phi_{A,1}^{(k)}\rangle$$

$$|\phi_{A,1}^{(k)}\rangle=\sin\left(\frac{\theta^{(k)}}{2}\right)|0\rangle-e^{i\phi^{(k)}}\cos\left(\frac{\theta^{(k)}}{2}\right)|1\rangle.$$

$$\hat{D}^{(k)\dagger}=|0\rangle\langle\phi_{A,0}^{(k)}|+|1\rangle\langle\phi_{A,1}^{(k)}|,$$

$$\hat{G}_{0}^{(k)}=\hat{D}^{(k+1)}\hat{R}$$

$\hat{D}^{(k+1)}$, $\hat{R}$

$$\hat{u}_{1}=e^{-i\varphi_{y}\hat{S}_{y}}e^{-i\varphi_{z}\hat{S}_{z}}e^{-i\varphi_{x}\hat{S}_{x}},$$

$$\hat{D}^{(k+1)}=(1-m^{(k)})\hat{D}^{(k)}+m^{(k)}\hat{D}^{(k)}\hat{u}_{1}.$$

$$w^{(k+1)}=\left[(1-m^{(k)})r+m^{(k)}p\right]w^{(k)},$$

$$\hat{D}^{(N)\dagger}\hat{\mathcal{O}}_{E}\hat{D}^{(N)}\sim\lambda_{0}|0\rangle\langle 0|+\lambda_{1}|1\rangle\langle 1|.$$

$$|\phi_{A,0}^{(k)}\rangle=\sum_{j=0}^{d-1}c_{j}|j\rangle,$$

$$|\bar{\phi}_{A,0}^{(k)}\rangle=\hat{U}_{E}|\phi_{A,0}^{(k)}\rangle=\sum_{j=0}^{d-1}\bar{c}_{j}|\phi_{A,j}^{(k)}\rangle.$$

$$\hat{D}^{(k)\dagger}=\sum_{j=0}^{d-1}|j\rangle\langle\phi_{A,j}^{(k)}|,$$

$$\hat{R}=\delta_{0,m^{(k)}}(\mathbb{I}-\hat{X})+\hat{X},$$

$$\hat{D}^{(k+1)}=\sum_{j=0}^{d-1}\delta_{j,m^{(k)}}\hat{\mathcal{U}}_{j}^{(k)}\hat{D}^{(k)},$$

$$\hat{X}=\sum_{j=1}^{d-1}\left(|0\rangle\langle j|+|j\rangle\langle 0|\right)$$

$$\hat{u}_{j}=e^{-i\varphi_{y}\hat{S}_{y}^{j}}e^{-i\varphi_{z}\hat{S}_{z}^{j}}e^{-i\varphi_{x}\hat{S}_{x}^{j}},$$

$\hat{S}_{x}^{j}$, $\hat{S}_{y}^{j}$, $\hat{S}_{z}^{j}$

$$\hat{D}^{(k+1)}=\sum_{j=0}^{d-1}\delta_{j,m^{(k)}}\hat{D}^{(k)}\hat{u}_{m^{(k)}}.$$

$$w^{(k+1)}=\left[(r-p)\delta_{0,m^{(k)}}+p\right]w^{(k)}.$$

$$|\phi_{A,0}^{(N_{0}+1)}\rangle=\hat{D}^{(N_{0})}|\phi_{A,0}^{(1)}\rangle,$$


Taxonomy

Topics: Quantum Computing Algorithms and Architecture · Quantum Information and Cryptography · Quantum and electron transport phenomena

Full text

Reinforcement learning for semi-autonomous approximate quantum eigensolver

F. Albarrán-Arriagada 1,2,3, J. C. Retamal 2,3, E. Solano 1,4,5 and L. Lamata 4,6

1 International Center in Quantum Artificial Intelligence for Science and Technology (QuArtist) and Physics Department, Shanghai University, 200444 Shanghai, China

2 Departamento de Física, Universidad de Santiago de Chile (USACH), Avenida Ecuador 3493, 9170124, Santiago, Chile

3 Center for the Development of Nanoscience and Nanotechnology 9170124, Estación Central, Santiago, Chile

4 Department of Physical Chemistry, University of the Basque Country UPV/EHU, Apartado 644, 48080 Bilbao, Spain

5 IKERBASQUE, Basque Foundation for Science, Maria Diaz de Haro 3, 48013 Bilbao, Spain

6 Departamento de Física Atómica, Molecular y Nuclear, Universidad de Sevilla, 41080 Sevilla, Spain


Abstract

The characterization of an operator by its eigenvectors and eigenvalues allows us to know its action over any quantum state. Here, we propose a protocol to obtain an approximation of the eigenvectors of an arbitrary Hermitian quantum operator. This protocol is based on measurement and feedback processes, which characterize a reinforcement learning protocol. Our proposal is composed of two systems, a black box named environment and a quantum state named agent. The role of the environment is to change any quantum state by a unitary matrix $\hat{U}_{E}=e^{-i\tau\hat{\mathcal{O}}_{E}}$, where $\hat{\mathcal{O}}_{E}$ is a Hermitian operator and $\tau$ is a real parameter. The agent is a quantum state which adapts to some eigenvector of $\hat{\mathcal{O}}_{E}$ by repeated interactions with the environment, feedback processes, and semi-random rotations. With this proposal, we can obtain an approximation of the eigenvectors of a random qubit operator with average fidelity over 90% in fewer than 10 iterations, and surpass 98% in fewer than 300 iterations. Moreover, for the two-qubit cases, the four eigenvectors are obtained with fidelities above 89% in 8000 iterations for a random operator, and fidelities of $99\%$ for an operator with the Bell states as eigenvectors. This protocol can be useful to implement semi-autonomous quantum devices which should be capable of extracting information and deciding with minimal resources and without human intervention.

1 Introduction

In the past few years, the combination of quantum mechanics and machine learning, known as quantum machine learning (QML), has been a fruitful area [1, 2, 3, 4], either applying classical machine learning techniques to quantum tasks such as quantum metrology [5, 6], quantum state estimation [7, 8], and others [9, 11, 10, 12, 13, 14]; or using quantum mechanics to enhance machine learning algorithms for classical applications [15, 16, 17, 18, 19, 3, 20, 21]. Machine learning algorithms can be classified into two groups: those that learn from big data and those that learn from interactions.

The first group contains two classes of algorithms. One class is supervised learning, which uses a previously labeled data set, named training data, to infer a labeling criterion that is then used to classify new data; a remarkable example is pattern recognition [22, 23, 24]. The other class is unsupervised learning. In this case, training data is not necessary, and the approach is to group the unlabeled data into different sets, where each set is characterized by the mean value of some property of its constituents. The groups are constructed to optimize some indicator of the dispersion in each subset with respect to the value that characterizes it, e.g., the standard deviation. An example is the clustering problem [25, 26].

The second group contains the reinforcement learning (RL) algorithms [27]. Here, an accessible and manipulable system called agent $(A)$ interacts with an unknown system called environment $(E)$. The strategy relies on $A$ improving its performance in a specific task $\mathcal{Q}(A,E)$, which depends on the state of the systems $A$ and $E$. This improvement employs the results of multiple interactions between $A$ and $E$. The general framework of the RL paradigm is composed of three parts: the policy, the reward function (RF), and the value function (VF). The policy defines the main steps of the algorithm, which we can divide into three. First, the information extraction, which considers the interaction between $A$ and $E$ and how to obtain information from it. Second, the feedback loop, which specifies the channel used to communicate the extracted information to $A$. Third, the decision process, where we decide the action on $A$ in order to progress towards the goal, and then start with the information extraction again. The RF defines the criterion to reward (punish) the actions which improve (worsen) the performance of $A$ with respect to the task $\mathcal{Q}(A,E)$ at each step. Finally, the VF gives us the global performance of the algorithm, ensuring its convergence. One of the most impressive examples of this paradigm is the recent development of master-level chess, Go, and shogi players without a database [28, 29]. This class of algorithms mimics the most primitive form of human learning, commonly named trial and error. It means that a near-future implementation of quantum artificial intelligence may apply this paradigm to a quantum system to enhance a quantum task as the main way to learn. For this reason, the development of the quantum version of the RL paradigm has played an important role in QML in recent years [30, 31, 3, 32, 33, 34].

A crucial task in physics is the characterization of the different interactions among systems. This characterization is helpful to evaluate the risks of our actions and act to minimize them. Therefore, any autonomous artificial intelligence must have this ability.

In quantum mechanics, a physical interaction (observable) is represented by a Hermitian matrix or quantum operator, which is characterized by its eigenvalues and eigenvectors. Calculating the eigenvectors and eigenvalues of a quantum interaction with a classical computer requires encoding the quantum information into classical bits, which is inconvenient for unknown quantum interactions. Moreover, the implementation of a full quantum eigensolver [35, 36, 37, 38] using near-future quantum computers seems impractical due to the number of resources needed [39]. The emergence of hybrid classical-quantum algorithms in the past few years [40, 41, 42, 43, 44, 45, 46] opens the door to the development of useful eigensolvers. Nevertheless, these works are mainly focused on the eigenvalues, eigenvectors, and properties of quantum systems such as molecules, while the characterization of a physical interaction has been less studied.

In this article, we propose a hybrid quantum-classical algorithm to calculate an approximation to the eigenvectors of any quantum interaction described by a Hermitian matrix with minimal resources [47]. In our proposal, we use single-shot measurements and classical communication given by a feedback loop, which characterizes an RL protocol. The main goal of this proposal is to obtain a high-fidelity approximation (above 98% for the single-qubit case) without measuring fidelities or expectation values, which drastically reduces the number of iterations of the algorithm, decreases the effect of noise sources, and requires no human intervention. We also show how to extend the algorithm to the multiqubit and high-dimensional situations. This protocol could be useful to implement semi-autonomous quantum devices with the capability to decide using the characterization of an interaction, which is an essential ingredient for the implementation of artificial quantum intelligence [4] and artificial quantum life [48, 49].

2 Quantum eigensolver protocol

Our proposal is related to recent works about a measurement-based algorithm to adapt one known state to another unknown one [50, 51, 52]. Here, we define the general framework of our protocol based on the RL paradigm and then explain in detail the single-qubit case, the single-qudit case, and the multiqubit case.

In our protocol, we consider as the agent a manipulable and known quantum system described by the state $|\phi_{A,0}\rangle$, which corresponds to any initialization of a given physical system. The environment is a black box, which produces an unknown interaction inside it. This interaction is characterized by an unknown Hermitian operator $\hat{\mathcal{O}}_{E}$, which generates a unitary transformation $\hat{U}_{E}=e^{-i\tau\hat{\mathcal{O}}_{E}}$ over the quantum system $A$ when it interacts with the system $E$, where $\tau$ is a parameter related to the interaction time with the black box, e.g., a spin particle (agent) traversing a region with a magnetic field (environment) for a time $t\sim\tau$.
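For a single qubit, the black-box unitary $\hat{U}_{E}=e^{-i\tau\hat{\mathcal{O}}_{E}}$ has a closed form, which the following minimal sketch evaluates with the Python standard library. The parametrization $\hat{\mathcal{O}}_{E}=a\,\mathbb{I}+\vec{b}\cdot\vec{\sigma}$ and the function name `environment_unitary` are illustrative assumptions, not notation taken from the article.

```python
import cmath
import math

def environment_unitary(a, b, tau):
    """Return U_E = exp(-i*tau*O) for the 2x2 Hermitian O = a*I + b.sigma.

    `a` (real) and `b` = (bx, by, bz) (real vector) are an illustrative
    parametrization, not the article's notation. The formula used is
    exp(-i*t*(n.sigma)) = cos(t)*I - i*sin(t)*(n.sigma) for a unit vector n.
    """
    bx, by, bz = b
    norm = math.sqrt(bx * bx + by * by + bz * bz)
    phase = cmath.exp(-1j * tau * a)          # contribution of the a*I term
    if norm == 0.0:
        return [[phase, 0j], [0j, phase]]
    nx, ny, nz = bx / norm, by / norm, bz / norm
    c, s = math.cos(tau * norm), math.sin(tau * norm)
    return [
        [phase * (c - 1j * s * nz), phase * (-1j * s * (nx - 1j * ny))],
        [phase * (-1j * s * (nx + 1j * ny)), phase * (c + 1j * s * nz)],
    ]

# Example: O_E = S_x = sigma_x / 2 with tau = pi gives U_E = -i * sigma_x.
u_env = environment_unitary(0.0, (0.5, 0.0, 0.0), math.pi)
```

In the protocol itself the agent never has access to `a` or `b`; a function like this would only play the role of the simulated black box in a numerical test.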

The policy is as follows:

•

Information extraction: The system $A$ interacts with $E$ changing its state as

$$|\bar{\phi}_{A,0}\rangle=\hat{U}_{E}|\phi_{A,0}\rangle. \tag{1}$$

Next, we perform a measurement process over $|\bar{\phi}_{A,0}\rangle$ in the basis $\{|\phi_{A,0}\rangle,...,|\phi_{A,d-1}\rangle\}$ , where $d$ is the dimension of the Hilbert space of $A$ and $\langle\phi_{A,j}|\phi_{A,k}\rangle=\delta_{j,k}$ .

•

Feedback loop: The information of the measuring process is communicated to a command center with the ability to perform a unitary transformation $\hat{\mathcal{U}}_{j}$ (quantum gate) over the state of $A$ in order to change the possible results in the next information extraction step.

•

Decision process: If the outcome of the measurement process is the state $|\phi_{A,j}\rangle$ , with $j\neq 0$ , this means that $|\phi_{A,0}\rangle$ changes when system $A$ interacts with $E$ , therefore, $|\phi_{A,0}\rangle$ cannot be an eigenvector of $\hat{\mathcal{O}}_{E}$ . In this case, we define the unitary transformation $\hat{\mathcal{U}}_{j}$ as

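$$\hat{\mathcal{U}}_{j}=e^{-i\varphi_{y}\hat{S}_{y,j}}e^{-i\varphi_{z}\hat{S}_{z,j}}e^{-i\varphi_{x}\hat{S}_{x,j}}, \tag{2}$$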

where

[TABLE]

and $\varphi_{\alpha}$ is a random angle in the range $[-w\pi,w\pi]$, with $w$ the searching range given by the RF. We note that $\hat{\mathcal{U}}_{j}$ is a pseudo-random rotation in the subspace spanned by $\{|\phi_{A,0}\rangle,|\phi_{A,j}\rangle\}$. For this outcome we define the state of $A$ as $\hat{\mathcal{U}}_{j}|\phi_{A,0}\rangle$, and start again with the information extraction step.

If the outcome of the measuring process is $|\phi_{A,0}\rangle$ , it means that $|\phi_{A,0}\rangle$ could be an eigenvector of $\hat{\mathcal{O}}_{E}$ . We point out that the eigenvectors of an operator remain constant up to a global phase under the action of a function of this operator. In this case, we apply the identity operator $\mathbb{I}$ . Moreover, we keep the same state $|\phi_{A,0}\rangle$ and start again with the information extraction step. Figure 1 shows a scheme of the policy of the algorithm.
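The remark about global phases can be made explicit for a qubit. The worked equations below are a sketch in which $|v_{0}\rangle,|v_{1}\rangle$ denote the eigenvectors of $\hat{\mathcal{O}}_{E}$ with eigenvalues $\lambda_{0},\lambda_{1}$, and $F=|\langle v_{0}|\phi_{A,0}\rangle|^{2}$ is a symbol introduced here only for illustration.

$$\hat{U}_{E}|v_{j}\rangle=e^{-i\tau\lambda_{j}}|v_{j}\rangle\quad\Rightarrow\quad|\langle v_{j}|\hat{U}_{E}|v_{j}\rangle|^{2}=1,$$

$$\langle\phi_{A,0}|\hat{U}_{E}|\phi_{A,0}\rangle=F\,e^{-i\tau\lambda_{0}}+(1-F)\,e^{-i\tau\lambda_{1}},$$

$$\mathcal{P}_{0}=|\langle\phi_{A,0}|\hat{U}_{E}|\phi_{A,0}\rangle|^{2}=1-4F(1-F)\sin^{2}\left(\frac{\tau(\lambda_{0}-\lambda_{1})}{2}\right).$$

The outcome $|\phi_{A,0}\rangle$ is therefore certain when $F\in\{0,1\}$, that is, when the agent is an eigenvector. It is also certain when $\tau(\lambda_{0}-\lambda_{1})$ is a multiple of $2\pi$, in which case the black box acts as a global phase on every state and the measurement carries no information, so the value of $\tau$ matters for how sharply non-eigenvectors are detected.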

For the RF we define the reward rate $r<1$ and the punishment rate $p>1$. If the outcome of the measurement is $|\phi_{A,0}\rangle$ we define $\bar{w}=w\cdot r$; otherwise we define $\bar{w}=w\cdot p$. Finally, we set $w=\bar{w}$ for the next iteration of the algorithm, which means that when we measure $|\phi_{A,0}\rangle$ we reduce the searching range, and we increase it otherwise. The initial value for $w$ is chosen according to the problem.
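The reward function amounts to a one-line update of the searching range. A minimal sketch, where the function name `update_search_range` and the example outcome sequence are hypothetical:

```python
def update_search_range(w, outcome, r, p):
    """Shrink w by r when outcome 0 is measured (reward), grow it by p otherwise."""
    if not (0 < r < 1 and p > 1):
        raise ValueError("expected reward rate r < 1 and punishment rate p > 1")
    return w * r if outcome == 0 else w * p

# Illustrative run: one punishment followed by three rewards.
w = 1.0
history = []
for outcome in [1, 0, 0, 0]:
    w = update_search_range(w, outcome, r=0.9, p=2.0 / 0.9)
    history.append(w)
```

With these illustrative rates, three rewards do not yet undo a single punishment, which is why a long run of outcome-0 measurements is needed before the range becomes small.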

As we can note, the protocol does not need to store the states or the full history of the algorithm; it only needs to store the final operation $\hat{D}^{(N)}$ by storing classically the parameters that characterize this operation.

To ensure the convergence of our algorithm, we define the VF as the value of $w$ . This implies that, when $w\rightarrow 0$ , our protocol converges. For a correct choice of $r$ and $p$ we have that $w\rightarrow 0$ only if we obtain, in the measurement process of $|\bar{\phi}_{A,0}\rangle$ , the outcome $|\phi_{A,0}\rangle$ many times in a row. This means that $\langle\phi_{A,0}|\bar{\phi}_{A,0}\rangle\sim 1$ , therefore $|\phi_{A,0}\rangle$ is an approximate eigenvector of $\hat{\mathcal{O}}_{E}$ .

As this is an iterative protocol, we define the following notation for the remainder of the article: any superscript in parentheses refers to the iteration of the algorithm, e.g., $|\phi_{A,0}^{(4)}\rangle$ is the state of $A$ before the interaction with $E$ in the fourth iteration. Similarly, $\hat{\mathcal{U}}^{(k)}_{j}$ is the unitary transformation defined in the decision process for the iteration $k$. As a special case, the superscript $(1)$ refers to the initial values, e.g., $w^{(1)}$ represents the initial searching range.

It is necessary to mention that our algorithm uses one single-shot measurement per loop, which is an advantage with respect to employing an expectation value or the fidelity. The latter imply hundreds of measurements for a two-level system, so this proposal is exposed to noise sources for less time. Also, as we use pseudo-random operations $\hat{D}^{(k)}$, the effect of any noise in the gate can be seen as part of the randomness of the protocol.

2.1 Single-qubit case

In the single-qubit case, $\hat{\mathcal{O}}_{E}$ is described by a $2\times 2$ Hermitian matrix with eigenvectors $\{|v_{0}\rangle,|v_{1}\rangle\}$ and eigenvalues $\{\lambda_{0},\lambda_{1}\}$ respectively. As these two eigenvectors are orthonormal, we can write

[TABLE]

where $\alpha\in[0,2\pi]$ , $\beta\in[0,\pi]$ and

$$|0\rangle=\begin{pmatrix}1\\0\end{pmatrix},\qquad|1\rangle=\begin{pmatrix}0\\1\end{pmatrix}. \tag{5}$$

We define $\hat{\mathcal{O}}_{E}$ and $\hat{U}_{E}$ as

[TABLE]

Policy. In this case, we write the state $|\phi_{A,0}^{(k)}\rangle$ before the black-box as

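$$|\phi_{A,0}^{(k)}\rangle=\cos\left(\frac{\theta^{(k)}}{2}\right)|0\rangle+e^{i\phi^{(k)}}\sin\left(\frac{\theta^{(k)}}{2}\right)|1\rangle, \tag{7}$$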

and the state $|\bar{\phi}_{A,0}^{(k)}\rangle$ after $E$ as

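$$|\bar{\phi}_{A,0}^{(k)}\rangle=\cos\left(\frac{\bar{\theta}^{(k)}}{2}\right)|0\rangle+e^{i\bar{\phi}^{(k)}}\sin\left(\frac{\bar{\theta}^{(k)}}{2}\right)|1\rangle=\cos\left(\frac{\Delta_{\theta}^{(k)}}{2}\right)|\phi_{A,0}^{(k)}\rangle+e^{i\Delta_{\phi}^{(k)}}\sin\left(\frac{\Delta_{\theta}^{(k)}}{2}\right)|\phi_{A,1}^{(k)}\rangle, \tag{8}$$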

where

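$$|\phi_{A,1}^{(k)}\rangle=\sin\left(\frac{\theta^{(k)}}{2}\right)|0\rangle-e^{i\phi^{(k)}}\cos\left(\frac{\theta^{(k)}}{2}\right)|1\rangle. \tag{9}$$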

For the explicit forms of $\bar{\theta}^{(k)}$ and $\bar{\phi}^{(k)}$ in terms of $\alpha$, $\beta$, $\tau$ and the eigenvalues of $\hat{\mathcal{O}}_{E}$, see appendix A. Moreover, for the explicit forms of $\Delta_{\theta}^{(k)}$ and $\Delta_{\phi}^{(k)}$, see appendix B. Now, to perform the measurement process over $|\bar{\phi}_{A,0}^{(k)}\rangle$, we apply the basis-rotation matrix

$$\hat{D}^{(k)\dagger}=|0\rangle\langle\phi_{A,0}^{(k)}|+|1\rangle\langle\phi_{A,1}^{(k)}|, \tag{10}$$

in order to measure in the basis $\{|0\rangle,|1\rangle\}$ for all iterations. After the measurement process, the state of $A$ is $|m^{(k)}\rangle$, where $m^{(k)}\in\{0,1\}$ is the outcome of the measurement with probabilities $\mathcal{P}_{0}^{(k)}=\cos^{2}(\Delta_{\theta}^{(k)}/2)$ and $\mathcal{P}_{1}^{(k)}=\sin^{2}(\Delta_{\theta}^{(k)}/2)$, respectively. If $m^{(k)}=0$, then we transform the state $|0\rangle\rightarrow|\phi_{A,0}^{(k)}\rangle$ using the matrix $\hat{D}^{(k)}$, and start the algorithm again. If $m^{(k)}=1$, we transform the state $|1\rangle\rightarrow|\phi_{A,0}^{(k)}\rangle$ using $\hat{D}^{(k)}\sigma_{x}$, where $\sigma_{x}$ is the Pauli matrix $x$, and apply the pseudo-random operator $\hat{\mathcal{U}}_{1}^{(k)}$ defined by Eq. (2). Then, after the measurement process, we apply over $|m^{(k)}\rangle$ the operator $\hat{G}^{(k)}_{0}$ defined by

$$\hat{G}_{0}^{(k)}=\hat{D}^{(k+1)}\hat{R}, \tag{11}$$

where

[TABLE]

Given that $\hat{D}^{(k)\dagger}$ transforms $|\phi_{A,j}^{(k)}\rangle\rightarrow|j\rangle$ ( $|j\rangle\in\{|0\rangle,|1\rangle\}$ ), we can write $\hat{\mathcal{U}}_{1}^{(k)}=\hat{D}^{(k)}\hat{u}_{1}\hat{D}^{(k)\dagger}$, where

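$$\hat{u}_{1}=e^{-i\varphi_{y}\hat{S}_{y}}e^{-i\varphi_{z}\hat{S}_{z}}e^{-i\varphi_{x}\hat{S}_{x}}, \tag{13}$$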

with $\hat{S}_{j}=(1/2)\sigma_{j}$ the spin operators, with $\sigma_{j}$ the Pauli matrix $j$ . Then, the operator $\hat{D}^{(k+1)}$ reads

$$\hat{D}^{(k+1)}=(1-m^{(k)})\hat{D}^{(k)}+m^{(k)}\hat{D}^{(k)}\hat{u}_{1}. \tag{14}$$

For this case, the RF that defines the value of $w^{(k+1)}$ for each step reads

$$w^{(k+1)}=\left[(1-m^{(k)})r+m^{(k)}p\right]w^{(k)}, \tag{15}$$

where $r$ and $p$ are the reward rate and punishment rate described previously.

When the algorithm converges, we have $|\phi_{A,0}^{(N)}\rangle\approx|\bar{\phi}_{A,0}^{(N)}\rangle$ , where $N$ is the number of iterations. Moreover, in this case $\hat{D}^{(N)}$ is an approximation of the matrix that diagonalizes $\hat{\mathcal{O}}_{E}$ , that is

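$$\hat{D}^{(N)\dagger}\hat{\mathcal{O}}_{E}\hat{D}^{(N)}\sim\lambda_{0}|0\rangle\langle 0|+\lambda_{1}|1\rangle\langle 1|. \tag{16}$$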

In order to explore the complete space we must choose $w^{(1)}=1$ .
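The single-qubit loop described in this subsection can be simulated in a few lines. The sketch below is an illustrative reading of the protocol rather than the authors' code: it tracks only the classical record $\hat{D}^{(k)}$ and $w^{(k)}$, samples the single-shot outcome from $|\langle\phi_{A,0}^{(k)}|\hat{U}_{E}|\phi_{A,0}^{(k)}\rangle|^{2}$, draws the random angles with the current $w$ before applying the reward function, and uses a hypothetical environment $\hat{\mathcal{O}}_{E}=\hat{S}_{x}$ with $\tau=\pi$. The names `matmul`, `random_rotation`, and `run_protocol` are assumptions made here.

```python
import cmath
import math
import random

def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def random_rotation(w, rng):
    """u_1 = exp(-i*py*S_y) exp(-i*pz*S_z) exp(-i*px*S_x), angles in [-w*pi, w*pi]."""
    px, py, pz = (rng.uniform(-w * math.pi, w * math.pi) for _ in range(3))
    rx = [[math.cos(px / 2), -1j * math.sin(px / 2)],
          [-1j * math.sin(px / 2), math.cos(px / 2)]]
    ry = [[math.cos(py / 2), -math.sin(py / 2)],
          [math.sin(py / 2), math.cos(py / 2)]]
    rz = [[cmath.exp(-1j * pz / 2), 0j], [0j, cmath.exp(1j * pz / 2)]]
    return matmul(ry, matmul(rz, rx))

def run_protocol(u_env, n_iter=300, r=0.9, nu=2.0, seed=0):
    """Return (D, w) after n_iter single-shot iterations."""
    rng = random.Random(seed)
    p = nu / r                          # punishment rate from nu = r * p
    w = 1.0                             # initial searching range w^(1)
    d = [[1 + 0j, 0j], [0j, 1 + 0j]]    # D^(1) = identity
    for _ in range(n_iter):
        phi0 = [d[0][0], d[1][0]]       # agent state D|0>
        out = [u_env[0][0] * phi0[0] + u_env[0][1] * phi0[1],
               u_env[1][0] * phi0[0] + u_env[1][1] * phi0[1]]
        overlap = phi0[0].conjugate() * out[0] + phi0[1].conjugate() * out[1]
        if rng.random() < abs(overlap) ** 2:    # outcome m = 0: reward
            w *= r
        else:                                   # outcome m = 1: rotate, punish
            d = matmul(d, random_rotation(w, rng))
            w *= p
    return d, w

# Hypothetical environment: O_E = S_x with tau = pi, so U_E = -i * sigma_x.
u_env = [[0j, -1j], [-1j, 0j]]
d, w = run_protocol(u_env)
phi0 = [d[0][0], d[1][0]]
f_plus = abs(phi0[0] + phi0[1]) ** 2 / 2    # fidelity with (|0> + |1>)/sqrt(2)
fidelity = max(f_plus, 1 - f_plus)          # fidelity with the nearest eigenvector
```

Only `d` and `w` are carried between iterations, matching the remark that the protocol stores just the parameters of the final operation. The fidelity computed at the end uses knowledge of the eigenvectors and serves only to check the simulation; the protocol itself never evaluates it.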

2.2 Single-qudit case

In this case, the agent is a $d$-dimensional system or qudit, and the operator $\hat{\mathcal{O}}_{E}$ is described by a $d\times d$ Hermitian matrix with eigenvalues $\{\lambda_{j}\}$ and eigenvectors $\{|v_{j}\rangle\}$, with $j\in\{0,1,2,...,d-1\}$. In the $k$th iteration of the algorithm, the state of $A$ before $E$ reads

$$|\phi_{A,0}^{(k)}\rangle=\sum_{j=0}^{d-1}c_{j}|j\rangle, \tag{17}$$

while for simplicity we choose the initial state $|\phi_{A,0}^{(1)}\rangle=|0\rangle$ . After the interaction with $E$ , we have

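$$|\bar{\phi}_{A,0}^{(k)}\rangle=\hat{U}_{E}|\phi_{A,0}^{(k)}\rangle=\sum_{j=0}^{d-1}\bar{c}_{j}|\phi_{A,j}^{(k)}\rangle. \tag{18}$$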

Subsequently, we apply the operator $\hat{D}^{(k)\dagger}$ , which is defined now as

$$\hat{D}^{(k)\dagger}=\sum_{j=0}^{d-1}|j\rangle\langle\phi_{A,j}^{(k)}|, \tag{19}$$

and perform the measurement process in the basis $\{|0\rangle,|1\rangle,...,|d-1\rangle\}$ . After this process, the state of $A$ is $|m^{(k)}\rangle$ , where $m^{(k)}\in\{0,1,...,d-1\}$ is the outcome of the measurement process. In this case the decision process applies the operator $\hat{G}^{(k)}_{0}$ defined by Eq. (11), but with

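$$\hat{R}=\delta_{0,m^{(k)}}(\mathbb{I}-\hat{X})+\hat{X},\qquad\hat{D}^{(k+1)}=\sum_{j=0}^{d-1}\delta_{j,m^{(k)}}\hat{\mathcal{U}}_{j}^{(k)}\hat{D}^{(k)}, \tag{20}$$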

where

$$\hat{X}=\sum_{j=1}^{d-1}\left(|0\rangle\langle j|+|j\rangle\langle 0|\right), \tag{21}$$

with $\hat{\mathcal{U}}_{m^{(k)}}^{(k)}$ as defined in Eq. (2) and $\hat{\mathcal{U}}_{0}^{(k)}=\mathbb{I}$ . Also in this case $\hat{\mathcal{U}}_{j}^{(k)}=\hat{D}^{(k)}\hat{u}_{j}\hat{D}^{(k)\dagger}$ , where

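$$\hat{u}_{j}=e^{-i\varphi_{y}\hat{S}_{y}^{j}}e^{-i\varphi_{z}\hat{S}_{z}^{j}}e^{-i\varphi_{x}\hat{S}_{x}^{j}}, \tag{22}$$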

and

[TABLE]

therefore,

$$\hat{D}^{(k+1)}=\sum_{j=0}^{d-1}\delta_{j,m^{(k)}}\hat{D}^{(k)}\hat{u}_{m^{(k)}}. \tag{24}$$

The state of $A$ for the next iteration reads $|\phi_{A,0}^{(k+1)}\rangle=\hat{G}^{(k)}_{0}|m^{(k)}\rangle$ .
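Since each $\hat{u}_{j}$ is described as a rotation confined to a two-dimensional subspace, it can be represented as a $2\times 2$ unitary embedded in a $d\times d$ identity. The sketch below assumes that $\hat{S}_{\alpha}^{j}$ act as spin-1/2 operators on $\{|0\rangle,|j\rangle\}$ in the computational basis; the function name `embed_two_level` is hypothetical.

```python
def embed_two_level(u2, j, d):
    """Embed a 2x2 unitary acting on span{|0>, |j>} into a d x d identity."""
    if not 1 <= j < d:
        raise ValueError("j must satisfy 1 <= j < d")
    full = [[1 + 0j if row == col else 0j for col in range(d)]
            for row in range(d)]
    full[0][0], full[0][j] = u2[0][0], u2[0][1]
    full[j][0], full[j][j] = u2[1][0], u2[1][1]
    return full

# Example: a swap of |0> and |2> in a d = 4 system leaves |1> and |3> untouched.
swap = [[0j, 1 + 0j], [1 + 0j, 0j]]
u = embed_two_level(swap, 2, 4)
```

Any $2\times 2$ rotation built as in Eq. (13) could be passed in place of `swap`; the embedding leaves every basis state other than $|0\rangle$ and $|j\rangle$ unchanged.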

Finally, the RF that updates the value of the searching range is given by

$$w^{(k+1)}=\left[(r-p)\delta_{0,m^{(k)}}+p\right]w^{(k)}. \tag{25}$$

Once the algorithm converges, we have that

$$|\phi_{A,0}^{(N_{0}+1)}\rangle=\hat{D}^{(N_{0})}|\phi_{A,0}^{(1)}\rangle \tag{26}$$

is an approximate eigenvector, therefore,

[TABLE]

In order to find another eigenvector of $\hat{\mathcal{O}}_{E}$, we restart the algorithm at iteration $N_{0}+1$, i.e., $w^{(N_{0}+1)}=w^{(1)}=2\pi$, but now the state before $E$ is given by $|\phi_{A,1}^{(N_{0}+1)}\rangle=\hat{D}^{(N_{0})}|\phi_{A,1}^{(1)}\rangle$. We redefine Eq. (23) as

[TABLE]

Thus, we can calculate the operator $\hat{u}_{j}$ as in Eq. (22).

The decision process changes as

[TABLE]

where

[TABLE]

and $\hat{u}_{0}=\hat{u}_{1}=\mathbb{I}$ . Finally, the RF reads,

[TABLE]

These changes mean that we perform the protocol in the subspace orthogonal to $|\phi_{A,0}^{(1)}\rangle$. When the algorithm converges again, after $N_{1}$ more iterations, the states $\hat{D}^{(N_{0}+N_{1})}|\phi_{A,0}^{(1)}\rangle$ and $\hat{D}^{(N_{0}+N_{1})}|\phi_{A,1}^{(1)}\rangle$ are approximate eigenvectors. Therefore, to obtain the next eigenvector we perform the algorithm again but in the subspace orthogonal to $\{|\phi_{A,0}^{(1)}\rangle,|\phi_{A,1}^{(1)}\rangle\}$, and so on. After $N=N_{0}+N_{1}+...+N_{d-2}$ iterations, the states $|\phi_{A,j}^{(N)}\rangle=\hat{D}^{(N)}|\phi_{A,j}^{(1)}\rangle$ with $j=0,1,...,d-1$ approximate the $d$ eigenvectors of $\hat{\mathcal{O}}_{E}$.

2.3 Multiqubit case

For this case, we can regard the system $A$ as a qudit, where now the basis states $|j\rangle$ correspond to the binary representation of $j$ with $\log_{2}(d)$ digits. For example, for $d=16$ we have $4$ digits, each of which represents the state of a qubit; then $|5\rangle=|0101\rangle$. Also, we can produce the different operators $\hat{u}_{j}$ using controlled-NOT gates and single-qubit rotations [53]. Therefore, we can map this problem to the qudit case, obtaining the same algorithm as in the previous case.
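The mapping between a qudit index and a multiqubit basis state is just a fixed-width binary expansion. A minimal sketch, with the hypothetical helper name `basis_label`:

```python
def basis_label(j, d):
    """Return the qubit bit string for basis state |j> of a d-dimensional register."""
    n_qubits = d.bit_length() - 1            # log2(d) when d is a power of two
    if d < 2 or 2 ** n_qubits != d or not 0 <= j < d:
        raise ValueError("d must be a power of two (>= 2) and 0 <= j < d")
    return format(j, "0{}b".format(n_qubits))

label_16 = basis_label(5, 16)   # the d = 16 example from the text
label_4 = basis_label(2, 4)     # a two-qubit example
```

Here `label_16` is `"0101"` and `label_4` is `"10"`, so each character gives the state of one qubit.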

As we can see from this section, our protocol does not need to encode quantum information in a classical processor, which is an advantage with respect to classical algorithms that need to characterize the quantum interactions by quantum tomography. The latter implies hundreds of measurements of the quantum system, using in this process more resources than the entire proposed algorithm. Moreover, as our algorithm finds the eigenstates statistically, it is simpler than a full quantum algorithm that finds the eigenstates exactly, which makes our protocol experimentally feasible. References [51, 52] show the experimental implementation of an algorithm that employs the same basic steps on which our current algorithm is based, for the case of quantum states instead of quantum operators, opening the door to the implementation of this work.

3 Numerical results

It is convenient to define the following quantities for the numerical analysis of the protocol: $\nu=r\cdot p\Rightarrow p=\nu/r$, with $r$ ( $p$ ) the reward (punishment) rate, the total number of rewards $n_{r}$, and the total number of punishments $n_{p}$ in the algorithm. The VF of our algorithm is the value of $w^{(N)}=r^{n_{r}}p^{n_{p}}$, where $N=n_{r}+n_{p}$ is the total number of iterations. Also, we can rewrite

$$w^{(N)}=r^{n_{r}-n_{p}}\nu^{n_{p}}, \tag{32}$$

where the convergence condition is given by $w^{(N)}\ll 1$ . If $\nu<1$ , we see from Eq. (32) that the convergence condition can be satisfied even if $n_{p}\sim n_{r}$ , which implies that the protocol does not necessarily converge to the eigenstates of $\hat{\mathcal{O}}_{E}$ . If $\nu=1$ , we have that $w^{(N)}\rightarrow 0\iff n_{r}\gg n_{p}$ . For $\nu>1$ , the algorithm converges whenever $n_{r}\ggg n_{p}$ . Moreover, when $\nu$ is larger, the algorithm needs more iterations to converge, but nevertheless it achieves larger fidelities. This is the exploration versus exploitation balance known in reinforcement learning. Here, we perform the simulation for a single- and two-qubit case for different values of $\nu$ and $r$ . Remember that for all cases we choose $w^{(1)}=1$ . Also, for simplicity we choose $|\phi_{A,0}^{(1)}\rangle=|0\rangle$ for the single-qubit case and $|\phi_{A,j}^{(1)}\rangle=|j_{\textrm{bin}}\rangle$ for the two-qubit case, where $j_{\textrm{bin}}$ is the binary representation of $j$ , e.g., $|\phi_{A,2}^{(1)}\rangle=|10\rangle$ . Moreover, $\hat{D}^{(1)}=\mathbb{I}$ for all cases.
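The role of $\nu$ can be checked numerically from the definition $w^{(N)}=r^{n_{r}}p^{n_{p}}$ with $p=\nu/r$ and $w^{(1)}=1$. The sketch below, with hypothetical function names, computes how many rewards are needed to cancel one punishment, i.e., the $n$ solving $r^{n}p=1$.

```python
import math

def final_range(n_r, n_p, r, nu):
    """w^(N) = r**n_r * p**n_p with p = nu / r and w^(1) = 1."""
    return r ** n_r * (nu / r) ** n_p

def rewards_per_punishment(r, nu):
    """Number of rewards whose combined shrinkage cancels one punishment."""
    return math.log(nu / r) / math.log(1.0 / r)

balance_fast = rewards_per_punishment(0.6, 1.0)   # nu = 1: exactly one reward
balance_slow = rewards_per_punishment(0.9, 2.0)   # nu = 2, r = 0.9: about 7.6
```

For $\nu=1$ a single reward offsets a punishment, while for $r=0.9$ and $\nu=2$ roughly 7.6 rewards are needed, which is consistent with the statement that larger $\nu$ demands $n_{r}$ much larger than $n_{p}$ and therefore more iterations.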

Finally, as the unitary operator $\hat{u}_{j}$ given by Eq. (22) depends on pseudo-random angles, we run the algorithm many times, defining the mean fidelity $\mathcal{F}$ and the mean searching range $\mathcal{W}$ as

[TABLE]

where $|\ell_{E}\rangle$ is the $\ell$ th eigenvector of $\hat{\mathcal{O}}_{E}$ , the index $i$ refers to the $i$ th repetition of the protocol and $\mathcal{N}$ is the total number of repetitions. In all subsequent cases we choose $\mathcal{N}=1000$ .

3.1 Single-qubit case

For the general performance of our protocol, we start with an $\hat{\mathcal{O}}_{E}$ described by a random Hermitian matrix. Figure 2 shows the mean fidelity $\mathcal{F}_{0}(k)=\mathcal{F}_{1}(k)$ for different values of the reward rate $r$ and the parameter $\nu$. From this figure, we can see that for $r=0.9$ and $\nu=2$, we obtain $\mathcal{F}_{0}(k)>0.98$ with $k<300$. Also, in all cases we have $\mathcal{F}_{0}(k)>0.90$ for $k<10$. This means that with a small number of iterations we can obtain good fidelities for the eigenvector of a completely random single-qubit operator. On the other hand, we observe that when $r$ and $\nu$ are larger, the maximum value of $\mathcal{F}_{0}(k)$ increases, but we need more iterations for the convergence of the algorithm. Figure 3 shows the mean searching range $\mathcal{W}(k)$ for the same cases. From this figure we can clearly see how the algorithm needs fewer iterations when $r$ and $\nu$ decrease, with the extreme case of $r=0.6$, $\nu=1$, where the algorithm converges before 70 iterations.

Now, we consider a particular example $\hat{\mathcal{O}}_{E}=\hat{S}_{x}=\frac{1}{2}\sigma_{x}$. In this case, the distance in the Bloch sphere between $|0\rangle$ and the eigenstates of $\hat{\mathcal{O}}_{E}$ is the largest possible. Figure 4 shows that our algorithm converges in few iterations to good approximations of the eigenvectors: we obtain the eigenvectors with fidelity above $98\%$ in 400 iterations for the case $\nu=2$ and $r=0.9$.

As we can see, the maximum fidelity for the case $\hat{\mathcal{O}}_{E}=\hat{S}_{x}$ has decreased with respect to the random one. This is because the distance between $|0\rangle$ and the eigenvectors of $\hat{S}_{x}$ is larger than the distance between $|0\rangle$ and the eigenvectors of $\hat{\mathcal{O}}_{E}$ in the random case, therefore, the protocol has worse convergence.
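The statement about the distance on the Bloch sphere can be made concrete with a short worked equation. The labels $|\pm\rangle$ for the eigenvectors of $\hat{S}_{x}$ are standard notation introduced here, not symbols from the article.

$$|\pm\rangle=\frac{1}{\sqrt{2}}\left(|0\rangle\pm|1\rangle\right),\qquad\hat{S}_{x}|\pm\rangle=\pm\frac{1}{2}|\pm\rangle,\qquad|\langle\pm|0\rangle|^{2}=\frac{1}{2}.$$

The initial state $|0\rangle$ therefore starts with fidelity $1/2$ with both eigenvectors. Because the two fidelities of a qubit state with an orthonormal eigenbasis add up to one, $1/2$ is the lowest possible starting fidelity with the nearest eigenvector, which is the sense in which this choice of $\hat{\mathcal{O}}_{E}$ is the least favorable starting point.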

3.2 Two-qubit case

This case is analogous to the single-qudit case with $d=4$. First, for a general performance, we consider $\hat{\mathcal{O}}_{E}$ as a random two-qubit operator. Moreover, we choose $\mathcal{N}=1000$ and calculate the mean fidelity $\mathcal{F}_{j}(k)$ and the mean searching range $\mathcal{W}_{j}$ given by Eq. (33). Figure 5 shows the numerical calculation for $r=0.9$ and $\nu=\{1.5,2\}$. It shows again that for small $\nu$ the convergence is faster but the maximum value of $\mathcal{F}_{j}$ is smaller. Furthermore, with $\nu=2$ we need $8500$ iterations for the four approximate eigenvectors to converge. With $\nu=1.5$, we only need $6000$ iterations. Nevertheless, for $\nu=2$ we obtain $\mathcal{F}_{j}>0.89$ for all $j$, with $\mathcal{F}_{2}$ and $\mathcal{F}_{3}$ even reaching up to $0.93$. In the other case, with $\nu=1.5$, the maximum values are $\mathcal{F}_{0}\sim 0.88$ and $\{\mathcal{F}_{1},\mathcal{F}_{2},\mathcal{F}_{3}\}<0.92$. Also, we can see from the evolution of $\mathcal{W}(k)$ that the number of iterations needed for convergence is smaller each time the algorithm restarts to approximate the next eigenvector, that is, $N_{0}>N_{1}>N_{2}$. Finally, we consider as a special case $\hat{\mathcal{O}}_{E}=\hat{B}$, where $\hat{B}$ is an operator given by

[TABLE]

with

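$$|\phi_{\pm}\rangle=\frac{1}{\sqrt{2}}\left(|00\rangle\pm|11\rangle\right),\qquad|\psi_{\pm}\rangle=\frac{1}{\sqrt{2}}\left(|01\rangle\pm|10\rangle\right), \tag{35}$$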

the maximally-entangled Bell states. Figure 6 shows the performance of our protocol for this case. We can see that we obtain high fidelities ( $\mathcal{F}_{j}>0.99$ ) with only 1000 iterations to approximate the four eigenvectors. We obtain this performance due to the fact that our algorithm is sensitive to the number of the product states involved in each subspace (dimension of the subspace) and not to the total dimension of the operator $\hat{\mathcal{O}}_{E}$ . In this case, the operator $\hat{B}$ is block-diagonal, where one block acts in the subspace $\{|00\rangle,|11\rangle\}$ and the other in $\{|01\rangle,|10\rangle\}$ . This implies that the present case is similar to two independent single-qubit cases. In Fig. 6, we can see that from $k=1$ to $k=500$ we approximate the eigenstates of the first block, that is $|\phi_{\pm}\rangle$ at the same time, and from $k=501$ to $k=1000$ we approximate the eigenstates of the second block $|\psi_{\pm}\rangle$ , where both cases have a performance similar to the single-qubit case.
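The block structure invoked here can be verified directly. The sketch below builds an operator with the four Bell states as eigenvectors and checks that it does not couple $\{|00\rangle,|11\rangle\}$ to $\{|01\rangle,|10\rangle\}$. The eigenvalues are hypothetical placeholders chosen only to be non-degenerate, and the sign convention for the Bell states is the usual one; neither is taken from the article.

```python
import math

s = 1.0 / math.sqrt(2.0)
# Amplitudes in the basis order |00>, |01>, |10>, |11>.
bell = {
    "phi+": [s, 0.0, 0.0, s],
    "phi-": [s, 0.0, 0.0, -s],
    "psi+": [0.0, s, s, 0.0],
    "psi-": [0.0, s, -s, 0.0],
}
# Hypothetical, non-degenerate eigenvalues used only for illustration.
eigenvalues = {"phi+": 1.0, "phi-": 2.0, "psi+": 3.0, "psi-": 4.0}

# B = sum_k lambda_k |k><k| (all amplitudes are real here).
b_op = [[sum(eigenvalues[k] * v[i] * v[j] for k, v in bell.items())
         for j in range(4)] for i in range(4)]

block_a, block_b = (0, 3), (1, 2)    # {|00>, |11>} and {|01>, |10>}
coupling = max(abs(b_op[i][j]) for i in block_a for j in block_b)
```

The value of `coupling` vanishes for any choice of eigenvalues, so the operator splits into two $2\times 2$ blocks, each of which behaves like an independent single-qubit problem.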

4 Conclusions

We propose and analyze an approximate quantum eigensolver based on reinforcement learning with minimal resources. This proposal can be classified as a hybrid classical-quantum algorithm: a classical optimization algorithm modifies a quantum system to improve a quantum task, using a feedback loop combined with partially random unitary gates. This is in contrast with other hybrid algorithms, which measure fidelities or some expectation value in each step. Our proposal is therefore advantageous with respect to the usual hybrid algorithms in two ways: it needs minimal storage, since only the last step of the algorithm is saved, and it employs just one single-shot measurement per iteration instead of fidelity or expectation-value measurements, which reduces the effect of noise sources. Moreover, our protocol uses pseudo-random two-level rotations, so it is not necessary to implement high-fidelity operations, because the randomness of the algorithm absorbs the gate errors. For this reason, our algorithm should be experimentally feasible in almost any current quantum platform.
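The ingredients listed above (one single-shot measurement per iteration, a pseudo-random rotation, and storage of only the last step) can be illustrated with a deliberately simplified single-qubit loop. This is a loose sketch under stated assumptions and not the update rule of the main text: the parameter $\nu$ is omitted, the punishment is taken to be $W\to W/r$ capped at $2\pi$, a candidate rotation is kept only on a rewarded outcome, and the names `rotation` and `run_sketch` are hypothetical.

```python
import numpy as np

def rotation(theta, phi):
    """Unitary whose first column is cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -np.exp(-1j * phi) * s],
                     [np.exp(1j * phi) * s, c]])

def run_sketch(O, tau=1.0, r=0.9, iterations=300, seed=0):
    """Simplified measure-and-feedback loop (assumed rule, not the paper's)."""
    rng = np.random.default_rng(seed)
    lam, V = np.linalg.eigh(O)
    U_E = V @ np.diag(np.exp(-1j * tau * lam)) @ V.conj().T  # environment
    D = np.eye(2, dtype=complex)  # agent state is D|0>; only D and W are stored
    W = 2 * np.pi                 # current search range
    for _ in range(iterations):
        # Pseudo-random rotation with angles drawn inside the search range.
        theta = W * (rng.random() - 0.5)
        phi = W * (rng.random() - 0.5)
        D_try = D @ rotation(theta, phi)
        # Act with the environment, then express the result in the agent frame.
        out = D_try.conj().T @ U_E @ D_try[:, 0]
        p1 = min(1.0, abs(out[1]) ** 2)
        changed = rng.random() < p1  # one single-shot measurement outcome
        if changed:
            W = min(W / r, 2 * np.pi)  # punishment: widen the search
        else:
            D, W = D_try, r * W        # reward: keep rotation, narrow the search
    agent = D[:, 0]
    fidelities = np.abs(V.conj().T @ agent) ** 2  # overlaps with exact eigenvectors
    return agent, W, fidelities

# Usage with an illustrative operator proportional to the Pauli-x matrix.
O_example = np.array([[0, 1], [1, 0]], dtype=complex) / 2
agent, W, fidelities = run_sketch(O_example)
print(fidelities, W)
```

An exact eigenvector is never flagged as changed, so the search range keeps shrinking once the agent is close to one; any other state is eventually punished and the search widens again. The loop never estimates an expectation value, which is the storage and measurement economy described in the text.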

Additionally, we validated our proposal with numerical calculations for four different choices of the operator $\hat{\mathcal{O}}_{E}$: a random single-qubit operator, the $\hat{S}_{x}$ operator, a random two-qubit operator, and the $\hat{B}$ operator defined by Eq. (34). As a general rule, our algorithm reaches higher fidelities for the approximate eigenvectors for large values of $\nu$ and $r$, but the convergence in this case is slower. This is related to the balance between exploration and exploitation typical of reinforcement learning algorithms. Moreover, our algorithm is sensitive to the size of the different subspaces spanned by product states and not to the size of the total space of the operator $\hat{\mathcal{O}}_{E}$. This is the case shown in Fig. 6, where the eigenvectors are the maximally entangled Bell states. We point out that, in order to improve the performance of the protocol in future extensions, it could be interesting to study dynamical reward rates $r$ and a dynamical parameter $\nu$.

Finally, owing to its simplicity, the minimal resources it employs, and the fact that it needs only a basic classical processor (command center) capable of performing pseudo-random rotations, our protocol can be useful for the development of near-future semi-autonomous quantum devices, which will have to make decisions with incomplete information obtained by interaction with the external environment.

We acknowledge support from Financiamiento Basal para Centros Científicos y Tecnológicos de Excelencia (Grant No. FB0807), projects QMiCS (820505) and OpenSuperQ (820363) of the EU Flagship on Quantum Technologies, EU FET Open Grant Quromorphic, Basque Government IT986-16, and PGC2018-095113-B-I00 (MCIU/AEI/FEDER, UE).

Data availability statement

The data that support the findings of this study are openly available at https://github.com/PanchoAlbarran/EigenSolver

Appendix A Explicit form of $\bar{\theta}^{(k)}$ and $\bar{\phi}^{(k)}$

Here, we further clarify the protocol developed in the main text.

From Eq. (4), we have

[TABLE]

Replacing Eq. (7) we obtain

[TABLE]

Thus,

[TABLE]

By means of the definition of $|v_{0}\rangle$ and $|v_{1}\rangle$ given by Eq. (4), we obtain

[TABLE]

We rewrite the eigenvalues as $\lambda_{0}=\delta-\lambda$ and $\lambda_{1}=\delta+\lambda$ where $\delta=(\lambda_{1}+\lambda_{0})/2$ and $\lambda=(\lambda_{1}-\lambda_{0})/2$ . Then, we rewrite Eq. (39) up to a global phase as

[TABLE]

This state has the form

[TABLE]

with

[TABLE]

Finally, up to a global phase, the state given by Eq. (42) can be written in the form of Eq. (8), where

[TABLE]
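The global-phase step used in this appendix can be verified numerically. With $\lambda_{0}=\delta-\lambda$ and $\lambda_{1}=\delta+\lambda$, the environment unitary factorizes as $\hat{U}_{E}=e^{-i\tau\delta}\left(e^{i\tau\lambda}|v_{0}\rangle\langle v_{0}|+e^{-i\tau\lambda}|v_{1}\rangle\langle v_{1}|\right)$, so dropping $e^{-i\tau\delta}$ leaves all measurement probabilities unchanged. The sketch below checks this for an arbitrary single-qubit Hermitian operator; the operator, the value of `tau`, and the test state are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary single-qubit Hermitian operator (illustrative, not from the paper).
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
O = (A + A.conj().T) / 2
tau = 0.7  # hypothetical value of the real parameter tau

lam, V = np.linalg.eigh(O)  # lam[0] <= lam[1]
v0, v1 = V[:, 0], V[:, 1]
delta = (lam[1] + lam[0]) / 2
half_gap = (lam[1] - lam[0]) / 2  # plays the role of lambda in the text

# Full unitary exp(-i tau O) from the spectral decomposition.
U_E = V @ np.diag(np.exp(-1j * tau * lam)) @ V.conj().T

# Same unitary with the global phase exp(-i tau delta) removed.
U_reduced = (np.exp(1j * tau * half_gap) * np.outer(v0, v0.conj())
             + np.exp(-1j * tau * half_gap) * np.outer(v1, v1.conj()))

same_up_to_phase = np.allclose(U_E, np.exp(-1j * tau * delta) * U_reduced)

# Measurement probabilities on a test state are identical for both forms.
psi = np.array([0.6, 0.8j])
probs_full = np.abs(U_E @ psi) ** 2
probs_reduced = np.abs(U_reduced @ psi) ** 2
print(same_up_to_phase, np.allclose(probs_full, probs_reduced))
```

Only the half-gap $\lambda$ and the product $\tau\lambda$ enter the reduced form, which is why the resulting angles depend on the eigenvalue difference and not on $\delta$.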

Appendix B Explicit form of $\Delta_{\theta}^{(k)}$ and $\Delta_{\varphi}^{(k)}$

From Eqs. (7) and (9) we have,

[TABLE]

Replacing this expression in the first line of Eq. (8), we obtain

[TABLE]

where

[TABLE]

and

[TABLE]

Finally, up to a global phase, we can write the state $|\bar{\phi}^{(k)}_{A,0}\rangle$ as

[TABLE]

with $\Delta_{\phi}^{(k)}=\Psi_{1}-\Psi_{0}$ .

References

[1] Adcock J C, Allen E, Day M, Frick S, Hinchliff J, Johnson M, Morley-Short S, Pallister S, Price A B and Stanisic S 2015 arXiv:1512.02900 [quant-ph]
[2] Biamonte J, Wittek P, Pancotti N, Rebentrost P, Wiebe N and Lloyd S 2017 Quantum machine learning Nature 549 195
[3] Dunjko V, Taylor J M and Briegel H J 2016 Quantum-Enhanced Machine Learning Phys. Rev. Lett. 117 130501
[4] Dunjko V and Briegel H J 2018 Machine learning & artificial intelligence in the quantum domain: a review of recent progress Rep. Prog. Phys. 81 074001
[5] Hentschel A and Sanders B C 2010 Machine Learning for Precise Quantum Measurement Phys. Rev. Lett. 104 063603
[6] Hentschel A and Sanders B C 2011 Efficient Algorithm for Optimizing Adaptive Quantum Metrology Processes Phys. Rev. Lett. 107 233601
[7] Torlai G, Mazzola G, Carrasquilla J, Troyer M, Melko R and Carleo G 2018 Neural-network quantum state tomography Nat. Phys. 14 447
[8] Rocchetto A, Aaronson S, Severini S, Carvacho G, Poderini D, Agresti I, Bentivegna M and Sciarrino F 2019 Experimental learning of quantum states Sci. Adv. 5 eaau1946
[9] Häse F, Kreisbeck C and Aspuru-Guzik A 2017 Machine learning for quantum dynamics: deep learning of excitation energy transfer properties Chem. Sci. 8 8419
[10] Gao J, Qiao L-F, Jiao Z-Q, Ma Y-C, Hu C-Q, Ren R-J, Yang A-L, Tang H, Yung M-H and Jin X-M 2018 Experimental Machine Learning of Quantum States Phys. Rev. Lett. 120 240501
[11] Gupta R S and Biercuk M J 2018 Machine Learning for Predictive Estimation of Qubit Dynamics Subject to Dephasing Phys. Rev. Applied 9 064042
[12] Bukov M 2018 Reinforcement learning for autonomous preparation of Floquet-engineered states: Inverting the quantum Kapitza oscillator Phys. Rev. B 98 224305
[13] Bukov M, Day A G R, Sels D, Weinberg P, Polkovnikov A and Mehta P 2018 Reinforcement Learning in Different Phases of Quantum Control Phys. Rev. X 8 031086
[14] Melnikov A A, Nautrup H P, Krenn M, Dunjko V, Tiersch M, Zeilinger A and Briegel H J 2018 Active learning machine learns to create new quantum experiments Proc. Natl. Acad. Sci. U.S.A. 115 1221
[15] Aïmeur E, Brassard G and Gambs S 2013 Quantum speed-up for unsupervised learning Mach. Learn. 90 261
[16] Lloyd S, Mohseni M and Rebentrost P 2013 arXiv:1307.0411 [quant-ph]
[17] Rebentrost P, Mohseni M and Lloyd S 2014 Quantum Support Vector Machine for Big Data Classification Phys. Rev. Lett. 113 130503
[18] Li Z, Liu X, Xu N and Du J 2015 Experimental Realization of a Quantum Support Vector Machine Phys. Rev. Lett. 114 140504
[19] Cai X-D, Wu D, Su Z-E, Chen M-C, Wang X-L, Li L, Liu N-L, Lu C-Y and Pan J-W 2015 Entanglement-Based Machine Learning on a Quantum Computer Phys. Rev. Lett. 114 110504
[20] Sheng Y-B and Zhou L 2017 Distributed secure quantum machine learning Sci. Bull. 62 1025
[21] Schuld M and Killoran N 2019 Quantum Machine Learning in Feature Hilbert Spaces Phys. Rev. Lett. 122 040504
[22] Jain A K 2007 Biometric recognition Nature 449 38
[23] Carrasquilla J and Melko R G 2017 Machine learning phases of matter Nat. Phys. 13 431
[24] Schützhold R 2003 Pattern recognition on a quantum computer Phys. Rev. A 67 062311
[25] Fahad A, Alshatri N, Tari Z, Alamri A, Khalil I, Zomaya A Y, Foufou S and Bouras A 2014 A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis IEEE Trans. Emerging Top. Comput. 2 267
[26] Otterbach J S, Manenti R, Alidoust N, Bestwick A, Block M, Bloom B, Caldwell S, Didier N, Fried E S, Hong S, Karalekas P, Osborn C B, Papageorge A, Peterson E C, Prawiroatmodjo G, Rubin N, Ryan C A, Scarabelli D, Scheer M, Sete E A, Sivarajah P, Smith R S, Staley A, Tezak N, Zeng W J, Hudson A, Johnson B R, Reagor M, da Silva M P and Rigetti C 2017 arXiv:1712.05771 [quant-ph]
[27] Sutton R S and Barto A G 2018 Reinforcement Learning: An introduction (Cambridge: MIT press)
[28] Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, Chen Y, Lillicrap T, Hui F, Sifre L, van den Driessche G, Graepel T and Hassabis D 2017 Mastering the game of Go without human knowledge Nature 550 354
[29] Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T, Lillicrap T, Simonyan K and Hassabis D 2018 A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play Science 362 1140
[30] Dong D, Chen C, Li H and Tarn T J 2008 Quantum Reinforcement Learning IEEE Trans. Syst. Man Cybern. B Cybern. 38 1207
[31] Paparo G D, Dunjko V, Makmal A, Martin-Delgado M A and Briegel H J 2014 Quantum Speedup for Active Learning Agents Phys. Rev. X 4 031002
[32] Lamata L 2017 Basic protocols in quantum reinforcement learning with superconducting circuits Sci. Rep. 7 1609
[33] Cárdenas-López F A, Lamata L, Retamal J C and Solano E 2018 Multiqubit and multilevel quantum reinforcement learning with quantum technologies PLOS ONE 13 e0200455
[34] Crawford D, Levit A, Ghadermarzy N, Oberoi J S and Ronagh P 2019 arXiv:1612.05695 [quant-ph]
[35] Abrams D S and Lloyd S 1999 Quantum Algorithm Providing Exponential Speed Increase for Finding Eigenvalues and Eigenvectors Phys. Rev. Lett. 83 5162
[36] Jaksch P and Papageorgiou A 2003 Eigenvector Approximation Leading to Exponential Speedup of Quantum Eigenvalue Calculation Phys. Rev. Lett. 91 257902
[37] Wang H, Wu L-A, Liu Y-X and Nori F 2010 Measurement-based quantum phase estimation algorithm for finding eigenvalues of non-unitary matrices Phys. Rev. A 82 062303
[38] Wang H 2016 Quantum algorithm for obtaining the eigenstates of a physical system Phys. Rev. A 93 052334
[39] Wecker D, Bauer B, Clark B K, Hastings M B and Troyer M 2014 Gate-count estimates for performing quantum chemistry on small quantum computers Phys. Rev. A 90 022305
[40] Peruzzo A, McClean J, Shadbolt P, Yung M-H, Zhou X-Q, Love P J, Aspuru-Guzik A and O’Brien J L 2014 A variational eigenvalue solver on a photonic quantum processor Nat. Commun. 5 4213
[41] Yung M-H, Casanova J, Mezzacapo A, McClean J, Lamata L, Aspuru-Guzik A and Solano E 2014 From transistor to trapped-ion computers for quantum chemistry Sci. Rep. 4 3589
[42] McClean J R, Romero J, Babbush R and Aspuru-Guzik A 2016 The theory of variational hybrid quantum-classical algorithms New J. Phys. 18 023023
[43] O’Malley P J J, Babbush R, Kivlichan I D, Romero J, McClean J R, Barends R, Kelly J, Roushan P, Tranter A, Ding N, Campbell B, Chen Y, Chen Z, Chiaro B, Dunsworth A, Fowler A G, Jeffrey E, Lucero E, Megrant A, Mutus J Y, Neeley M, Neill C, Quintana C, Sank D, Vainsencher A, Wenner J, White T C, Coveney P V, Love P J, Neven H, Aspuru-Guzik A and Martinis J M 2016 Scalable Quantum Simulation of Molecular Energies Phys. Rev. X 6 031007
[44] Kandala A, Mezzacapo A, Temme K, Takita M, Brink M, Chow J M and Gambetta J M 2017 Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets Nature 549 242
[45] Hempel C, Maier C, Romero J, McClean J, Monz T, Shen H, Jurcevic P, Lanyon B P, Love P, Babbush R, Aspuru-Guzik A, Blatt R and Roos C F 2018 Quantum Chemistry Calculations on a Trapped-Ion Quantum Simulator Phys. Rev. X 8 031022
[46] Kokail C, Maier C, van Bijnen R, Brydges T, Joshi M K, Jurcevic P, Muschik C A, Silvi P, Blatt R, Roos C F and Zoller P 2019 Self-verifying variational quantum simulation of lattice models Nature 569 355
[47] Code repository at https://github.com/PanchoAlbarran/EigenSolver
[48] Alvarez-Rodriguez U, Sanz M, Lamata L and Solano E 2016 Artificial Life in Quantum Technologies Sci. Rep. 6 20956
[49] Alvarez-Rodriguez U, Sanz M, Lamata L and Solano E 2018 Quantum Artificial Life in an IBM Quantum Computer Sci. Rep. 8 14793
[50] Albarrán-Arriagada F, Retamal J C, Solano E and Lamata L 2018 Measurement-based adaptation protocol with quantum reinforcement learning Phys. Rev. A 98 042315
[51] Yu S, Albarrán-Arriagada F, Retamal J C, Wang Y-T, Liu W, Ke Z-J, Meng Y, Li Z-P, Tang J-S, Solano E, Lamata L, Li C-F and Guo G-C 2019 Reconstruction of a Photonic Qubit State with Reinforcement Learning Adv. Quantum Technol. 2 1800074
[52] Olivares-Sánchez J, Casanova J, Solano E and Lamata L 2018 arXiv:1811.07594 [quant-ph]
[53] Nielsen M A and Chuang I L 2010 Quantum Computation and Quantum Information (Cambridge: Cambridge University Press)
