Quantum-inspired memory-enhanced stochastic algorithms

John Realpe-Gómez; Nathan Killoran

arXiv:1906.00263·quant-ph·June 4, 2019

TL;DR

This paper introduces quantum-inspired classical algorithms that significantly reduce memory requirements for simulating stochastic models, potentially enhancing classical supercomputing and impacting quantum supremacy benchmarks.

Contribution

It demonstrates that quantum memory advantages can be approximated classically, enabling more efficient simulations on existing high-performance computers.

Findings

1. Quantum-inspired algorithms require less memory than traditional methods.

2. Classical implementations can approximate quantum algorithms for stochastic models.

3. Potential to improve classical simulation efficiency and influence quantum supremacy efforts.

Equations

$D_{c}=\log_{2}|\mathcal{S}|$

$H_{c}=-\sum_{j\in\mathcal{S}}\pi_{j}\log_{2}\pi_{j}$

$\overset{\leftarrow}{x}\sim_{\epsilon}\overset{\leftarrow}{y}\Leftrightarrow P(\overset{\rightarrow}{X}|\overset{\leftarrow}{x})=P(\overset{\rightarrow}{X}|\overset{\leftarrow}{y})$

$P(x,j|i)=P(x|i)\,\delta_{j,f(i,x)}$

$\left|\xi_{i}\right\rangle=\sum_{x\in\mathcal{A},\,j\in\mathcal{S}}\sqrt{P(x,j|i)}\left|j\right\rangle\left|x\right\rangle$

$U\left|\xi_{i}\right\rangle\left|0\right\rangle=\sum_{x\in\mathcal{A}}\sqrt{P(x|i)}\left|\xi_{f(i,x)}\right\rangle\left|x\right\rangle$

$\rho=\sum_{i\in\mathcal{S}}\pi_{i}\left|\xi_{i}\right\rangle\left\langle\xi_{i}\right|$

$D_{q}=\log_{2}[{\rm rank}(\rho)]$

$S_{q}=-{\rm Tr}\,\rho\log_{2}\rho$

$P(1|0)=P(0|1)$

$P(0|0)=P(1|1)$

$\left|\xi_{0}\right\rangle$ , $\left|\xi_{1}\right\rangle$

$U_{x}=\begin{pmatrix}\sqrt{1-x}&\#\\ \sqrt{x}&\#\end{pmatrix}$

${\rm CNOT}^{(1,2)}=\left|0\right\rangle\left\langle 0\right|\otimes{\rm 1\!\!I}+\left|1\right\rangle\left\langle 1\right|\otimes X$

$X=\begin{pmatrix}0&1\\ 1&0\end{pmatrix}$

$\left|\chi_{j}\right\rangle={\rm CNOT}^{(1,2)}\left|\xi_{j}\right\rangle\left|\xi_{0}\right\rangle$

$\left|\chi_{0}\right\rangle$ , $\left|\chi_{1}\right\rangle$

$\rho=\tfrac{1}{2}\left|\xi_{0}\right\rangle\left\langle\xi_{0}\right|+\tfrac{1}{2}\left|\xi_{1}\right\rangle\left\langle\xi_{1}\right|=\lambda_{+}\left|+\right\rangle\left\langle+\right|+\lambda_{-}\left|-\right\rangle\left\langle-\right|$

$\left|+\right\rangle=\frac{1}{\sqrt{2}}\begin{pmatrix}1\\ 1\end{pmatrix},\quad\left|-\right\rangle=\frac{1}{\sqrt{2}}\begin{pmatrix}1\\ -1\end{pmatrix}$

$S_{q}(\lambda)=-\lambda\log_{2}\lambda-(1-\lambda)\log_{2}(1-\lambda)\leq\log_{2}2=1$

$\left|\xi_{0}\right\rangle$ , $\left|\xi_{1}\right\rangle$ , $\left|\xi_{2}\right\rangle$

$\left|0\right\rangle$ , $\left|1\right\rangle$

$\neg CU_{p}^{(1,3)}$ , $CU_{1-q}^{(1,2)}$ , ${\rm CNOT}^{(3,2)}$

$U={\rm CNOT}^{(3,2)}\,CU_{1-q}^{(1,2)}\,\neg CU_{p}^{(1,3)}$

$U\left|\xi_{0}\right\rangle\left|0\right\rangle\left|0\right\rangle$ , $U\left|\xi_{1}\right\rangle\left|0\right\rangle\left|0\right\rangle$ , $U\left|\xi_{2}\right\rangle\left|0\right\rangle\left|0\right\rangle$

$\begin{pmatrix}1-p\\ p\end{pmatrix}=\frac{1}{2}\begin{pmatrix}1\\ 1\end{pmatrix}+\frac{1-2p}{2}\begin{pmatrix}1\\ -1\end{pmatrix}$

Taxonomy

Topics: Quantum Computing Algorithms and Architecture · Parallel Computing and Optimization Techniques · Quantum Information and Cryptography

Full text

Quantum-inspired memory-enhanced stochastic algorithms

John Realpe-Gómez

Instituto de Matemáticas Aplicadas, Universidad de Cartagena, Bolívar 130001, Colombia

Nathan Killoran

Xanadu, Toronto, Canada

Abstract

Stochastic models are highly relevant tools in science, engineering, and society. Recent work suggests emerging quantum computing technologies can substantially decrease the memory requirements for simulating stochastic models. Here we show that some of these recent quantum memory-enhanced algorithms can be either implemented or approximated classically. In other words, we show that it is possible to develop quantum-inspired classical algorithms that require much less memory than the best classical algorithms known to date. Being classical, such algorithms could be implemented in state-of-the-art high-performance computers, which could potentially enhance the study of large-scale complex systems. Furthermore, since memory is the main bottleneck limiting the performance of classical supercomputers in one of the most promising avenues to demonstrate quantum ‘supremacy’, we expect adaptations of these ideas may potentially further raise the bar for near-term quantum computers to reach such a milestone.

I Introduction

From the prediction and understanding of financial markets bouchaud2018trades ; bisias2012survey ; farmer2009economy or the intricate relationships among ecosystems’ resilience, climate change, and human activity stern2016economics ; cai2015environmental ; franzke2015stochastic , to the design and operation of the artificial intelligence architectures lake2015human ; ghahramani2015probabilistic ; goodfellow2016deep ; murphy2012machine that pervade our lives, stochastic models and the corresponding algorithms to implement them are indispensable. Optimizing the computational resources, such as speed and memory, needed for running stochastic algorithms like Markov chain Monte Carlo, is crucial for keeping up with the fast-growing and increasingly complex problems our society faces.

With the emerging commercialization of quantum computing technologies and the race to demonstrate quantum ‘supremacy’ mohseni2017commercialize ; boixo2018characterizing , there is much interest in understanding what aspects of information processing quantum computers can do better than their classical counterparts. While much work has focused on potential quantum speedup biswas2017nasa , there is growing interest in potentially extreme memory reductions that quantum protocols can provide gu2012quantum ; thompson2018causal ; aghamohammadi2018extreme ; ghafari2018single ; ghafari2019interfering ; palsson2017experimentally ; binder2018practical ; liu2018optimal ; thompson2017using . Indeed, both theoretical gu2012quantum ; thompson2018causal ; aghamohammadi2018extreme ; binder2018practical ; liu2018optimal ; thompson2017using and experimental ghafari2018single ; ghafari2019interfering ; palsson2017experimentally work suggests that quantum protocols can generate samples from stochastic processes using much less memory than the best classical counterparts known to date, i.e., the so-called (classical) $\epsilon$ -machines crutchfield1989inferring ; shalizi2001computational . These $\epsilon$ -machines rely on the minimum ‘deterministic’ information about the past of a stochastic process necessary to statistically predict its future—known as causal states.

For instance, imagine a coin inside a box that is regularly perturbed. Each perturbation of the box can flip the coin with a probability $p$ , irrespective of the current state of the coin (see Fig. 1). If $p\neq\tfrac{1}{2}$ , all we need to know is the previous state of the coin, i.e., the two causal states heads or tails, to correctly predict the probabilities of all future trajectories; therefore, we need to save in memory one bit of information. The information saved is ‘deterministic’ in the sense that it is either heads or tails and not a classical probabilistic mixture of the two. In contrast, quantum $\epsilon$ -machines can operate using less memory by encoding information about the past on quantum superpositions, the so-called quantum causal states (see Fig. 2). If $p=\tfrac{1}{2}$ the process is equivalent to tossing a coin at random at each moment, so the amount of memory required by a classical $\epsilon$ -machine discontinuously jumps to zero. The amount of memory required by the quantum $\epsilon$ -machine also goes to zero, though continuously, so it cannot do better in this case (see Fig. 3).

Such quantum-enabled memory reductions are reflected in two possible ways. First, in the sequential generation of a single stochastic trajectory due to a smaller state space of the quantum $\epsilon$ -machine, e.g., an $\epsilon$ -machine that needs a trit of memory could be simulated by a quantum $\epsilon$ -machine that needs to keep in memory only a qubit thompson2018causal ; ghafari2018single —this is referred to as topological memory reduction. Second, in the parallel generation of a large number of stochastic trajectories, or samples, due to a smaller (von Neumann) entropy of the quantum $\epsilon$ -machine at the stationary state gu2012quantum ; ghafari2019interfering ; palsson2017experimentally —this is referred to as statistical memory reduction. The quantum-enabled reduction of topological memory relies on finding a representation of the quantum causal states in terms of a quantum system of smaller dimension. In contrast, following the Schumacher quantum coding theorem schumacher1995quantum , the quantum-enabled reduction of statistical memory relies on finding a suitable quantum coding of a large number of sample quantum causal states, distributed according to the stationary state of the stochastic process.

Except for a very recent work liu2018optimal , all quantum $\epsilon$ -machines studied to date encode information only on the amplitude of quantum states, i.e., no phase information is used aghamohammadi2018extreme ; ghafari2018single ; ghafari2019interfering ; palsson2017experimentally ; thompson2018causal ; binder2018practical ; gu2012quantum . Such ‘amplitude-encoded’ quantum $\epsilon$ -machines, however, can lead to ‘extreme’ memory reductions aghamohammadi2018extreme , i.e., to situations where the memory required by the classical $\epsilon$ -machine diverges while that required by the corresponding quantum $\epsilon$ -machine remains finite.

But, if the information encoded in a probability distribution and that encoded in its square root is the same, why is there a difference in the memory requirements of classical and quantum $\epsilon$ -machines? Indeed, an early algorithm for ‘quantum deep learning’ wiebe2014quantum that only exploited the amplitude of quantum states was later shown to be implementable classically wiebe2015quantum . Nowadays, there is growing interest in such ‘quantum-inspired’ algorithms tang2018quantum ; gilyen2018quantum ; hen2018quantum ; arrazola2019quantum . This aligns with recent work realpe2017modeling ; realpe2018cognitive suggesting that classical message-passing algorithms can be written in a way mathematically analogous to quantum dynamics in imaginary-time, i.e., by changing time $t$ into $-it$ in the Schrödinger equation. In particular, square roots of probabilities arise naturally in such a classical setting. But when the phase degree of freedom is never used, the imaginary unit does not play any role, and it is natural to expect that both real-time and imaginary-time quantum dynamics can become similar.

Here we provide evidence that in some cases it is possible to classically implement such ‘amplitude-encoded’ quantum $\epsilon$-machines. More precisely, we show that it is sometimes possible to implement classical protocols that have the same memory reductions as amplitude-encoded quantum $\epsilon$-machines. We do so by saving information about the past of a stochastic process using probabilistic mixtures, or stochastic causal states, rather than the deterministic causal states of standard $\epsilon$-machines. We also show that it is sometimes possible to build classical algorithms that, by exploiting an operational interpretation of negative numbers arising in a decomposition of probability distributions, introduced in Refs. realpe2017modeling ; realpe2018cognitive , can generate stochastic trajectories while using much less memory than the corresponding classical $\epsilon$-machines. Interestingly, in the case of a coin being regularly flipped with probability $p$ mentioned above, the memory requirements approach zero continuously as $p\to\tfrac{1}{2}$, much as the quantum $\epsilon$-machine does, instead of jumping discontinuously to zero as the classical $\epsilon$-machine does. Such algorithms, though, do not attain the same memory reductions as amplitude-encoded quantum $\epsilon$-machines, so there is room for improvement. In particular, they do not exploit square roots of probabilities. We discuss how belief propagation can pave the way for exploiting such square roots.

Our results therefore show that it is possible to develop quantum-inspired classical stochastic algorithms that require much less memory than the best classical stochastic algorithms known to date. Now, memory is the main bottleneck limiting the performance of classical supercomputers in one of the most promising avenues to demonstrate quantum ‘supremacy’ in the near term boixo2018characterizing . So we expect adaptations of these ideas may potentially further raise the bar for quantum computing technologies to reach such a milestone.

The rest of this paper is organized as follows. In Sec. II we review the main concepts related to $\epsilon$-machines (Sec. II.1) as well as their quantum counterparts (Sec. II.2), and present two specific examples that have been recently demonstrated experimentally ghafari2019interfering ; palsson2017experimentally ; ghafari2018single , which we will revisit in the next sections. In Sec. III we identify some features implicit in quantum $\epsilon$-machines and discuss how these can be implemented classically. Based on these, we introduce in Sec. III.2 two classical algorithms for the two examples mentioned above that require less memory to operate than the best algorithms known to date, i.e., their corresponding $\epsilon$-machines. One of these algorithms reaches the same memory gains as the corresponding quantum $\epsilon$-machine, while the other does not. In Sec. III.3 we discuss the extension of the main ideas developed previously to build general quantum-inspired memory-enhanced algorithms. In particular, in Sec. III.3.2 we introduce a general quantum-inspired classical algorithm that can generate samples with much less statistical memory than the best classical algorithms known to date. However, this algorithm does not reach, in general, the same memory savings as the corresponding quantum protocols. In Appendix A we present a detailed example. In Sec. III.3.4 we briefly discuss how recent work realpe2017modeling ; realpe2018cognitive , showing that belief propagation on chain- and cycle-like graphical models can be formulated as quantum-like dynamics, could serve as a general framework to exploit square roots of probabilities. In Appendix B we present a detailed discussion of these ideas and introduce two graphical models whose dynamics is mathematically analogous to the two quantum $\epsilon$-machines discussed as examples in Sec. II.2. Finally, in Sec. IV we present the conclusions of this work and put the ideas introduced here in a broader perspective.

II Framework

II.1 Classical stochastic algorithms

Consider a system whose state at time step $t\in\mathbb{Z}$ can be described by a stochastic variable $X_{t}$ which takes a value $x_{t}\in\mathcal{A}$ in a certain alphabet $\mathcal{A}$ . The system’s dynamics can then be described by a sequence of stochastic variables $\overset{\leftrightarrow}{X}=\overset{\leftarrow}{X}\overset{\rightarrow}{X}$ . Here $\overset{\leftarrow}{X}=\dotsc X_{-2}X_{-1}$ and $\overset{\rightarrow}{X}=X_{0}X_{1}\dotsc$ are the sequences describing, respectively, the past and future dynamics of the system. The system’s dynamical law is specified by a probability distribution $P(\overset{\leftarrow}{X},\overset{\rightarrow}{X})$ . Sampling from a given stochastic process amounts to generating a sequence of variables from the corresponding dynamical law $P(\overset{\leftarrow}{X},\overset{\rightarrow}{X})$ . A naïve way to sample a future sequence $\overset{\rightarrow}{X}$ , given a realization of the past sequence $\overset{\leftarrow}{x}$ , may require infinite memory; for instance, if we need to store the whole sequence $\overset{\leftarrow}{x}$ .

More compact representations of stochastic processes that require less memory resources are therefore highly desirable. Sometimes the probability distribution $P(\overset{\leftarrow}{X},\overset{\rightarrow}{X})$ can be factorized into simpler probability distributions. For instance, in the case of stationary $m$ th-order Markov chains, the whole dynamics can be generated by a single conditional probability distribution $p(x_{t}|x_{t-m},\dotsc,x_{t-1})$ that yields the probability that the state of the system at time step $t$ is $x_{t}$ , given that the previous $m$ states the system visited were $x_{t-m},\dotsc,x_{t-1}$ . A very common example with $m=1$ is the Markov chain Monte Carlo algorithm.

However, in some cases the order of the Markov chain, $m$ , can be prohibitively large or even infinity (see e.g., Fig. 1 in Ref. thompson2018causal ). Fortunately, such long-range temporal correlations can sometimes be more compactly captured via hidden variables. A hidden Markov model (HMM) is characterized by a set of hidden variables or states $\mathcal{S}$ , an alphabet $\mathcal{A}$ , and the probability $P(x,j|i)$ that if the HMM is in state $i\in\mathcal{S}$ it emits output $x\in\mathcal{A}$ and transitions to state $j\in\mathcal{S}$ (see Fig. 1).

The most direct way to characterize the memory requirements of a HMM is perhaps by the dimension of its state space. The topological memory

$$D_{c}=\log_{2}|\mathcal{S}|,\tag{1}$$

corresponds to the number of bits necessary to encode the $|\mathcal{S}|$ states in which a HMM can be. The topological memory characterizes the memory required for sequential sampling, i.e., for generating a single sample trajectory.

A perhaps more subtle way to characterize the memory requirements of a HMM is by the amount of information actually encoded in the states of a HMM, once it has reached the stationary state $\pi$ . The statistical memory

$$H_{c}=-\sum_{j\in\mathcal{S}}\pi_{j}\log_{2}\pi_{j},\tag{2}$$

corresponds to the entropy of the HMM’s stationary state $\pi$. The statistical memory characterizes the memory required for parallel sampling, i.e., for the simultaneous generation of $M\gg 1$ samples at the stationary state. This observation is based on Shannon’s source coding theorem, which states that $M\gg 1$ samples distributed according to $\pi$ can be encoded into about $MH_{c}$ bits. The statistical memory of a HMM with state space $\mathcal{S}$ can be considered as the topological memory of a new HMM with state space $\mathcal{S}^{M}$. Such a new HMM operates at the stationary state, and its transitions $\mathbf{j}\to\mathbf{j}^{\prime}$, with $\mathbf{j},\mathbf{j}^{\prime}\in\mathcal{S}^{M}$, correspond to the independent transitions $j_{\ell}\to j_{\ell}^{\prime}$, with $\ell=1,\dotsc,M$, of the $M$ samples of the original HMM. According to Shannon’s source coding theorem, at the stationary state $\pi$ we can then find an encoding of the state space $\mathcal{S}^{M}$ such that the set of states $\mathcal{S}^{\ast}$ defined on the code space is of size $\sim 2^{MH_{c}}$.
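The two classical memory measures just defined can be evaluated in a few lines. The following is only a minimal sketch: the function names and the example stationary distributions are illustrative assumptions, not taken from the paper.

```python
import math

def topological_memory(num_states: int) -> float:
    """D_c = log2 |S|: bits needed to label every state of the HMM."""
    return math.log2(num_states)

def statistical_memory(pi) -> float:
    """H_c = -sum_j pi_j log2 pi_j: Shannon entropy of the stationary state."""
    return -sum(w * math.log2(w) for w in pi if w > 0)

# Assumed example: a two-state HMM with a uniform stationary distribution.
pi = [0.5, 0.5]
d_c = topological_memory(len(pi))
h_c = statistical_memory(pi)

# Source coding estimate: M parallel samples fit in about M * H_c bits.
M = 1000
bits_for_parallel_samples = M * h_c

# A biased stationary state needs fewer bits per sample than log2 |S|.
h_biased = statistical_memory([0.9, 0.1])
```

For a uniform two-state stationary distribution the two measures coincide at one bit; they differ as soon as the stationary distribution is biased.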

The best HMMs known to date, in the sense that they require the smallest topological and statistical memory, are called $\epsilon$ -machines. The states of an $\epsilon$ -machine are given by a mapping $\epsilon$ that encodes an equivalence relation

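$$\overset{\leftarrow}{x}\sim_{\epsilon}\overset{\leftarrow}{y}\;\Leftrightarrow\;P(\overset{\rightarrow}{X}|\overset{\leftarrow}{x})=P(\overset{\rightarrow}{X}|\overset{\leftarrow}{y}),\tag{3}$$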

with $\epsilon(x)=\epsilon(y)=j\in\mathcal{C}$ , where $\mathcal{C}$ is the set of states of the $\epsilon$ -machine, which we will refer to as (deterministic) causal states. An important property of any $\epsilon$ -machine, called unifilarity and implied by Eq. (3), is that the causal state $j$ to which it transitions at any given time is completely determined by the output $x$ emitted in such a transition and the state $i$ from which the transition takes place. More precisely,

$$P(x,j|i)=P(x|i)\,\delta_{j,f(i,x)},\tag{4}$$

where $\delta_{j,k}$ is the Kronecker delta function and ${f:\mathcal{S}\times\mathcal{A}\to\mathcal{S}}$ is a deterministic function that returns the causal state $f(i,x)\in\mathcal{S}$ to which the HMM transitions from state $i\in\mathcal{S}$ when it emits output $x\in\mathcal{A}$ .
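Unifilarity makes sequential sampling simple: draw an output $x$ from $P(x|i)$ and then set the next state to $f(i,x)$. A minimal sketch follows, illustrated on the perturbed coin from the Introduction under the assumption that the coin flips with probability $p$ and that the next state equals the emitted symbol. The function name `sample_unifilar`, the dictionary layout, and the value $p=0.2$ are hypothetical choices made for illustration.

```python
import random

def sample_unifilar(emit_prob, f, start, steps, rng):
    """Generate `steps` outputs from a unifilar HMM.

    emit_prob[i][x] is P(x|i); f(i, x) is the deterministic next state.
    """
    state, outputs = start, []
    for _ in range(steps):
        symbols = list(emit_prob[state])
        weights = [emit_prob[state][x] for x in symbols]
        x = rng.choices(symbols, weights=weights)[0]
        outputs.append(x)
        state = f(state, x)  # unifilarity: (state, output) fixes the next state
    return outputs

# Assumed example: perturbed coin with flip probability p and f(i, x) = x.
p = 0.2
emit_prob = {0: {0: 1 - p, 1: p}, 1: {0: p, 1: 1 - p}}
rng = random.Random(0)
traj = sample_unifilar(emit_prob, lambda i, x: x, start=0, steps=20000, rng=rng)

# Empirical flip rate along the trajectory (the initial state is 0).
flips = sum(a != b for a, b in zip([0] + traj[:-1], traj))
flip_rate = flips / len(traj)
```

The only thing carried between steps is `state`, which is exactly the memory that $D_{c}$ counts.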

In general, however, there is room for improvement since $\epsilon$-machines do not always reach the minimum memory requirement, which is given by the mutual information between past and future ${I(\overset{\leftarrow}{X}:\overset{\rightarrow}{X})}$. We now know gu2012quantum ; binder2018practical ; liu2018optimal ; aghamohammadi2018extreme ; ghafari2018single ; thompson2018causal that quantum models can do better in such cases.

II.2 Quantum-enhanced stochastic algorithms

II.2.1 General considerations

We now discuss recent work gu2012quantum ; binder2018practical ; liu2018optimal ; aghamohammadi2018extreme ; ghafari2018single ; thompson2018causal on quantum protocols for generating samples from a given classical $\epsilon$ -machine that can operate with less memory requirements, the so-called quantum $\epsilon$ -machines. The central idea is to define quantum causal states as

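$$\left|\xi_{i}\right\rangle=\sum_{x\in\mathcal{A},\,j\in\mathcal{S}}\sqrt{P(x,j|i)}\,\left|j\right\rangle\left|x\right\rangle,\tag{5}$$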

where $\{\left|j\right\rangle\}_{j\in\mathcal{S}}$ is an orthonormal basis of a quantum system that represents the deterministic causal states $j\in\mathcal{S}$ of the classical $\epsilon$ -machine, and $\{\left|x\right\rangle\}_{x\in\mathcal{A}}$ is an orthonormal basis of another quantum system that represents the corresponding outputs $x\in\mathcal{A}$ .

It is always possible binder2018practical to devise a unitary quantum operator $U$ , such that

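$$U\left|\xi_{i}\right\rangle\left|0\right\rangle=\sum_{x\in\mathcal{A}}\sqrt{P(x|i)}\,\left|\xi_{f(i,x)}\right\rangle\left|x\right\rangle.\tag{6}$$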

If we measure the second system in Eq. (6) in the basis $\{\left|x\right\rangle\}_{x\in\mathcal{A}}$ , that system will output $x\in\mathcal{A}$ with the correct probability $P(x|i)$ and the first system in Eq. (6) will transition to the correct next quantum causal state $\left|\xi_{j^{\ast}}\right\rangle$ , with $j^{\ast}={f(i,x)}$ , so the protocol can be applied iteratively. Importantly, except for a very recent work liu2018optimal which shows that adding phases to Eq. (5) can provide further memory reductions, all work to date has only considered amplitude-encoded quantum causal states like those in Eq. (5). In this work we will exclusively focus on the latter which, while being less general, can still provide extreme memory reductions, i.e., situations where the memory required for a classical $\epsilon$ -machine to operate diverges while that required by the corresponding amplitude-encoded quantum $\epsilon$ -machine remains finite aghamohammadi2018extreme .

At the stationary state $\pi$ , the quantum causal state $\left|\xi_{i}\right\rangle$ appears with probability $\pi_{i}$ . So the state of the quantum system can be represented by a density matrix

$$\rho=\sum_{i\in\mathcal{S}}\pi_{i}\left|\xi_{i}\right\rangle\left\langle\xi_{i}\right|,\tag{7}$$

whose rank yields the quantum topological memory

$$D_{q}=\log_{2}[{\rm rank}(\rho)],\tag{8}$$

and whose von Neumann entropy yields the quantum statistical memory

$$S_{q}=-{\rm Tr}\,\rho\log_{2}\rho.\tag{9}$$

Unlike deterministic causal states $\{\left|j\right\rangle_{A}\}_{j\in\mathcal{S}}$ that are always orthogonal, the quantum causal states introduced in Eq. (5) can be non-orthogonal, i.e., we can have $\left\langle\xi_{i}|\xi_{j}\right\rangle\neq 0$ for $i\neq j$ and therefore $\rho$ can be non-diagonal. So the quantum statistical memory $S_{q}$ can be strictly lower than the classical one $H_{c}$. This implies that if we want to generate $M$ independent samples in parallel we can encode the corresponding density matrix $\bigotimes_{m=1}^{M}\rho$ with about $MS(\rho)\leq MH_{c}$ qubits describing the typical subspace schumacher1995quantum ; nielsen2002quantum .
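To make the effect of non-orthogonality concrete, the sketch below builds $\rho$ from a list of state vectors and stationary weights, then reads off $\log_{2}{\rm rank}(\rho)$ and the von Neumann entropy from its eigenvalues. The two real qubit states used here (parametrised by a hypothetical angle `theta`) are an assumption chosen only to give a non-zero overlap; they are not the causal states of any process in the paper.

```python
import numpy as np

def quantum_memories(states, pi, tol=1e-12):
    """Return (rho, D_q, S_q) for states |xi_i> occurring with probability pi_i."""
    rho = sum(w * np.outer(s, np.conj(s)) for w, s in zip(pi, states))
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > tol]            # keep only the support of rho
    d_q = float(np.log2(len(evals)))      # log2 rank(rho)
    s_q = float(-np.sum(evals * np.log2(evals)))
    return rho, d_q, s_q

# Assumed example: two non-orthogonal real qubit states, equally likely.
theta = np.pi / 8
states = [np.array([np.cos(theta), np.sin(theta)]),
          np.array([np.sin(theta), np.cos(theta)])]
pi = [0.5, 0.5]

rho, d_q, s_q = quantum_memories(states, pi)
h_c = float(-sum(w * np.log2(w) for w in pi))  # classical entropy of the labels

# Orthogonal states for comparison: no quantum saving is possible.
_, _, s_orth = quantum_memories([np.array([1.0, 0.0]), np.array([0.0, 1.0])], pi)
```

With overlapping states the entropy of $\rho$ falls below the entropy of the classical labels, while for orthogonal states the two agree.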

II.2.2 Examples

1. Symmetrically perturbed coin process: Figures 1a and 2a show an example recently investigated in Ref. gu2012quantum and experimentally demonstrated in Refs. ghafari2019interfering ; palsson2017experimentally . This simple two-state Markov chain can be interpreted as the $\epsilon$-machine of a coin in a box undergoing regular perturbations that induce the coin to flip with probability $p$ at each time step gu2012quantum . In this case we have $f(i,x)=x\in\{0,1\}$ and the transition probabilities are given by

$$P(1|0)=P(0|1)=p,\tag{10}$$

$$P(0|0)=P(1|1)=1-p.\tag{11}$$

The best classical algorithms known to date need to save in memory one bit, encoding whether the previous state was 0 or 1 (see Fig. 3).

Figure 2a shows a possible implementation of a quantum $\epsilon$ -machine that can reduce the memory requirements (see Fig. 3) by relying on quantum causal states ghafari2019interfering ; palsson2017experimentally ; gu2012quantum (cf. Eq. (5))

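$$\left|\xi_{0}\right\rangle=\sqrt{1-p}\left|0\right\rangle+\sqrt{p}\left|1\right\rangle,\tag{12}$$

$$\left|\xi_{1}\right\rangle=\sqrt{p}\left|0\right\rangle+\sqrt{1-p}\left|1\right\rangle.\tag{13}$$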

These quantum causal states, Eqs. (12) and (13), can be prepared from the state $\left|0\right\rangle$ via a unitary

$$U_{x}=\begin{pmatrix}\sqrt{1-x}&\#\\ \sqrt{x}&\#\end{pmatrix},\tag{14}$$

with $x=p$ and $x=1-p$ , respectively. Here the undetermined entries are irrelevant for the protocol; they are chosen so that the operations are unitary.

The quantum protocol starts with the current causal state $\left|\xi_{j}\right\rangle$ , with $j\in\{0,1\}$ , adds an ancilla qubit in state $\left|\xi_{0}\right\rangle$ , and then applies to the combined system a gate (see Fig. 2a)

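$${\rm CNOT}^{(1,2)}=\left|0\right\rangle\left\langle 0\right|\otimes{\rm 1\!\!I}+\left|1\right\rangle\left\langle 1\right|\otimes X,\tag{15}$$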

where ${\rm 1\!\!I}$ is the identity matrix and

$$X=\begin{pmatrix}0&1\\ 1&0\end{pmatrix}.\tag{16}$$

This yields a new combined state

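$$\left|\chi_{j}\right\rangle={\rm CNOT}^{(1,2)}\left|\xi_{j}\right\rangle\left|\xi_{0}\right\rangle,\tag{17}$$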

where

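$$\left|\chi_{0}\right\rangle=\sqrt{1-p}\left|0\right\rangle\left|\xi_{0}\right\rangle+\sqrt{p}\left|1\right\rangle\left|\xi_{1}\right\rangle,\tag{18}$$

$$\left|\chi_{1}\right\rangle=\sqrt{p}\left|0\right\rangle\left|\xi_{0}\right\rangle+\sqrt{1-p}\left|1\right\rangle\left|\xi_{1}\right\rangle.\tag{19}$$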

By measuring the first qubit in the computational basis we get the desired statistics and the corresponding transition of the second qubit to the correct quantum causal state, so the protocol can be applied iteratively ghafari2019interfering ; palsson2017experimentally .
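One step of this protocol can be checked numerically with plain linear algebra. The sketch below assumes the single-qubit amplitude-encoded states $\left|\xi_{0}\right\rangle=\sqrt{1-p}\left|0\right\rangle+\sqrt{p}\left|1\right\rangle$ and $\left|\xi_{1}\right\rangle=\sqrt{p}\left|0\right\rangle+\sqrt{1-p}\left|1\right\rangle$, i.e., the first column of $U_{x}$ taken as $(\sqrt{1-x},\sqrt{x})$ at $x=p$ and $x=1-p$. The helper `step` and the value $p=0.3$ are illustrative assumptions.

```python
import numpy as np

p = 0.3  # assumed flip probability, for illustration only

# Assumed amplitude-encoded causal states of the perturbed coin.
xi = {0: np.array([np.sqrt(1 - p), np.sqrt(p)]),
      1: np.array([np.sqrt(p), np.sqrt(1 - p)])}

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
P0 = np.diag([1.0, 0.0])   # |0><0|
P1 = np.diag([0.0, 1.0])   # |1><1|
CNOT = np.kron(P0, I2) + np.kron(P1, X)  # control: qubit 1, target: qubit 2

def step(j):
    """Apply CNOT to |xi_j>|xi_0> and measure the first qubit.

    Returns the outcome probabilities and the normalised state of the
    second qubit conditioned on each outcome.
    """
    chi = CNOT @ np.kron(xi[j], xi[0])
    chi = chi.reshape(2, 2)              # row x: second-qubit amplitudes given outcome x
    probs = np.sum(chi ** 2, axis=1)
    post = [chi[x] / np.sqrt(probs[x]) for x in (0, 1)]
    return probs, post

probs0, post0 = step(0)
probs1, post1 = step(1)
```

Under these assumptions, outcome $x$ occurs with probability $P(x|j)$ and leaves the second qubit in $\left|\xi_{x}\right\rangle$, which is what allows the step to be repeated.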

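As the stationary state of the symmetrically perturbed coin process (see Fig. 1a) is ${\pi=(\tfrac{1}{2},\tfrac{1}{2})}$, the density matrix associated with the quantum $\epsilon$-machine is (see Eq. (7))

$$\rho=\tfrac{1}{2}\left|\xi_{0}\right\rangle\left\langle\xi_{0}\right|+\tfrac{1}{2}\left|\xi_{1}\right\rangle\left\langle\xi_{1}\right|=\lambda_{+}\left|+\right\rangle\left\langle+\right|+\lambda_{-}\left|-\right\rangle\left\langle-\right|.\tag{20}$$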

Here $\lambda_{+}=\lambda\equiv\tfrac{1}{2}+\sqrt{p(1-p)}$ and $\lambda_{-}=1-\lambda$ are the largest and smallest eigenvalues, respectively, and

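$$\left|+\right\rangle=\frac{1}{\sqrt{2}}\begin{pmatrix}1\\ 1\end{pmatrix},\qquad\left|-\right\rangle=\frac{1}{\sqrt{2}}\begin{pmatrix}1\\ -1\end{pmatrix},\tag{21}$$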

are the corresponding eigenvectors.

The quantum statistical memory is given by

$$S_{q}(\lambda)=-\lambda\log_{2}\lambda-(1-\lambda)\log_{2}(1-\lambda)\leq\log_{2}2=1,\tag{22}$$

which is strictly smaller than the memory required for the classical $\epsilon$ -machine (see Fig. 3), except when $p=0$ and $p=1$ where both $\epsilon$ -machines require one bit, or $p=\tfrac{1}{2}$ where both $\epsilon$ -machines require zero bits. So, if represented in terms of $\left|+\right\rangle$ and $\left|-\right\rangle$ , the quantum $\epsilon$ -machine can generate $M$ independent stochastic trajectories while keeping in memory only $MS_{q}(\lambda)$ qubits.
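The comparison can be tabulated directly from $\lambda=\tfrac{1}{2}+\sqrt{p(1-p)}$ and the binary entropy. This is a minimal sketch: the function names are illustrative, and the classical value (one bit for $p\neq\tfrac{1}{2}$, zero bits at $p=\tfrac{1}{2}$) is simply transcribed from the statement above.

```python
import math

def quantum_statistical_memory(p: float) -> float:
    """S_q(lambda) with lambda = 1/2 + sqrt(p (1 - p))."""
    lam = 0.5 + math.sqrt(p * (1 - p))
    # Terms with zero weight contribute nothing (0 log 0 = 0 by convention).
    return -sum(v * math.log2(v) for v in (lam, 1 - lam) if v > 0)

def classical_statistical_memory(p: float) -> float:
    """Memory of the classical epsilon-machine, as stated in the text."""
    return 0.0 if p == 0.5 else 1.0

# Quantum versus classical memory on a small grid of flip probabilities.
table = [(p, quantum_statistical_memory(p), classical_statistical_memory(p))
         for p in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)]
```

On this grid the quantum value starts at one bit for $p=0$ and decreases to zero at $p=\tfrac{1}{2}$, whereas the classical value stays at one bit until it drops at $p=\tfrac{1}{2}$.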

2. Post-processed perturbed coin process: Figures 1b and 2b show an example, recently investigated in Ref. thompson2018causal and experimentally demonstrated in Ref. ghafari2018single , of a three-state Markov chain. This can be interpreted as the $\epsilon$ -machine of a suitably post-processed perturbed coin process. See Refs. thompson2018causal ; ghafari2018single for details on such an interpretation. Here we are only interested in how quantum protocols can generate samples from this Markov chain using a smaller topological memory than the best known classical algorithms. Indeed, there are quantum $\epsilon$ -machines thompson2018causal ; ghafari2018single that can simulate this Markov chain while keeping in memory just a single qubit, instead of a qutrit or two qubits as it may appear necessary for simulating a three-state system (see Fig. 2b).

Following Sec. II.2, we can introduce an orthonormal basis $\{\left|0^{\prime}\right\rangle,\left|1^{\prime}\right\rangle,\left|2^{\prime}\right\rangle\}$ (notice the primes) representing the deterministic causal states of the classical $\epsilon$ -machine (see Eq. (3) and Fig. 1b). We can then define quantum causal states in terms of these as follows (see Eq. (5) and Fig. 2b):

[TABLE]

Here we have also introduced a change of representation in terms of a new single qubit basis $\{\left|0\right\rangle,\left|1\right\rangle\}$ (with no primes), where thompson2018causal ; ghafari2018single

[TABLE]

Equations (23)-(25) show the three quantum causal states $\left|\xi_{i}\right\rangle$ , for $i=0,1,2$ , can indeed be written in terms of one single qubit, as stated above.

A quantum protocol thompson2018causal (see Fig. 2b) to sequentially generate samples from the post-processed perturbed coin process in Fig. 1b starts with the current causal state $\left|\xi_{j}\right\rangle$, with $j\in\{0,1,2\}$, represented in terms of the qubit basis $\{\left|0\right\rangle,\left|1\right\rangle\}$, along with two ancillary qubits, both in state $\left|0\right\rangle$. The following unitary operations are then successively applied to the three-qubit state $\left|\xi_{j}\right\rangle\left|0\right\rangle\left|0\right\rangle$ (see Fig. 2b):

$$\neg CU_{p}^{(1,3)},\qquad CU_{1-q}^{(1,2)},\qquad{\rm CNOT}^{(3,2)},$$

where $U_{p}$ and $U_{1-q}$ are obtained by setting $x=p$ and $x=1-q$ , respectively, in Eq. (14). We emphasize that the unspecified entries $\#$ in Eq. (14) are not relevant for this protocol and are chosen such that $U_{p}$ and $U_{1-q}$ are unitary.

Using the full unitary,

$$U={\rm CNOT}^{(3,2)}\,CU_{1-q}^{(1,2)}\,\neg CU_{p}^{(1,3)},$$

built from these operators we obtain

[TABLE]

A measurement of the first and third qubits in the qubit basis $\{\left|0\right\rangle,\left|1\right\rangle\}$ (see Eqs. (26) and (27)) is then performed. Let $y_{1}\in\{0,1\}$ and $y_{3}\in\{0,1\}$ denote the corresponding values obtained. The probability of observing both $y_{1}=1$ and $y_{3}=1$ is zero. The combination $x=y_{1}+2y_{3}\in\{0,1,2\}$ yields the three possible outputs of the post-processed perturbed coin process (see Fig. 1b) with the correct probabilities. The second qubit transitions to the correct quantum causal state (see Eqs. (23)-(25)), so the protocol can be applied iteratively.

III Results

III.1 Quantum-inspired memory-enhanced sampling

III.1.1 General considerations

The quantum protocols described in Sec. II.2 were based on the amplitude encoding of the transition probabilities of the corresponding Markov chains, Eq. (5) (see Fig. 1). However, Refs. realpe2017modeling ; realpe2018cognitive show that the quantum dynamics of such phaseless quantum states can be mathematically reproduced via classical belief propagation, at least in the examples studied here (see Sec. III.3.4 and Appendix B)—this only holds at the time scale set by the gates, not for any arbitrarily short time scale. It is then natural to ask whether such quantum memory-enhanced algorithms could be implemented classically. While belief propagation seems to be a strong candidate, there are some caveats that we expect can be resolved in the near future, as we will discuss in Appendix B.

We will therefore introduce a different approach here. We will show that by exploiting some classical features implicit in the quantum protocols described in Sec. II.2, we can design classical algorithms more memory-efficient than the best classical algorithms known to date, i.e., $\epsilon$ -machines.

For simplicity, we will first discuss in Sec. III.2 the two examples we have been dealing with (see Figs. 1 and 2) and afterwards, in Sec. III.3, we discuss how to develop general quantum-inspired memory-enhanced stochastic algorithms. In Sec. III.2, we will first introduce a classical algorithm for the symmetrically perturbed coin process (see Fig. 1a) that requires much less statistical memory than the corresponding $\epsilon$-machine, though it does not reach the same memory gains as the corresponding quantum protocol. Interestingly, as in the quantum protocol, the memory requirements of this classical memory-enhanced algorithm vary continuously over the whole range of values of $p$, avoiding the discontinuous jump of the $\epsilon$-machine at $p=\tfrac{1}{2}$ (see Fig. 3); the slope of the curve, however, jumps discontinuously at $p=\tfrac{1}{2}$ while that of the quantum $\epsilon$-machine varies continuously. To achieve this, analogous to the quantum $\epsilon$-machine, the algorithm operates at the stationary state and at the ensemble level, i.e., on a large number of parallel independent samples. Furthermore, the algorithm exploits the eigendecomposition of the matrix of transition probabilities and an operational interpretation, introduced in Refs. realpe2017modeling ; realpe2018cognitive , of negative numbers arising in a suitable decomposition of the vector of probabilities.

Next, we will introduce a classical algorithm for the post-processed perturbed coin process (see Fig. 1b) that has the same memory gains as the corresponding quantum protocol in Fig. 2b. More precisely, this algorithm needs to keep in memory only a single bit, instead of a trit as the corresponding $\epsilon$ -machine (see Fig. 1b). To achieve this, we introduce stochastic causal states, i.e., classical mixtures of the ‘deterministic’ causal states of the corresponding $\epsilon$ -machine. Such mixtures do a similar job as the quantum superpositions of the amplitude-encoded quantum $\epsilon$ -machines. We may refer to such classical protocols as stochastic $\epsilon$ -machines.

In Sec. III.3, we will introduce general quantum-inspired memory-enhanced algorithms which, however, do not exploit square roots of probabilities. In Sec. III.3.4 and Appendix B we discuss how belief propagation can naturally lead to classical protocols formally similar to the corresponding quantum protocols. In particular, square roots of probabilities naturally arise in such protocols. However, we point out some caveats that we hope can be resolved in the near future, which would lead to a full algorithmic interpretation of square roots of probabilities with its corresponding additional memory gains.

III.2 Examples

III.2.1 Symmetrically perturbed coin process

Here we introduce a classical memory-enhanced algorithm to sample from the symmetrically perturbed coin process that requires less statistical memory than the corresponding $\epsilon$ -machine gu2012quantum , although the algorithm does not reach the performance of the quantum protocol. The lower statistical memory required by the quantum $\epsilon$ -machine is based on the eigen-representation of the corresponding density matrix $\rho$ , which for the symmetrically perturbed coin process is given by the right-most side of Eq. (20). When the process is completely random, i.e., $p=\tfrac{1}{2}$ , we have $\lambda=1$ and $\rho=\left|+\right\rangle\left\langle+\right|$ . In this case, the first term in the eigendecomposition of $\rho$ is associated to completely random, memoryless behaviour. So, when $p\neq\tfrac{1}{2}$ , the second term in the eigendecomposition of $\rho$ , which has negative elements, should somehow reintroduce the Markovian memory.

The situation is similar to the one described in Refs. realpe2017modeling ; realpe2018cognitive to operationally interpret negative numbers in a decomposition of general probability vectors, in this case the vector $(1-p,p)$ associated to a simple coin-toss process. Following Refs. realpe2017modeling ; realpe2018cognitive , this probability vector can be written in a way similar to the eigendecomposition of $\rho$ in Eq. (20), namely

[TABLE]

As before, the first vector can be interpreted as tossing coins uniformly at random, while the negative numbers in the second vector can be interpreted as a sort of ‘correction’ (see Refs. realpe2017modeling ; realpe2018cognitive and Sec. III.3 below for the general version of this idea). More explicitly, the decomposition of the probability vector in Eq. (35) could be read algorithmically as follows: (i) With probability one, toss coin uniformly at random (first vector); (ii) If $p<\tfrac{1}{2}$ and coin is in state $\left|1\right\rangle=(0,1)^{T}$ , with probability $1-2p$ flip the coin; (iii) If $p>\tfrac{1}{2}$ and coin is in state $\left|0\right\rangle=(1,0)^{T}$ , with probability $2p-1$ flip the coin. The flipping in parts (ii) or (iii) would bias the uniform distribution of coins obtained in (i) just enough to get the correct statistics of the coin toss.
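Steps (i)-(iii) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the original work: the function names `corrected_coin_distribution` and `sample_corrected_coin` are hypothetical. The first function tracks the probability mass moved by the corrective flip exactly, and the second draws a single coin.

```python
import random
from fractions import Fraction

def corrected_coin_distribution(p):
    """Exact (P(0), P(1)) after a uniform toss followed by the corrective flip."""
    p = Fraction(p)
    half = Fraction(1, 2)
    prob0, prob1 = half, half                 # step (i): uniform toss
    if p < half:                              # step (ii): flip 1 -> 0 with prob. 1 - 2p
        moved = prob1 * (1 - 2 * p)
        prob0, prob1 = prob0 + moved, prob1 - moved
    elif p > half:                            # step (iii): flip 0 -> 1 with prob. 2p - 1
        moved = prob0 * (2 * p - 1)
        prob0, prob1 = prob0 - moved, prob1 + moved
    return prob0, prob1

def sample_corrected_coin(p, rng):
    """Draw one coin using the same three steps (hypothetical helper)."""
    coin = rng.randrange(2)                   # step (i)
    if p < 0.5 and coin == 1 and rng.random() < 1 - 2 * p:
        coin = 0                              # step (ii)
    elif p > 0.5 and coin == 0 and rng.random() < 2 * p - 1:
        coin = 1                              # step (iii)
    return coin
```

For any $p$ the exact routine returns the vector $(1-p,p)$; for instance, $p=\tfrac{1}{5}$ gives $(\tfrac{4}{5},\tfrac{1}{5})$, since half of the mass sits on state $1$ after step (i) and a fraction $1-2p=\tfrac{3}{5}$ of it is moved to state $0$.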

Figure 4 describes an algorithm that applies this idea to the case of the symmetrically perturbed coin process. Such an algorithm needs to save in memory only $|2p-1|\leq 1$ bits per sample instead of the $1$ bit of the best classical algorithm known to date (see Fig. 2c). In this algorithm, we first toss $M\gg 1$ coins at random and save the state of $M|2p-1|\leq M$ of them in memory. This yields a set of samples from the stationary distribution $\pi=(\tfrac{1}{2},\tfrac{1}{2})$ . To generate the next set of $M$ samples we again toss $M$ coins at random, which requires zero memory, and then compare the new coins with the $M|2p-1|$ coins saved in memory. If $p>\tfrac{1}{2}$ ( $p<\tfrac{1}{2}$ ) we flip those new coins whose state equals (differs from) the saved state. This ‘correction’ produces just enough bias to recover the correct statistics of the Markov chain, i.e., its transition probabilities, while respecting the stationary state $\pi=(\tfrac{1}{2},\tfrac{1}{2})$ .
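A minimal Python sketch of this ensemble procedure follows. It assumes, as the flipping rule implies, that $p$ is the probability that a coin differs from its previous state. The names `choose_saved`, `perturbed_coin_step` and `empirical_flip_rate` are hypothetical, the memory is modelled as a dictionary from sample index to saved bit, and the number of saved coins is rounded to the nearest integer.

```python
import random

def choose_saved(prev, p, rng):
    """Keep round(M*|2p-1|) of the M current coins in memory (index -> bit)."""
    M = len(prev)
    kept = rng.sample(range(M), round(M * abs(2 * p - 1)))
    return {k: prev[k] for k in kept}

def perturbed_coin_step(M, memory, p, rng):
    """Generate the next M coins using only the saved bits in `memory`."""
    new = [rng.randrange(2) for _ in range(M)]   # fresh uniform tosses, no memory needed
    for k, old_bit in memory.items():
        same = new[k] == old_bit
        # p > 1/2: flip coins that equal the saved bit; p < 1/2: flip those that differ
        if (p > 0.5 and same) or (p < 0.5 and not same):
            new[k] ^= 1
    return new

def empirical_flip_rate(p, M=100_000, seed=0):
    """Fraction of coins that changed state, and mean of the new coins."""
    rng = random.Random(seed)
    prev = [rng.randrange(2) for _ in range(M)]
    memory = choose_saved(prev, p, rng)
    new = perturbed_coin_step(M, memory, p, rng)
    changed = sum(a != b for a, b in zip(prev, new)) / M
    return changed, sum(new) / M
```

Under these assumptions the fraction of coins that change state should approach $p$: the unsaved coins change with probability $\tfrac{1}{2}$, while the saved fraction $|2p-1|$ always changes (for $p>\tfrac{1}{2}$) or never changes (for $p<\tfrac{1}{2}$), giving $\tfrac{1}{2}(1-|2p-1|)+|2p-1|=p$ in the first case and $\tfrac{1}{2}(1-|2p-1|)=p$ in the second.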

III.2.2 Post-processed perturbed coin process

Here we introduce a memory-enhanced classical algorithm to sample from the post-processed perturbed coin process (see Fig. 1b) that requires the same topological memory as the corresponding quantum $\epsilon$ -machine (see Fig. 2b), i.e., it needs to keep in memory only a bit instead of the trit required by the classical $\epsilon$ -machine. The idea is to encode information stochastically by defining (classical) stochastic causal states in analogy with Eqs. (23)-(25) as

[TABLE]

As before, here the ket notation just refers to standard real vector notation. The stochastic causal state $\left|C_{1}\right\rangle$ is uncertain; we do not have full knowledge about it. Yet we will see that this would allow us to obtain the very same memory savings as the corresponding quantum $\epsilon$ -machine in Fig. 2b. In a sense, there seems to be computational value in knowing less.

Similarly, we can write the (classical) stochastic analogs of Eqs. (32)-(34) as

[TABLE]

Indeed, let $y_{1}\in\{0,1\}$ and $y_{3}\in\{0,1\}$ denote the values obtained after observing the first and third stochastic bits, respectively. The probability of observing both $y_{1}=1$ and $y_{3}=1$ is zero. The combination $x=y_{1}+2y_{3}\in\{0,1,2\}$ yields the three possible outputs of the post-processed perturbed coin process (see Fig. 1b) with the correct probabilities. The second stochastic bit transitions to the correct stochastic causal state (see Eqs. (36)-(38)), so the protocol can be applied iteratively. Since everything is real and non-negative, we can readily build a classical stochastic algorithm that implements the transitions in Eqs. (39)-(41).

For instance, if the initial output state is $x=0$ or $x=2$ , we set the first stochastic bit to $s^{(1)}=0$ or $s^{(1)}=1$ , respectively. Otherwise, we set it to $s^{(1)}=0$ or $s^{(1)}=1$ with probabilities $q$ and $1-q$ , respectively. To generate the next output, we do the following iteration: if $s^{(1)}=0$ we set both the second and third stochastic bits to zero or one, i.e., $s^{(2)}=0=s^{(3)}$ or $s^{(2)}=1=s^{(3)}$ , with probabilities $1-p$ and $p$ , respectively. Otherwise, if $s^{(1)}=1$ we set $s^{(2)}=0$ or $s^{(2)}=1$ with probabilities $q$ and $1-q$ , respectively; furthermore, we set $s^{(3)}=0$ . We then output $x=s^{(1)}+2\,s^{(3)}$ and update the memory stochastic bit as $s^{(1)}=s^{(2)}$ . By iterating this procedure we get a sequence of outputs $x$ that is a sample trajectory of the post-processed perturbed coin process (see Fig. 1b). As in the case of the quantum $\epsilon$ -machine, here we only need to keep in memory the first stochastic bit, $s^{(1)}$ , which at the end of each iteration will always be in one of the stochastic causal states, Eqs. (36)-(38).
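The iteration can be sketched in Python as follows. The sketch assumes that, when $s^{(1)}=1$, it is the second stochastic bit $s^{(2)}$ that is resampled with probabilities $q$ and $1-q$, so that the update $s^{(1)}=s^{(2)}$ is well defined. The function name `stochastic_epsilon_machine` and its arguments are hypothetical.

```python
import random

def stochastic_epsilon_machine(p, q, n_steps, rng, first_output=0):
    """Generate n_steps outputs in {0, 1, 2} while storing a single bit s1."""
    # Initialise the memory bit from the initial output state.
    if first_output == 0:
        s1 = 0
    elif first_output == 2:
        s1 = 1
    else:                                   # mixture: 0 with prob. q, 1 with prob. 1 - q
        s1 = 0 if rng.random() < q else 1
    outputs = []
    for _ in range(n_steps):
        if s1 == 0:
            s2 = s3 = 1 if rng.random() < p else 0    # both bits 1 with prob. p
        else:
            s2 = 0 if rng.random() < q else 1         # assumption: s2 is the resampled bit
            s3 = 0
        outputs.append(s1 + 2 * s3)                   # x = s1 + 2*s3
        s1 = s2                                       # only this bit is carried over
    return outputs
```

Two structural properties follow directly from the rule and can be checked on any trajectory: an output $x=2$ is always followed by $x=1$, and an output $x=0$ is never followed by $x=1$.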

III.3 Extensions

III.3.1 General considerations

Here we highlight some of the general concepts underlying the examples discussed in Sec. III.2. We first introduce a general quantum-inspired stochastic algorithm that extends the one introduced in Fig. 4 to general Markov chains. Like the one introduced in Fig. 4, this algorithm also exploits the interpretation of negative numbers introduced in Refs. realpe2017modeling ; realpe2018cognitive . Afterwards, we discuss the concept of general stochastic causal states that extends the example of the post-processed perturbed coin process to general Markov chains and $\epsilon$ -machines. None of these generalizations exploits square roots of probabilities. This possibility is discussed briefly in Sec. III.3.4 and in more detail in Appendix B.

To begin, the algorithm described in Sec. III.2.1 (see Fig. 4) can be interpreted as an eigendecomposition of the matrix of transition probabilities $T_{j\,j^{\prime}}=P(j^{\prime}|j)$ . According to the Perron-Frobenius theorem for irreducible stochastic matrices, the stationary state corresponds to the left eigenvector $\left\langle\pi\right|$ with eigenvalue equal to one, i.e., $\left\langle\pi\right|T=\left\langle\pi\right|$ , which is the eigenvalue with largest absolute value. In this sense, sampling from the stationary state, which is the first step in the algorithm described in Fig. 4, yields the main contribution to generate stochastic trajectories with the correct statistics. The remaining eigenvectors yield ‘corrections’, which may involve negative numbers as in the example described in Fig. 4, whose relevance is given by the magnitude of their associated eigenvalues.

In principle, these ideas can be further generalized by using eigendecompositions of more general matrices of transition probabilities and interpreting the negative numbers that may arise in a similar way as we have done in the example described in Fig. 4. Such potential generalizations should take into account that for non-symmetric matrices, unlike the example described in Fig. 4, the right and left eigenvectors may be different. In Sec. III.3.2 we discuss a general quantum-inspired memory-enhanced algorithm based on this idea. In Sec. III.3.3 we discuss the notion of general stochastic causal states that generalizes the ideas exploited in the example studied in Sec. III.2.2. Finally, in Sec. III.3.4 and Appendix B we discuss how square roots might also be exploited via belief propagation protocols.

III.3.2 General quantum-inspired algorithm with enhanced statistical memory

Here we describe how taking into account only one left- and right-eigenvector, i.e., those associated to the stationary state, already leads to a general quantum-inspired memory-enhanced stochastic algorithm. Consider an $(N+1)$ -state Markov chain with state space $\mathcal{S}$ specified by the matrix of transition probabilities (see Appendix A and Fig. 5 for a detailed example)

[TABLE]

where $\left\langle\pi\right|=(\pi_{0},\dotsc,\pi_{N})$ is the stationary state, $\left|{\rm 1\!\!I}\right\rangle$ is the corresponding right-eigenvector, i.e., the $(N+1)$ -dimensional vector with all entries equal to one, and $\Delta=T-\left|{\rm 1\!\!I}\right\rangle\left\langle\pi\right|$ is composed of all the eigenvalues and eigenvectors different from $\left\langle\pi\right|$ and $\left|{\rm 1\!\!I}\right\rangle$ . For simplicity, we are assuming ergodicity so there is a unique stationary distribution.
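As a small numerical illustration of this decomposition, the Python sketch below finds the stationary state by power iteration and forms the residual $\Delta=T-\left|{\rm 1\!\!I}\right\rangle\left\langle\pi\right|$ . The helper names are hypothetical, and power iteration is just one convenient choice for an ergodic chain. The numerical matrix `T_EXAMPLE` is inferred from the values quoted in Appendix A for $p=\tfrac{1}{9}$ and $q=\tfrac{2}{3}$ and should be read as an assumption.

```python
# Assumed example: transition matrix inferred from the numbers quoted in Appendix A.
T_EXAMPLE = [[1 / 3, 1 / 3, 1 / 3],
             [1 / 9, 2 / 3, 2 / 9],
             [1 / 3, 1 / 3, 1 / 3]]

def stationary_distribution(T, n_iter=200):
    """Left eigenvector of T with eigenvalue one, via power iteration <pi| <- <pi|T."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[j] * T[j][i] for j in range(n)) for i in range(n)]
    return pi

def residual(T, pi):
    """Delta = T - |1><pi|: every row of T minus the stationary state."""
    return [[T[j][i] - pi[i] for i in range(len(pi))] for j in range(len(T))]

pi = stationary_distribution(T_EXAMPLE)
delta = residual(T_EXAMPLE, pi)
```

Each row of `delta` sums to zero, because both $\left\langle j\right|T$ and $\left\langle\pi\right|$ are normalized, so every row with a nonzero entry must contain both positive and negative numbers.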

For each $j\in\mathcal{S}$ , let

[TABLE]

It is useful to define the positive fractions

[TABLE]

and the positive ratios

[TABLE]

where $Z_{j}$ is a normalization constant enforcing $\sum_{i^{\prime}\in\mathcal{S}_{j}^{+}}r_{j:\to i^{\prime}}^{+}=1$ .

To see that $f_{j}\leq 1$ we can show that all $-\left\langle j|\Delta|i\right\rangle/{\pi_{i}}\leq 1$ . Indeed, assume by contradiction that $-\left\langle j|\Delta|i\right\rangle/{\pi_{i}}>1$ so, from Eq. (42), we get $-\left\langle j|T|i\right\rangle+\pi_{i}>\pi_{i}$ , i.e., $-\left\langle j|T|i\right\rangle>0$ . This is a contradiction since $\left\langle j\right|T$ is a probability distribution. Similarly, since by definition $f_{j}$ is the maximum of all terms $-\left\langle j|\Delta|i\right\rangle/{\pi_{i}}$ , then all ratios $r^{-}_{j:i\to}\leq 1$ . Finally, all ratios $r_{j:\to i^{\prime}}^{+}\leq 1$ due to the normalization constant $Z_{j}$ . So, all $f_{j}$ , $r^{-}_{j:i\to}$ , and $r_{j:\to i^{\prime}}^{+}$ can be interpreted as probabilities.
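The following Python sketch computes these quantities for one row of $\Delta$ in exact arithmetic. The definitions used in the code are reconstructed from the surrounding prose ( $f_{j}$ as the largest term $-\left\langle j|\Delta|i\right\rangle/\pi_{i}$ , $r^{-}_{j:i\to}$ as that term divided by $f_{j}$ , and $Z_{j}$ as the sum of the positive entries) and from the values worked out in Appendix A. They are assumptions rather than a transcription of the displayed equations, and the function name is hypothetical.

```python
from fractions import Fraction as F

# Assumed example values, inferred from the numbers quoted in Appendix A.
PI = [F(4, 18), F(9, 18), F(5, 18)]
DELTA = [[F(2, 18), F(-3, 18), F(1, 18)],
         [F(-2, 18), F(3, 18), F(-1, 18)],
         [F(2, 18), F(-3, 18), F(1, 18)]]

def fractions_and_ratios(pi, delta_row):
    """Return f_j, {i: r^-_{j:i->}} and {i': r^+_{j:->i'}} for one row <j|Delta."""
    negative = [i for i, d in enumerate(delta_row) if d < 0]
    positive = [i for i, d in enumerate(delta_row) if d > 0]
    if not negative:                       # row equals the stationary state: nothing to save
        return F(0), {}, {}
    f = max(-delta_row[i] / pi[i] for i in negative)
    r_minus = {i: -delta_row[i] / (f * pi[i]) for i in negative}
    Z = sum(delta_row[i] for i in positive)            # normalization constant Z_j
    r_plus = {i: delta_row[i] / Z for i in positive}
    return f, r_minus, r_plus
```

With these assumed inputs the rows give $f_{0}=\tfrac{1}{3}$ , $f_{1}=\tfrac{1}{2}$ and $f_{2}=\tfrac{1}{3}$ , and every fraction and ratio lies between zero and one, in line with the argument above.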

A general quantum-inspired memory-enhanced stochastic algorithm can be designed as follows. Let $s_{\ell}^{t}\in\mathcal{S}$ , with $\ell=1,\dotsc,M$ , denote the $\ell$ -th sample generated at time step $t$ . For ease of reference, we will keep this ordering of samples throughout. If at time step $t$ the algorithm produces output $j$ we save it in memory with probability $f_{j}$ . At the end of time step $t$ we have generated $M$ outputs, $\{s_{\ell}^{t}\}_{\ell=1}^{M}$ , distributed according to the stationary state $\left\langle\pi\right|$ , and have saved in memory on average $m_{j}=f_{j}\pi_{j}M$ samples in state $\left\langle j\right|$ , with $j\in\mathcal{S}$ and $\pi_{j}=\left\langle\pi|j\right\rangle$ . Let

[TABLE]

denote the set of indexes of the samples $s_{\ell}^{t}$ with value $j\in\mathcal{S}$ that are saved in memory at time $t$ . After this process we have a total of $m=\sum_{j\in\mathcal{S}}m_{j}$ samples saved.

To generate the $\ell$ -th sample, $s_{\ell}^{t+1}$ , at time step $t+1$ we first generate a sample from the stationary state $\left\langle\pi\right|$ as an intermediate stage. If the $\ell$ -th sample was not saved at time step $t$ , i.e. $\ell\notin\bigcup_{j\in\mathcal{S}}\mathcal{M}_{j}^{t}$ , we just output the new sample $s_{\ell}^{t+1}$ . Now, suppose the $\ell$ -th sample, $s_{\ell}^{t}=j$ , was indeed saved at time step $t$ with value $j\in\mathcal{S}$ , i.e., $\ell\in\mathcal{M}_{j}^{t}$ . Suppose also that the new $\ell$ -th sample, $s_{\ell}^{t+1}=i$ , generated at the intermediate stage has value $i$ . If $i\in\mathcal{S}_{j}^{+}$ we just output the sample. Otherwise, if $i\in\mathcal{S}_{j}^{-}$ we change it into $i^{\prime}\in\mathcal{S}_{j}^{+}$ with probability

[TABLE]

This process is repeated $M$ times.
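One update of a single sample can be sketched in Python as below. The sketch assumes that the change probability factorizes as $r_{j:i\to i^{\prime}}=r^{-}_{j:i\to}\,r^{+}_{j:\to i^{\prime}}$ , with $f_{j}$ , $r^{-}$ and $r^{+}$ defined as suggested by the surrounding prose. The numerical values are inferred from Appendix A and the function names are hypothetical.

```python
import random

# Assumed example values, inferred from the numbers quoted in Appendix A.
PI = [4 / 18, 9 / 18, 5 / 18]
DELTA = [[2 / 18, -3 / 18, 1 / 18],
         [-2 / 18, 3 / 18, -1 / 18],
         [2 / 18, -3 / 18, 1 / 18]]

def saving_probability(j):
    """f_j: largest value of -<j|Delta|i>/pi_i (zero if the row has no negative entry)."""
    terms = [-d / p for d, p in zip(DELTA[j], PI) if d < 0]
    return max(terms) if terms else 0.0

def next_sample(j, saved, rng):
    """Sample s^{t+1} given s^t = j and whether that sample was saved in memory."""
    states = range(len(PI))
    i = rng.choices(states, weights=PI)[0]        # intermediate stage: draw from <pi|
    if not saved or DELTA[j][i] >= 0:
        return i                                  # output the fresh sample unchanged
    r_minus = -DELTA[j][i] / (saving_probability(j) * PI[i])
    if rng.random() >= r_minus:
        return i                                  # sample in S_j^- that is not changed
    positive = [k for k in states if DELTA[j][k] > 0]
    # Move to i' in S_j^+ with probability r^+, proportional to <j|Delta|i'>.
    return rng.choices(positive, weights=[DELTA[j][k] for k in positive])[0]

def empirical_row(j, n, rng):
    """Empirical distribution of s^{t+1} over n samples that all start in state j."""
    counts = [0] * len(PI)
    f_j = saving_probability(j)
    for _ in range(n):
        saved = rng.random() < f_j                # decided when j was output at time t
        counts[next_sample(j, saved, rng)] += 1
    return [c / n for c in counts]
```

Calling `empirical_row(1, 60000, random.Random(0))` should return frequencies close to $\left\langle 1\right|T=\left\langle\pi\right|+\left\langle 1\right|\Delta=(\tfrac{1}{9},\tfrac{2}{3},\tfrac{2}{9})$ for these assumed inputs, even though only about half of the samples in state $j=1$ are ever stored.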

To see that this algorithm indeed generates samples with the right statistics, consider the probability $\mathcal{P}_{j\to i}$ to obtain a sample $s_{\ell}^{t+1}=i$ at time $t+1$ corresponding to a sample $s_{\ell}^{t}=j$ at time step $t$ . In the intermediate stage we obtain $s_{\ell}^{t+1}=i$ with probability $\pi_{i}$ . The probability that both $s_{\ell}^{t}=j$ was saved in memory at time step $t$ and $s_{\ell}^{t+1}=i$ at the intermediate stage is $\pi_{i}f_{j}$ . Now, if $s_{\ell}^{t}=j$ was saved in memory and $s_{\ell}^{t+1}=i\in\mathcal{S}^{-}_{j}$ at the intermediate stage, the value of the sample $s_{\ell}^{t+1}$ will change into any $i^{\prime}\in\mathcal{S}_{j}^{+}$ with probability $\sum_{i^{\prime}\in\mathcal{S}_{j}^{+}}r_{j:i\to i^{\prime}}$ . This leads to the relation

[TABLE]

which using Eqs. (46)-(49) yields $\mathcal{P}_{j\to i}=\pi_{i}+\left\langle j|\Delta|i\right\rangle$ , or $\mathcal{P}_{j\to i}=\left\langle j|T|i\right\rangle$ , as expected.

If $i\in\mathcal{S}_{j}^{+}$ instead, the sample $s_{\ell}^{t+1}=i$ does not change. However, we have to take into account all possible transitions into $i$ from all other samples $s_{\ell^{\prime}}^{t+1}$ that at the intermediate stage satisfy $s_{\ell^{\prime}}^{t+1}=i^{\prime}$ , with $\ell^{\prime}\in\mathcal{M}_{j}^{t}$ , $\ell^{\prime}\neq\ell$ , and $i^{\prime}\in\mathcal{S}^{-}_{j}$ . Since at the intermediate stage $s_{\ell^{\prime}}^{t+1}=i^{\prime}$ with probability $\pi_{i^{\prime}}$ and $s_{\ell^{\prime}}^{t}=j$ was saved in memory with probability $f_{j}$ , both events happen with probability $f_{j}\pi_{i^{\prime}}$ . Now, the total probability that any of the samples $s_{\ell^{\prime}}^{t+1}=i^{\prime}$ , with $i^{\prime}\in\mathcal{S}_{j}^{-}$ , corresponds to a sample $s_{\ell}^{t}=j$ saved in memory, i.e., $\ell^{\prime}\in\mathcal{M}_{j}^{t}$ , and transitions into $i$ is $\sum_{i^{\prime}\in\mathcal{S}_{j}^{-}}\pi_{i^{\prime}}f_{j}r_{j:i^{\prime}\to i}$ . This leads to the relation

[TABLE]

which using Eqs. (46)-(49) yields

[TABLE]

To change the sum above into a sum over elements of $\mathcal{S}_{j}^{+}$ we can use the identity

[TABLE]

Here we have extended the sum from $\mathcal{S}_{j}^{+}\cup\mathcal{S}_{j}^{-}$ to $\mathcal{S}$ since we can add all $\left\langle j|\Delta|i^{\prime}\right\rangle=0$ without changing the sum. We have also taken into account that both $\pi_{i}$ and $\left\langle j|T|i\right\rangle$ are probability distributions over $i$ . Using this identity and Eq. (47) we finally obtain $\mathcal{P}_{j\to i}=\pi_{i}+\left\langle j|\Delta|i\right\rangle$ , or $\mathcal{P}_{j\to i}=\left\langle j|T|i\right\rangle$ , as expected.
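Both cases of this argument can be checked exactly on a small example. The Python sketch below propagates probability mass through the algorithm: every intermediate state starts with mass $\pi_{i}$ , each negative state loses $\pi_{i}f_{j}r^{-}_{j:i\to}$ , and the lost mass is redistributed over the positive states according to $r^{+}_{j:\to i^{\prime}}$ . The definitions of $f_{j}$ , $r^{-}$ and $r^{+}$ are the ones suggested by the prose of this section, and the numerical values are inferred from Appendix A. Both are assumptions, and the function name is hypothetical.

```python
from fractions import Fraction as F

# Assumed example values, inferred from the numbers quoted in Appendix A.
PI = [F(4, 18), F(9, 18), F(5, 18)]
DELTA = [[F(2, 18), F(-3, 18), F(1, 18)],
         [F(-2, 18), F(3, 18), F(-1, 18)],
         [F(2, 18), F(-3, 18), F(1, 18)]]
T = [[PI[i] + DELTA[j][i] for i in range(3)] for j in range(3)]

def induced_transition_matrix(pi, delta):
    """Exact transition probabilities P_{j->i} produced by the sampling algorithm."""
    n = len(pi)
    result = []
    for j in range(n):
        row = delta[j]
        negative = [i for i in range(n) if row[i] < 0]
        positive = [i for i in range(n) if row[i] > 0]
        out = list(pi)                                  # mass after the intermediate stage
        if negative:
            f = max(-row[i] / pi[i] for i in negative)
            Z = sum(row[i] for i in positive)
            for i in negative:
                r_minus = -row[i] / (f * pi[i])
                moved = pi[i] * f * r_minus             # saved, drawn as i, and changed
                out[i] -= moved
                for k in positive:
                    out[k] += moved * row[k] / Z        # redistributed with r^+
        result.append(out)
    return result
```

For these inputs `induced_transition_matrix(PI, DELTA)` equals `T` entry by entry, which is the statement $\mathcal{P}_{j\to i}=\left\langle j|T|i\right\rangle$ in exact arithmetic.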

Again, we have built this in analogy with the quantum protocols that reduce statistical complexity (see Sec. II.2), which exploit the eigendecomposition of the density matrix to find an optimal quantum encoding of the $M$ samples. However, we have exploited here only the eigenvector corresponding to the stationary state. It may be possible to exploit all eigenvalues to build algorithms that require less memory. Nevertheless, like in the case of the quantum protocols, the price to pay for this would be the need to find and deal with the full eigensystem of the matrix of transition probabilities.

III.3.3 General stochastic causal states

Now, the concept of stochastic causal state introduced in Sec. III.2.2 (see Eqs. (36)-(38)) can in principle be generalized to systems of any dimension. For instance, in the case of a Markov chain specified by matrix of transition probabilities $T$ , in analogy with Eqs. (36)-(38), we could define general stochastic causal states as

[TABLE]

Or in the case of general $\epsilon$ -machines, in analogy with Eq. (5), we could define them as

[TABLE]

where $\{\left|j\right\rangle\}_{j\in\mathcal{S}}$ is an orthonormal basis that represents the deterministic causal states $j\in\mathcal{S}$ of the classical $\epsilon$ -machine, and $\{\left|x\right\rangle\}_{x\in\mathcal{A}}$ is an orthonormal basis that represents the corresponding outputs $x\in\mathcal{A}$ .

We can in principle find a classical algorithm that requires a state space with the same dimensionality, $D_{\rm stoch}$ , of the space spanned by the stochastic causal states, $\{\left|C_{i}\right\rangle\}_{i\in\mathcal{S}}$ . If $D_{\rm stoch}$ is smaller than the dimensionality $|\mathcal{S}|$ of the original state space, $\mathcal{S}$ , we would obtain a quantum-inspired classical algorithm that requires less topological memory than the best classical counterpart known to date, i.e., the $\epsilon$ -machine. This was the case of the post-processed perturbed coin process studied in Sec. III.2.2. This particular instance has the advantage that it does not involve negative numbers. This feature facilitates the representation in terms of classical stochastic causal states, Eqs. (36)-(38), and the corresponding transitions, Eqs. (39)-(41). In more general situations there may be negative numbers involved, which could in principle be interpreted operationally as we have done with the symmetrically-perturbed coin process (see Fig. 4) and its generalization in Sec. III.3.2 (see also Appendix A).
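To illustrate how $D_{\rm stoch}$ could be evaluated, the Python sketch below computes the dimension of the space spanned by a set of stochastic causal states as the rank of the matrix whose rows are those states. The helper `rank` is hypothetical and uses plain Gaussian elimination over exact fractions. The three states used in the usage note, $(1,0)$ , $(q,1-q)$ and $(0,1)$ , are the ones implied by the single-bit procedure of Sec. III.2.2 and are an assumption about Eqs. (36)-(38).

```python
from fractions import Fraction as F

def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[F(x) for x in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        # Find a row at or below position rk with a nonzero entry in this column.
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

def stochastic_dimension(causal_states):
    """D_stoch: dimension of the span of the stochastic causal states."""
    return rank(causal_states)
```

For example, with the assumed states `[[1, 0], [q, 1 - q], [0, 1]]` and any $0<q<1$ , `stochastic_dimension` returns $2$ , which is smaller than the three deterministic causal states of the $\epsilon$ -machine and corresponds to keeping a bit instead of a trit.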

III.3.4 Square roots of probabilities and belief propagation

In this work we have discussed memory-enhanced quantum-inspired stochastic algorithms that, unlike the quantum protocols discussed in Sec. II.2, do not make use of square roots of probabilities. In some instances this has led to lower memory savings than those that can be achieved with the corresponding quantum protocols (see, e.g., Sec. III.2.1). Dealing with square roots of probabilities therefore has the potential to achieve the same performance as the quantum protocols in these cases.

Indeed, what actually motivated us to search for such quantum-inspired algorithms was the observation that the classical algorithm known as belief propagation, when run on cycle- or chain-like classical graphical models, follows a dynamics which is mathematically similar to quantum dynamics when there are no phases involved realpe2017modeling ; realpe2018cognitive . Interestingly, square roots of probabilities and the analogs of the Born rule arise naturally in this potentially more general framework.

In particular, it is possible to build graphical models whose belief propagation dynamics is mathematically analogous to the quantum protocols discussed in Sec. II.2.2. There are, however, some caveats that we hope can be resolved in the near future. With this in mind, we present in Appendix B a general discussion of these ideas along with the two graphical models associated to the two quantum protocols discussed in Sec. II.2.2. This approach might in principle lead to classical algorithms where the amplitude encodings associated to the corresponding quantum protocols could be understood as organizing samples in squares, instead of lines as in Fig. 4. Any required linear algebra manipulation would then be applied to the sides of such squares rather than to the full squares themselves. A full algorithmic interpretation of square roots of probabilities is left for future work.

IV Conclusions

Memory is a key computational resource for performing high-performance simulations, e.g., of large-scale complex systems or quantum computers. In this work we have shown that some quantum protocols thompson2018causal ; gu2012quantum ; ghafari2018single ; ghafari2019interfering ; palsson2017experimentally recently introduced for stochastic simulation, which can provide extreme memory advantages over the best classical protocols known to date, can actually be either implemented or approximated classically. In the former case we can design classical algorithms with the same memory reductions as the quantum counterparts. In the latter case we can design algorithms that require less memory than the best classical algorithms known to date but do not reach the memory reductions of the quantum counterparts, so there is potentially still room for further improvement.

One of the concepts involved is the encoding of information about the past of a system on probabilistic mixtures, even though such information is already known and so, in a sense, ‘deterministic’. We showed how this can lead to a reduction in the dimension of the state space, or topological memory, of the variables that need to be kept in memory in order to sequentially generate a stochastic trajectory. In short, there is value in knowing less. This is the stochastic parallel of the corresponding quantum protocols which reduce memory requirements by saving already observed information about the past on quantum superpositions.

Another concept involved is the decomposition of matrices of transition probabilities into the dominant left and right eigenvectors, i.e., those associated to the stationary state, and the residual. We can then sample first from the stationary state, which requires zero memory about the past, and then correct the samples according to the residual to recover temporal correlations. This may involve negative numbers that may arise in the decomposition. We have used an operational interpretation of such negative numbers introduced in recent work realpe2017modeling ; realpe2018cognitive , to design classical sampling algorithms that require much less memory than the best known to date, i.e., the classical $\epsilon$ -machines.

Furthermore, following Refs. realpe2017modeling ; realpe2018cognitive , we have discussed how the classical message-passing algorithm known as belief propagation can lead to a dynamics mathematically similar to amplitude-encoded quantum protocols, where the phase degree of freedom does not play any role. In particular, square roots of probabilities arise naturally in this framework. Except for a very recent quantum protocol ghafari2018single , all quantum memory-enhanced protocols for stochastic simulation investigated to date, like the belief propagation counterparts, do not make use of quantum phases. Such restricted amplitude-encoded quantum protocols, though, can lead to extreme memory reductions aghamohammadi2018extreme . A full algorithmic interpretation of such square roots of probabilities, however, is left for future work.

Being classical, such quantum-inspired algorithms could be implemented in state-of-the-art high-performance computers. This could potentially enhance the study of complex systems and further raise the bar for near-term quantum computers to demonstrate ‘quantum supremacy’.

To conclude, the possibility to classically implement protocols that were previously considered quantum raises the question: what is quantum? An intriguing possibility which arose in Refs. realpe2017modeling ; realpe2018cognitive (see also Ref. realpe2019can ), which motivated this work, is that the peculiar classical interaction of a physical agent, e.g., a robot or a scientist, with a classical experimental setup, when described from the perspective of the agent itself leads to a fully quantum-like dynamics. If this is so, it might be possible to extend the ideas introduced here to general quantum protocols that may include phase degrees of freedom, as those recently studied in Ref. ghafari2018single .

Contributions

J.R.G. conceived the main ideas and derived all technical results. N.K. supervised the project and provided critical feedback to direct the research. J.R.G. wrote the manuscript in consultation with N.K.

Competing interests

The authors declare no competing interests.

Supplementary information

Appendix A Illustrative example of algorithm in Sec. III.3.2

Here we discuss a specific three-state Markov chain to illustrate the general algorithm introduced in Sec. III.3.2. Instead of just replacing the specific values here into the general expressions introduced therein, we will rather work this example out from scratch to motivate the latter. Although this will necessarily lead to a much longer discussion, we hope this approach presents a complementary perspective that may help clarify any confusion that may have arisen in reading the general presentation in Sec. III.3.2.

Consider the three-state Markov chain specified by the transition probability matrix

[TABLE]

Here we have written $T$ in terms of the left and right eigenvectors associated to the largest eigenvalue $\lambda=1$ , which are

[TABLE]

and $\left|{\rm 1\!\!I}\right\rangle=(1,1,1)^{T}$ , respectively. The reverse relation of Eq. (56) is

[TABLE]

The eigenvector $\left\langle\pi\right|$ in Eq. (57) is the stationary state of the Markov chain.

Although the Markov chain specified by the transition matrix $T$ in Eq. (56) could be described by just two causal states, it serves to illustrate the main ideas we want to discuss. We could use the same ideas for the corresponding two-level $\epsilon$ -machine, but this would be a rather trivial example. This also serves to illustrate that we can improve not only on $\epsilon$ -machines, but also on sub-optimal algorithms that may arise in real-life applications, where finding the corresponding $\epsilon$ -machine may be hard. Of course, this observation also applies to the original quantum protocols.

If the state of the system is $\left\langle j\right|$ , with $j\in\{0,1,2\}$ , its next state is drawn from the probability vector

[TABLE]

In the spirit of Fig. 4 we can generate samples by first sampling from the stationary state $\left\langle\pi\right|$ , which is the same for all $j$ , and afterwards correct the samples according to the second term in Eq. (59), i.e., $\left\langle j\right|\Delta$ .

For concreteness, let us consider the case when $p=\tfrac{1}{9}$ and $q=\tfrac{2}{3}$ , which yields (see Eqs. (57) and (58))

[TABLE]

and suppose we are generating $M=18^{2}$ samples in parallel (see Fig. 5; cf. Fig. 4). The $M$ samples are at the stationary state, Eq. (60), so the number of samples in states $\left\langle j\right|$ , with $j=0$ , $j=1$ and $j=2$ , is on average

[TABLE]

respectively. For ease of illustration, we will work with sample statistics as if they were population statistics, i.e., we will neglect fluctuations about the mean in the following discussion.

To generate the next set of samples, we first generate $M=18^{2}$ new samples from the stationary state $\left\langle\pi\right|$ , Eq. (60). This is equivalent to generating three subsets of samples of size $M_{0}$ , $M_{1}$ and $M_{2}$ , each distributed according to $\left\langle\pi\right|$ (see Fig. 5). To recover the temporal correlations between the current and the next set of samples, we need to ‘correct’ each subset of $M_{j}$ samples using vectors $\left\langle j\right|\Delta$ , with $j=0,1,2$ . Consider, for instance, the $M_{0}$ samples that transition from state $\left\langle j=0\right|$ . Since these $M_{0}$ samples are distributed according to the stationary state, Eq. (60), on average $M_{0}\pi_{0}=4\times 4$ stay in state $\left\langle j=0\right|$ , $M_{0}\pi_{1}=4\times 9$ transition to state $\left\langle j=1\right|$ , and $M_{0}\pi_{2}=4\times 5$ transition to state $\left\langle j=2\right|$ .

To ‘correct’ these $M_{0}$ samples according to vector $\left\langle 0\right|\Delta$ in Eq. (61) we need to change some of the samples that are in states with negative entries in $\left\langle 0\right|\Delta$ , i.e., state $\left\langle j=1\right|$ , into states that have positive entries in $\left\langle 0\right|\Delta$ according to the ratios specified by these entries. So, for every three samples in state $\left\langle j=1\right|$ that change, two should change into state $\left\langle j=0\right|$ and one into state $\left\langle j=2\right|$ . More precisely, since in this case the only negative entry in Eq. (61) is $\left\langle 0\right|\Delta\left|1\right\rangle=-{3}/{18}$ , we need to change on average $-M_{0}\left\langle 0\right|\Delta\left|1\right\rangle=4\times 3$ samples. On average $M_{0}\left\langle 0\right|\Delta\left|0\right\rangle=4\times 2$ should change into state $\left\langle j=0\right|$ and $M_{0}\left\langle 0\right|\Delta\left|2\right\rangle=4\times 1$ should change into state $\left\langle j=2\right|$ . This recovers the correct statistics of the transitions from state $\left\langle j=0\right|$ for the simple reason that the corresponding vector of transition probabilities $\left\langle 0\right|T=\left\langle\pi\right|+\left\langle 0\right|\Delta$ is the sum of the stationary distribution vector and the ‘correction’ vector (see Eq. (59) and general discussion in Sec. III.3.2). Although this ‘correction’ generally takes the $M_{0}$ samples out of the stationary state, all $M=M_{0}+M_{1}+M_{2}$ samples will globally remain at the stationary state once we also ‘correct’ the samples that transition from states $\left\langle j=1\right|$ and $\left\langle j=2\right|$ , as we will see below.
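The bookkeeping of average counts can be reproduced with a short Python sketch in exact arithmetic. The values of $\left\langle\pi\right|$ and of the rows $\left\langle j\right|\Delta$ are taken from the numbers quoted in this appendix for $p=\tfrac{1}{9}$ and $q=\tfrac{2}{3}$ , and they are assumptions to the extent that they were read off from those counts. The function names are hypothetical.

```python
from fractions import Fraction as F

M = 18 ** 2
# Assumed values, read off from the average counts quoted in this appendix.
PI = [F(4, 18), F(9, 18), F(5, 18)]
DELTA = [[F(2, 18), F(-3, 18), F(1, 18)],
         [F(-2, 18), F(3, 18), F(-1, 18)],
         [F(2, 18), F(-3, 18), F(1, 18)]]

def subset_counts(j):
    """Average counts per state for the M_j samples that transition from state j."""
    M_j = M * PI[j]
    before = [M_j * PI[i] for i in range(3)]          # intermediate stage, drawn from <pi|
    moved = [M_j * DELTA[j][i] for i in range(3)]     # signed correction M_j <j|Delta|i>
    after = [b + c for b, c in zip(before, moved)]    # equals M_j <j|T|i>
    return before, moved, after

def global_counts():
    """Total average counts per state after correcting all three subsets."""
    return [sum(subset_counts(j)[2][i] for j in range(3)) for i in range(3)]
```

For $j=0$ this gives $(16,36,20)$ samples before the correction, a signed correction of $(8,-12,4)$ , and $(24,24,24)$ afterwards. Summing the corrected counts over $j=0,1,2$ returns $(72,162,90)=M\left\langle\pi\right|$ , so the full set of samples stays at the stationary state.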

Now, the correction vector $\left\langle 1\right|\Delta$ in Eq. (62) associated to the $M_{1}$ samples that transition from state $\left\langle j=1\right|$ has two negative entries, i.e., $j=0$ and $j=2$ , and one positive entry, i.e. $j=1$ . In this case we need to change some of the $M_{1}$ samples in states $\left\langle j=0\right|$ and $\left\langle j=2\right|$ into state $\left\langle j=1\right|$ according to the ratios specified by these entries (see Fig. 5). More precisely, following the same reasoning of the previous case, on average we need to change $-M_{1}\left\langle 1|\Delta|0\right\rangle=9\times 2$ out of the $M_{1}\pi_{0}=9\times 4$ samples in state $\left\langle j=0\right|$ and $-M_{1}\left\langle 1|\Delta|2\right\rangle=9\times 1$ out of the $M_{1}\pi_{2}=9\times 5$ samples in state $\left\langle j=2\right|$ into state $\left\langle j=1\right|$ . This is the reverse process of the previous case: for each sample in state $\left\langle j=2\right|$ that is changed into state $\left\langle j=1\right|$ , two samples in state $\left\langle j=0\right|$ are changed into state $\left\langle j=1\right|$ .

Suppose we save in memory a given number $m_{1}$ of samples, $\{s_{\ell_{k}}^{t}\}_{k=1}^{m_{1}}$ , randomly chosen out of the $M_{1}$ samples at time step $t$ . Then, on average, $m_{1}\pi_{0}$ of the corresponding new samples, $\{s_{\ell_{k}}^{t+1}\}_{k=1}^{m_{1}}$ , generated in the intermediate stage would be in state $\left\langle j=0\right|$ and $m_{1}\pi_{2}$ of them would be in state $\left\langle j=2\right|$ . Each of these numbers should be larger than the corresponding number of samples that need to be changed to properly correct the $M_{1}$ samples, i.e., $m_{1}\pi_{0}\geq-M_{1}\left\langle 1|\Delta|0\right\rangle$ and $m_{1}\pi_{2}\geq-M_{1}\left\langle 1|\Delta|2\right\rangle$ . The minimum number of samples that need to be saved to typically guarantee these conditions is

[TABLE]

or $m_{1}^{\ast}=\tfrac{1}{2}M_{1}$ . Equivalently, the minimum fraction of the $M_{1}$ samples that we need to save in memory is $f_{1}=m_{1}^{\ast}/M_{1}=\tfrac{1}{2}$ —these fractions can be interpreted probabilistically, as discussed in Sec. III.3.2.
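As a worked check of this value, the two conditions on $m_{1}$ can be solved explicitly. The numbers $\pi_{0}=4/18$, $\pi_{2}=5/18$, $\left\langle 1|\Delta|0\right\rangle=-2/18$ and $\left\langle 1|\Delta|2\right\rangle=-1/18$ used here are inferred from the sample counts quoted above ($M_{1}\pi_{0}=9\times 4$, $-M_{1}\left\langle 1|\Delta|0\right\rangle=9\times 2$, and so on) rather than copied from Eq. (62), so they should be read as assumptions:

```latex
m_{1} \geq \frac{-M_{1}\langle 1|\Delta|0\rangle}{\pi_{0}}
      = \frac{2/18}{4/18}\,M_{1} = \tfrac{1}{2}M_{1},
\qquad
m_{1} \geq \frac{-M_{1}\langle 1|\Delta|2\rangle}{\pi_{2}}
      = \frac{1/18}{5/18}\,M_{1} = \tfrac{1}{5}M_{1},
\qquad\Longrightarrow\qquad
m_{1}^{\ast} = \max\left\{\tfrac{1}{2}M_{1},\ \tfrac{1}{5}M_{1}\right\} = \tfrac{1}{2}M_{1}.
```

The binding constraint is the one for state $\left\langle j=0\right|$: saving half of the $M_{1}$ samples is enough for both states, while saving only a fifth would leave too few samples in state $\left\langle j=0\right|$ to correct.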

So, if we save in memory only $m_{1}^{\ast}$ samples, on average $m_{1}^{\ast}\pi_{0}$ samples will be in state $\left\langle j=0\right|$ and $m_{1}^{\ast}\pi_{2}$ samples will be in state $\left\langle j=2\right|$. We need to change $-M_{1}\left\langle 1|\Delta|0\right\rangle$ of the $m_{1}^{\ast}\pi_{0}$ samples in state $\left\langle j=0\right|$ into state $\left\langle j=1\right|$, or a ratio $r_{1:0\to}^{-}=-M_{1}\left\langle 1|\Delta|0\right\rangle/\left(m_{1}^{\ast}\pi_{0}\right)=1$. Analogously, we need to change $-M_{1}\left\langle 1|\Delta|2\right\rangle$ of the $m_{1}^{\ast}\pi_{2}$ samples in state $\left\langle j=2\right|$ into state $\left\langle j=1\right|$, or a ratio $r_{1:2\to}^{-}=-M_{1}\left\langle 1|\Delta|2\right\rangle/\left(m_{1}^{\ast}\pi_{2}\right)=\tfrac{2}{5}$; these ratios can also be interpreted probabilistically, as discussed in Sec. III.3.2. In all these fractions and ratios the actual number of samples $M_{1}$ cancels out; this is true in general. In the previous case we have $m_{0}^{\ast}=-M_{0}\left\langle 0|\Delta|1\right\rangle/\pi_{1}=\tfrac{1}{3}M_{0}$, or $f_{0}=m_{0}^{\ast}/M_{0}=\tfrac{1}{3}$, and $r_{0:1\to}^{-}=1$; here the actual number of samples $M_{0}$ also cancels out.

Similarly, the only negative entry in the correction vector $\left\langle 2\right|\Delta$ in Eq. (63) is $\left\langle 2\right|\Delta\left|1\right\rangle=-{3}/{18}$ . To correct the $M_{2}$ samples corresponding to those that were in state $\left\langle j=2\right|$ at time step $t$ , we need to change on average $-M_{2}\left\langle 2\right|\Delta\left|1\right\rangle=5\times 3$ out of the $M_{2}\pi_{1}$ samples that, on average, transitioned into state $\left\langle j=1\right|$ at time step $t+1$ : $M_{2}\left\langle 2\right|\Delta\left|0\right\rangle=5\times 2$ should change, on average, into state $\left\langle j=0\right|$ and $M_{2}\left\langle 2\right|\Delta\left|2\right\rangle=5\times 1$ should change, on average, into state $\left\langle j=2\right|$ . In this case we have $m_{2}^{\ast}=-M_{2}\left\langle 2|\Delta|1\right\rangle/\pi_{1}=\tfrac{1}{3}M_{2}$ , or $f_{2}=m_{2}^{\ast}/M_{2}=\tfrac{1}{3}$ . Additionally, $r^{-}_{2:1\to}=1$ since in this case there is only one negative entry. So, all of the corresponding $m_{2}^{\ast}\pi_{1}$ samples that are in state $\left\langle j=1\right|$ at the intermediate stage should change. All of these samples should be distributed between states with positive entries in the correction vector $\left\langle 2\right|\Delta$ , i.e., $j=0$ and $j=2$ , according to the corresponding ratios. In this case such ratios are $r_{2:\to 0}^{+}=\tfrac{1}{Z_{2}}M_{2}\left\langle 2\right|\Delta\left|0\right\rangle=\tfrac{2}{3}$ and $r_{2:\to 2}^{+}=\tfrac{1}{Z_{2}}M_{2}\left\langle 2\right|\Delta\left|2\right\rangle=\tfrac{1}{3}$ , respectively, where $Z_{2}$ is a normalization constant enforcing $r_{2:\to 0}^{+}+r_{2:\to 2}^{+}=1$ .

Applying the same reasoning to the first case analyzed, i.e., those samples transitioning from state $\left\langle j=0\right|$ , we obtain $r^{+}_{0:\to 0}=\tfrac{1}{Z_{0}}M_{0}\left\langle 0\right|\Delta\left|0\right\rangle=\tfrac{2}{3}$ and $r^{+}_{0:\to 2}=\tfrac{1}{Z_{0}}M_{0}\left\langle 0\right|\Delta\left|2\right\rangle=\tfrac{1}{3}$ . We obtain the same numbers as before because in this example we have $\left\langle 0\right|\Delta=\left\langle 2\right|\Delta$ (see Eqs. (61) and (63)). This is not true in general, though.
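The bookkeeping of this example can be checked mechanically. The following is a minimal sketch, not the authors' implementation: the stationary distribution $\pi=(4,9,5)/18$ and the rows of $\Delta$ are inferred from the sample counts quoted in the text (Eqs. (61)-(63) are not reproduced here), and the names `correction_parameters` and `corrected_distribution` are hypothetical. It computes the fractions $f_{j}$ and the ratios $r^{-}$, $r^{+}$ with exact rational arithmetic and confirms that sampling from $\pi$ and then applying the correction reproduces $\left\langle j\right|T=\left\langle\pi\right|+\left\langle j\right|\Delta$.

```python
from fractions import Fraction as Fr

# Assumed values, inferred from the sample counts quoted in the text:
# stationary distribution pi and the correction rows <j|Delta.
pi = [Fr(4, 18), Fr(9, 18), Fr(5, 18)]
Delta = [
    [Fr(2, 18), Fr(-3, 18), Fr(1, 18)],
    [Fr(-2, 18), Fr(3, 18), Fr(-1, 18)],
    [Fr(2, 18), Fr(-3, 18), Fr(1, 18)],
]


def correction_parameters(pi, delta_row):
    """Return (f, r_minus, r_plus) for one correction vector <j|Delta."""
    neg = {k: -d for k, d in enumerate(delta_row) if d < 0}
    pos = {k: d for k, d in enumerate(delta_row) if d > 0}
    # Smallest saved fraction f with f * pi_k >= -<j|Delta|k> for all negative k.
    f = max(neg[k] / pi[k] for k in neg)
    # Share of the saved samples found in state k that must leave it.
    r_minus = {k: neg[k] / (f * pi[k]) for k in neg}
    # How the removed samples are split among the positive-entry states.
    z = sum(pos.values())
    r_plus = {k: pos[k] / z for k in pos}
    return f, r_minus, r_plus


def corrected_distribution(pi, delta_row):
    """Exact distribution after sampling from pi and applying the correction."""
    f, r_minus, r_plus = correction_parameters(pi, delta_row)
    out = list(pi)
    for k, r in r_minus.items():
        moved = f * pi[k] * r  # probability mass leaving state k
        out[k] -= moved
        for k2, share in r_plus.items():
            out[k2] += moved * share
    return out


for j, row in enumerate(Delta):
    target = [p + d for p, d in zip(pi, row)]  # <j|T = <pi| + <j|Delta
    assert corrected_distribution(pi, row) == target
    print(j, *correction_parameters(pi, row))
```

Under these assumed inputs the sketch reproduces the values derived above: $f_{0}=f_{2}=\tfrac{1}{3}$, $f_{1}=\tfrac{1}{2}$, $r_{1:0\to}^{-}=1$, $r_{1:2\to}^{-}=\tfrac{2}{5}$, and $r^{+}_{0:\to 0}=\tfrac{2}{3}$, $r^{+}_{0:\to 2}=\tfrac{1}{3}$.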

Appendix B Quantum-like belief propagation protocols

Here we describe the potentially more general perspective discussed in Sec. III.3.4 (Appendix B.1). Furthermore, we introduce the two graphical models whose belief propagation dynamics is mathematically analogous to the quantum dynamics of the examples discussed in Sec. II.2.2 (Appendix B.2). Finally, we point out some caveats that we hope can be resolved in the near future (Appendix B.3).

B.1 General considerations

Belief propagation is a message-passing algorithm to efficiently compute marginals of graphical models by passing messages from node to node along the underlying graph (see Fig. 6; see also Sec. 14 in Ref. mezard2009information). It was recently shown realpe2017modeling ; realpe2018cognitive (see Secs. V B and C in Ref. realpe2017modeling) that belief propagation on chain- or cycle-like graphs is mathematically analogous to quantum dynamics in imaginary time, i.e., the dynamics obtained after changing time $t$ to $-it$, where $i$ is the imaginary unit. Since the imaginary unit only appears multiplying the phase of a wave function, it is natural to expect that quantum protocols that do not exploit information about the phase, like those presented in Sec. II.2, could be efficiently simulated classically.

For instance, let us consider a graphical model of $N+1$ binary variables $s_{\ell}\in\{0,1\}$ , with $\ell=0,\dotsc,N$ , interacting on a cycle whose probability distribution factorizes as (see Fig. 6)

[TABLE]

where periodicity in the index $\ell$ is understood, i.e., $s_{N+1}=s_{0}$. Here the non-negative functions $F_{\ell}(s_{\ell+1},s_{\ell})$, for $\ell=0,\dotsc,N$, are called factors and $Z$ is the normalization constant. For instance, if $\mathcal{P}$ is a Boltzmann distribution, the factor $F_{\ell}$ can be a Boltzmann weight $F_{\ell}(s_{\ell+1},s_{\ell})=e^{-\beta E_{\ell}(s_{\ell+1},s_{\ell})}$ associated to an energy function $E_{\ell}$ and an inverse temperature $\beta$. It is straightforward to extend the ideas presented here to $D$-dimensional arrays of binary variables $\mathbf{s}_{\ell}=(s_{\ell}^{(1)},\dotsc,s_{\ell}^{(D)})$ (see Sec. B.2 and Fig. 6).

Belief propagation is guaranteed to yield exact marginals when the graphical model has the topology of a tree. So, let us first assume that factor $F_{N}(s_{0},s_{N})=1$ for all $s_{0},s_{N}\in\{0,1\}$ , so the cycle turns effectively into a chain. In this case, belief propagation passes messages $\mu_{\ell\to\ell+1}(s_{\ell})$ and $\nu_{\ell\to\ell-1}(s_{\ell})$ starting at nodes $\ell=0$ and $\ell=N$ respectively mezard2009information (see Sec. 14 therein). After such messages have travelled the entire chain, using a suitable normalization for the messages, the marginal probability of variable $s_{\ell}$ is given by realpe2017modeling ; realpe2018cognitive (see Sec. V B in Ref. realpe2017modeling )

[TABLE]

This is reminiscent of the Born rule of quantum theory, and we can indeed write these messages as realpe2017modeling ; realpe2018cognitive

[TABLE]

If we do $t\to-it$ in the Schrödinger equation, the imaginary unit $i$ disappears. The imaginary unit multiplying the phase $\varphi_{\ell}(s)$ of a wave function $\psi_{\ell}(s)=\sqrt{p_{\ell}(s)}e^{i\varphi_{\ell}(s)}$ also disappears, which leads to the imaginary-time analogs of a wave function and its conjugate, Eqs. (70) and (71). Furthermore, if we write

[TABLE]

the belief propagation algorithm associated to Eq. (68) is determined by the update rules

[TABLE]

which are similar to the update rules of quantum theory, except that the factors $F_{\ell}$ are real and non-negative. However, see Refs. realpe2017modeling ; realpe2018cognitive for a discussion of more general situations.
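To make the chain case concrete, here is a minimal sketch of forward and backward message passing on an open chain of binary variables, checked against brute-force enumeration. It uses the textbook sum-product recursions with uniform initial messages and takes the marginal proportional to the product of the two messages. Since Eqs. (69), (74) and (75) are not reproduced here, the normalization and index conventions, the function names `chain_marginals` and `brute_force_marginals`, and the numerical factors are all assumptions made for illustration.

```python
from itertools import product


def chain_marginals(factors):
    """Sum-product marginals on an open chain of binary variables.

    factors[l][a][b] stands for F_l(s_{l+1}=a, s_l=b), l = 0..N-1
    (the factor F_N that would close the cycle is taken to be 1).
    """
    # Forward messages mu_{l -> l+1}(s_l), started uniformly at l = 0.
    mu = [[1.0, 1.0]]
    for F in factors:
        prev = mu[-1]
        mu.append([sum(F[a][b] * prev[b] for b in (0, 1)) for a in (0, 1)])
    # Backward messages nu_{l -> l-1}(s_l), started uniformly at l = N.
    nu = [[1.0, 1.0]]
    for F in reversed(factors):
        nxt = nu[0]
        nu.insert(0, [sum(nxt[a] * F[a][b] for a in (0, 1)) for b in (0, 1)])
    # The marginal at node l is proportional to mu_l(s) * nu_l(s).
    marginals = []
    for m, v in zip(mu, nu):
        w = [m[s] * v[s] for s in (0, 1)]
        z = sum(w)
        marginals.append([x / z for x in w])
    return marginals


def brute_force_marginals(factors):
    """Same marginals by summing the factorized weight over all paths."""
    n = len(factors) + 1
    weights = [[0.0, 0.0] for _ in range(n)]
    for s in product((0, 1), repeat=n):
        w = 1.0
        for l, F in enumerate(factors):
            w *= F[s[l + 1]][s[l]]
        for l in range(n):
            weights[l][s[l]] += w
    return [[x / sum(row) for x in row] for row in weights]


# Arbitrary non-negative factors chosen only for illustration.
example = [
    [[2.0, 1.0], [0.5, 3.0]],
    [[1.0, 4.0], [2.0, 1.0]],
    [[3.0, 1.0], [1.0, 2.0]],
]
bp = chain_marginals(example)
exact = brute_force_marginals(example)
assert all(abs(x - y) < 1e-12 for p, q in zip(bp, exact) for x, y in zip(p, q))
```

The message-passing version costs a number of operations linear in the chain length, whereas the brute-force check enumerates all $2^{N+1}$ paths; the two agree here because the graph is a tree.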

From a mathematical point of view, the only difference between Eqs. (70) and (71) and quantum wave functions is the lack of the imaginary unit $i$ multiplying the phase $\varphi_{\ell}$. So it is natural to expect that quantum protocols that only exploit amplitudes of wave functions, such as those based on Eqs. (12), (13) and (23)-(25), might be implemented classically, e.g., via belief propagation algorithms. This is not totally unexpected given that $\sqrt{p_{\ell}(s_{\ell})}$ contains the same information encoded in $p_{\ell}(s_{\ell})$. We will argue here that this may indeed be the case in some instances.

B.2 Examples

B.2.1 Symmetrically perturbed coin process

To gain some initial intuition, we will first use an alternative method also introduced in Refs. realpe2017modeling ; realpe2018cognitive (see Sec. VI C in Ref. realpe2017modeling ), which, instead of messages, uses probability matrices that are analogous to the imaginary-time version of density matrices. Due to the absence of phases in the examples considered here, such probability matrices actually have the same mathematical form as the corresponding density matrices of interest, as we will see below.

Figure 6a shows a graphical model, defined on pairs of binary variables $\mathbf{s}_{\ell}=(s_{\ell}^{(1)},s_{\ell}^{(2)})$ with $\ell=0,\dotsc,3$, whose belief propagation dynamics is mathematically analogous to the quantum protocol of the symmetrically perturbed coin process described in Sec. II.2.2. As we mentioned in Sec. II.2.2, the undetermined entries in Eq. (14) are irrelevant for the application of the quantum protocols described therein. They are chosen such that $U_{x}$ is unitary, which requires some entries to be negative. However, as they are irrelevant, we can also choose them equal to zero in the belief propagation model, for instance. We can then implement the corresponding transformations using a classical factor

[TABLE]

instead (cf. Eq. (14)).

Using Eq. (76), the factors in Fig. 6a can be defined as

[TABLE]

where $\mathrm{CNOT}^{(1,2)}$ is defined in Eq. (15), and $j=0,1$ refers to the causal states. In this case, $x_{0}=p$ and $x_{1}=1-p$ , to implement the analogs of quantum causal states $\left|\xi_{0}\right\rangle$ and $\left|\xi_{1}\right\rangle$ , respectively.

The probability of a path $\mathbf{s}=(\mathbf{s}_{0},\dotsc,\mathbf{s}_{3})$ can be written as realpe2017modeling ; realpe2018cognitive

[TABLE]

where we have written $\mathbf{s}_{0}^{\prime}=\mathbf{s}_{0}$ for future convenience; here the normalization constant is $Z=1$. Due to the circular topology of the graphical model in Fig. 6a it is in general not possible to compute the marginal $p_{1}$ at step $\ell=1$, for instance, from the marginal

[TABLE]

at step $\ell=0$ alone realpe2018cognitive . If the graphical model had the topology of a chain, instead, this would indeed be possible, i.e., we would have a Markov chain.

However, if we relax the condition that $\mathbf{s}_{0}^{\prime}=\mathbf{s}_{0}$ in Eqs. (78) and (79), and interpret the factors as probability matrices, we can define a real probability matrix realpe2017modeling ; realpe2018cognitive (see Sec. VI C in Ref. realpe2017modeling)

[TABLE]

at step $\ell=0$ . The $\mathbf{s}_{0}$ -th diagonal element of $P_{0}$ in Eq. (80) is the marginal $p_{0}(\mathbf{s}_{0})$ , which in this case equals one if $\mathbf{s}_{0}=(0,0)$ and zero otherwise.

Following the same reasoning, and taking into account that $F_{\rm CNOT}F_{\rm CNOT}={\rm 1\!\!I}$ , we can compute a probability matrix (see Eq. (77))

[TABLE]

at step $\ell=1$ . Again, the diagonal of $P_{1}$ contains the vector of marginal probabilities $p_{1}$ . In this case, the matrix $P_{1}$ is mathematically analogous to the density matrix associated to the quantum causal state $\left|\xi_{j}\right\rangle\left|\xi_{0}\right\rangle$ of the quantum protocol shown in Fig. 2b (see Eqs. (12), (13), and (17)).

Matrix $P_{1}$ in Eq. (81) is obtained by the cyclic permutation of the final matrix $F_{\rm PREP}$ in Eq. (80). Continuing with these cyclic permutations, we get (see Eqs. (18) and (19))

[TABLE]

after which we get back to $P_{0}=\left|0\right\rangle\left\langle 0\right|\otimes\left|0\right\rangle\left\langle 0\right|$, so the dynamics is self-consistent. The probability matrix $P_{2}$ has the same mathematical form as the density matrix associated to states $\left|\chi_{j}\right\rangle$, with $j\in\{0,1\}$, in Eqs. (18) and (19).

By measuring the first binary variable of $P_{2}$ we get the same statistics and the same state update for the probabilistic state of the second variable as the quantum protocol in Fig. 2b.

Now, let us turn back to the belief propagation algorithm, Eqs. (74) and (75), which, again, do not in general yield exact marginals for graphical models with loops. In this case, however, the marginals estimated by belief propagation (see Eq. (69)) are indeed exact if we pick “initial” messages (i.e., at index $\ell=0$ ) realpe2017modeling ; realpe2018cognitive

[TABLE]

which are consistent with $P_{0}$ in Eq. (80).

The forward belief propagation iteration, Eq. (74), leads in this case to (see Fig. 6a and Eqs. (12), (13), (18) and (19))

[TABLE]

The message $\left|\mu_{1\to 2}\right\rangle$ at $\ell=1$ is mathematically equivalent to the initial quantum state $\left|\xi_{j}\right\rangle\left|\xi_{0}\right\rangle$ of the quantum protocol (see Fig. 2b and Eqs. (12) and (13)). Similarly, the message $\left|\mu_{2\to 3}\right\rangle$ at $\ell=2$ is mathematically equivalent to the quantum state (17). Finally, after the last two steps, Eqs. (88) and (89), we turn around the cycle and belief propagation consistently yields back the initial message $\left|\mu_{0\to 1}\right\rangle$ . So the forward belief propagation iteration, Eq. (74), is self-consistent, even though the graphical model does not have the topology of a tree.

The backward belief propagation iteration, Eq. (75), yields the transposed equations, i.e., at each time index $\ell$ we have

[TABLE]

which is equivalent to taking the Hermitian conjugate, as in quantum theory, since the messages are real.

B.2.2 Post-processed perturbed coin process

Figure 6b shows a graphical model, defined on binary variables $\mathbf{s}_{\ell}=(s_{\ell}^{(1)},s_{\ell}^{(2)},s_{\ell}^{(3)})$ with $\ell=0,\dotsc,7$ , where belief propagation follows a dynamics analogous to that of the quantum protocol for the post-processed perturbed coin process in Sec. II.2.2 (see Fig. 2b). The factors in Fig. 6b are defined as

[TABLE]

where $F_{p}$ and $F_{1-q}$ are obtained by setting $x=p$ and $x=1-q$ in Eq. (76), respectively, and $\mathrm{CNOT}^{(3,2)}$ is defined in Eq. (30). Factors $G_{\rm Up}$ and $G_{\rm U_{1-q}}$ are analogous to the gates in Eqs. (28) and (29). The factor $G_{j}$ prepares from $\left|0\right\rangle$ the analog of the initial quantum causal state $\left|\xi_{j}\right\rangle$ , with $j=0,1,2$ . More precisely, $G_{0}=\left|0\right\rangle\left\langle 0\right|$ , or $G_{1}=F_{q}$ , or $G_{2}=\left|1\right\rangle\left\langle 0\right|$ if the initial quantum causal state is $\left|\xi_{0}\right\rangle$ , or $\left|\xi_{1}\right\rangle$ , or $\left|\xi_{2}\right\rangle$ , respectively (see Eqs. (23)-(25)).

As before, the graphical model in Fig. 6b has the topology of a circle and belief propagation is not guaranteed in general to yield exact marginals for graphical models with cycles. However, the variable $\mathbf{s}_{0}$ is connected to factors $G_{\rm PREP}$ and $G_{\rm PREP}^{T}$ , whose entries in the second column and second row, respectively, are zero for any $j=0,1,2$ (see Eq. (91)). This implies that $\mathbf{s}_{0}=(0,0,0)$ with probability one—to see this notice that

[TABLE]

In this case, the message-passing equations become exact if, consistent with Eq. (95), we pick “initial” messages (i.e., at index $\ell=0$ ) realpe2017modeling ; realpe2018cognitive (see Sec. V B 2 in Ref. realpe2017modeling )

[TABLE]

The first iteration of the belief propagation equations, Eqs. (74) and (75), yields

[TABLE]

where $\left|\xi_{j}\right\rangle=F_{j}\left|0\right\rangle$ , which has the same mathematical form as the initial quantum causal state of the protocol shown in Fig. 6b (see Eqs. (23)-(25)).

Now $F_{p}$ and $F_{1-q}$ coincide, respectively, with $U_{p}$ and $U_{1-q}$ , except in the undetermined entries $\#$ of the latter (see Eqs. (76) and (14)). However, such undetermined entries are irrelevant for the quantum protocol in Fig. 2b (see Sec. II.2.2). As far as this quantum protocol is concerned, Eqs. (92)-(94) are equivalent to Eqs. (28)-(30). This implies that the belief propagation dynamics induced on the messages $\left|\mu_{\ell\to\ell+1}\right\rangle$ and $\left\langle\nu_{\ell\to\ell-1}\right|$ by Eqs. (74) and (75) with factors $G_{\rm Up}$ , $G_{\rm U_{1-q}}$ , and $G_{\rm CNOT}$ defined in Eqs. (92)-(94), has the same mathematical form as the dynamics of the corresponding quantum protocol in Fig. 2b. In particular (see Eqs. (32)-(34)),

[TABLE]

which yields the same statistics as the post-processed perturbed coin process (see Fig. 1b), as described in Sec. II.2.2.

The branch of the graphical model in Fig. 6a that connects variables $\mathbf{s}_{\ell}$, with $\ell=0,1,2,3,4$, “prepares” the message in Eq. (100). The remaining branch connecting variables $\mathbf{s}_{\ell}$, with $\ell=0,7,6,5,4$, is a kind of “mirror image” of the former, i.e., it has the same factors but transposed (see Fig. 6a); such a mirror image yields the analogs of the corresponding conjugate wave functions. Indeed, since the initial $\nu$-message in Eq. (97) is also the transpose of the initial $\mu$-message in Eq. (96), the dynamics of the $\nu$-messages is the same as that of the $\mu$-messages. So, the $\nu$-messages have the same mathematical form as the conjugate wave functions in the corresponding quantum protocol (see Fig. 2a and Example 1 in Sec. II.2.2). This happens because there are no phases involved.

Finally, the remaining iterations of belief propagation for the $\mu$ -messages roll back the messages. For instance, since $G_{\rm CNOT}G_{\rm CNOT}={\rm 1\!\!I}$ , we get

[TABLE]

with

[TABLE]

Now, since $F_{x}^{T}F_{x}=\left|0\right\rangle\left\langle 0\right|$ , we get $G^{T}G=\left|000\right\rangle\left\langle 000\right|$ for all $j=0,1,2$ . So, after turning around the cycle we get

[TABLE]

which shows the belief propagation equations are self-consistent in this case too (cf. Sec. B.2.1).
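The two identities that make these cycles close, $F_{x}^{T}F_{x}=\left|0\right\rangle\left\langle 0\right|$ and the fact that a CNOT factor squares to the identity, can be checked numerically. The sketch below rests on assumptions, because Eqs. (14), (15) and (76) are not reproduced here: it takes $F_{x}$ to have first column $(\sqrt{x},\sqrt{1-x})$ and zeros elsewhere, which is consistent with the identity quoted above and with setting the undetermined entries of $U_{x}$ to zero, and it uses the standard CNOT matrix with the first bit as control. The helper names are hypothetical.

```python
import math


def F(x):
    """Assumed form of the classical factor F_x (undetermined entries set to 0)."""
    return [[math.sqrt(x), 0.0],
            [math.sqrt(1.0 - x), 0.0]]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Standard CNOT in the basis 00, 01, 10, 11 (control = first bit; assumed convention).
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]


def born_like_marginal(x):
    """p(s) = mu(s) * nu(s) with mu = F_x|0> and nu = <0|F_x^T."""
    f = F(x)
    mu = [f[s][0] for s in (0, 1)]  # first column of F_x
    nu = mu                         # transpose of a real vector has the same entries
    return [mu[s] * nu[s] for s in (0, 1)]


ftf = matmul(transpose(F(0.3)), F(0.3))   # should be |0><0|
cnot_squared = matmul(CNOT, CNOT)         # should be the 4x4 identity
marginal = born_like_marginal(0.3)        # should be (x, 1 - x)
```

With this assumed $F_{x}$, the product of the forward message $F_{x}\left|0\right\rangle$ and its transpose gives the marginal $(x,1-x)$, which is the Born-like rule of Sec. B.1 specialized to real, non-negative amplitudes.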

B.3 Quantum-like protocols and some caveats

The graphical models introduced in Sec. B.2 have a belief propagation dynamics mathematically analogous to the first iteration of the quantum protocols in Fig. 2. This first iteration includes the preparation of the message $\left|\xi_{j}\right\rangle$ via factor $F_{x_{j}}$ . However, this is to be avoided in future iterations—otherwise any potential memory savings would be destroyed by having to prepare message $\left|\xi_{j}\right\rangle$ at each time step.

Figure 7 shows an extension of the graphical model in Fig. 6a whose dynamics is mathematically analogous to that of two iterations of the quantum protocol in Fig. 2a. This is essentially composed of two copies of the graphical model in Fig. 6a connected through the variables $s_{2}^{(2)}$ and $s_{3}^{(2)}$ which, in contrast to variable $s_{2}^{(1)}$ , are not observed. The second copy (top right) does not have the preparation factor $F_{x_{j}}$ as required. This construction can be continued iteratively by splitting variable $s_{6}^{(3)}$ into two and connecting a third copy to them, and so on. A similar construction can be done for the graphical model in Fig. 6b. There are some caveats, though.

First, if we interpret the graphical model in Fig. 7 as a static Ising-like system, then instead of reducing the amount of memory required to generate stochastic trajectories in comparison to the corresponding $\epsilon$-machine, the number of stochastic bits needed to implement the whole graphical model would proliferate. A potential way out is to implement one factor graph at a time (see Fig. 7). This would be closer to the original interpretation of this type of graphical model in Refs. realpe2017modeling ; realpe2018cognitive as describing the dynamics of a physical agent, e.g., a robot, interacting with an experimental device. However, this may require significant isolation from the environment to prevent the probabilities, and messages, associated to such isolated variables from changing too much. Whether the amount of isolation required is similar to that of quantum protocols is an open question at the moment.

Second, the memory gain in the case of the symmetrically perturbed coin process is in terms of statistical memory, which needs a quantum-like encoding à la Schumacher schumacher1995quantum. In the example studied here such an encoding essentially amounts to working with the Fourier transform of the messages. It is not clear at this point how to implement this either physically or algorithmically. This caveat does not apply to the topological memory savings in the case of the belief propagation protocol for the post-processed perturbed coin process. However, in more general cases a similar caveat may also arise for graphical models that save topological memory, if the lower-dimensional representation of the messages corresponding to the quantum causal states (see Eqs. (23)-(25)) involves negative numbers. This may be dealt with using the operational interpretation realpe2018cognitive of this type of negative numbers that we have discussed in Secs. III.2 and III.3.

Bibliography

(1) J.-P. Bouchaud, J. Bonart, J. Donier, and M. Gould, Trades, quotes and prices: financial markets under the microscope. Cambridge University Press, 2018.
(2) D. Bisias, M. Flood, A. W. Lo, and S. Valavanis, “A survey of systemic risk analytics,” Annu. Rev. Financ. Econ., vol. 4, no. 1, pp. 255–296, 2012.
(3) J. D. Farmer and D. Foley, “The economy needs agent-based modelling,” Nature, vol. 460, no. 7256, p. 685, 2009.
(4) N. Stern, “Economics: current climate models are grossly misleading,” Nature News, vol. 530, no. 7591, p. 407, 2016.
(5) Y. Cai, K. L. Judd, T. M. Lenton, T. S. Lontzek, and D. Narita, “Environmental tipping points significantly affect the cost-benefit assessment of climate policies,” Proceedings of the National Academy of Sciences, p. 201503890, 2015.
(6) C. L. Franzke, T. J. O’Kane, J. Berner, P. D. Williams, and V. Lucarini, “Stochastic climate theory and modeling,” Wiley Interdisciplinary Reviews: Climate Change, vol. 6, no. 1, pp. 63–78, 2015.
(7) B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum, “Human-level concept learning through probabilistic program induction,” Science, vol. 350, no. 6266, pp. 1332–1338, 2015.
(8) Z. Ghahramani, “Probabilistic machine learning and artificial intelligence,” Nature, vol. 521, no. 7553, p. 452, 2015.