Memory cost of temporal correlations

Costantino Budroni; Gabriel Fagundes; Matthias Kleinmann

arXiv:1902.06517·quant-ph·September 16, 2019

TL;DR

This paper investigates the memory cost of simulating temporal correlations in different theories, including classical, quantum, and general probability theories, revealing inequalities that distinguish these correlations.

Contribution

It introduces a theory-independent framework to analyze the memory cost of temporal correlations, extending beyond quantum mechanics to general probability theories.

Findings

1. Derived inequalities to distinguish classical, quantum, and GPT temporal correlations.

2. Showed that systems with finite memory capacity can exhibit different temporal correlations.

3. Provided a unified approach to compare nonclassicality across theories.

Abstract

A possible notion of nonclassicality for single systems can be defined on the basis of the notion of memory cost of classically simulating probabilities observed in a temporal sequence of measurements. We further explore this idea in a theory-independent framework, namely, from the perspective of general probability theories (GPTs), which includes classical and quantum theory as special examples. Under the assumption that each system has a finite memory capacity, identified with the maximal number of states perfectly distinguishable with a single measurement, we investigate what are the temporal correlations achievable with different theories, namely, classical, quantum, and GPTs beyond quantum mechanics. Already for the simplest nontrivial scenario, we derive inequalities able to distinguish temporal correlations where the underlying system is classical, quantum, or more general.

Equations

$$\sum_{c}p(abc|xyz)=\sum_{c}p(abc|xyz^{\prime}),\quad\text{for all }a,b\in{\mathcal{A}}\text{ and all }x,y,z,z^{\prime}\in{\mathcal{X}},\tag{1}$$

$$\sum_{b,c}p(abc|xyz)=\sum_{b,c}p(abc|xy^{\prime}z^{\prime}),\quad\text{for all }a\in{\mathcal{A}}\text{ and all }x,y,y^{\prime},z,z^{\prime}\in{\mathcal{X}}.\tag{2}$$

$$p(abc|xyz)=p(a|x)\,p(b|a;xy)\,p(c|ab;xyz),\tag{3}$$

$$p(a|x)=\omega{\mathcal{I}}_{a|x}e,\quad p(ab|xy)=\omega{\mathcal{I}}_{a|x}{\mathcal{I}}_{b|y}e,\quad\text{etc.}\tag{4}$$

$$p(\vec{a}|\vec{x})\equiv p(a_{1}\ldots a_{n}|x_{1}\ldots x_{n})=\omega{\mathcal{I}}_{a_{1}|x_{1}}\cdots{\mathcal{I}}_{a_{n}|x_{n}}e\equiv\omega{\mathcal{I}}_{\vec{a}|\vec{x}}e,\tag{5}$$

$$p(\vec{a}|\vec{x})=\sum_{\lambda}q(\lambda)\,\omega_{\lambda}{\mathcal{I}}^{\lambda}_{\vec{a}|\vec{x}}e.\tag{6}$$

$$\sum_{k}f_{k}\leq e\quad\text{and}\quad\omega_{i}f_{j}=\delta_{ij}\text{ for all }i,j.\tag{7}$$

$$p(\vec{a}|\vec{x})=\sum_{s_{0},\ldots,s_{n}\in{\mathcal{C}}}r(s_{0})\,q(a_{1},s_{1}|s_{0},x_{1})\cdots q(a_{n},s_{n}|s_{n-1},x_{n}).\tag{8}$$

$$p(\vec{a}|\vec{x})=\sum_{s_{0},\ldots,s_{n}\in{\mathcal{C}},\,\lambda}p(\lambda)\,r^{\lambda}(s_{0})\,q^{\lambda}(a_{1},s_{1}|s_{0},x_{1})\cdots q^{\lambda}(a_{n},s_{n}|s_{n-1},x_{n}).\tag{9}$$

$$p(\vec{a}|\vec{x})={\boldsymbol{\pi}}^{\dagger}T(a_{1}|x_{1})\cdots T(a_{n}|x_{n}){\boldsymbol{\eta}}\equiv{\boldsymbol{\pi}}^{\dagger}T(\vec{a}|\vec{x}){\boldsymbol{\eta}},\tag{10}$$

$${\mathcal{S}}=\set{{\boldsymbol{\varpi}}\in{\mathds{R}}^{d}}{{\boldsymbol{\varpi}}\geq 0\text{ and }{\boldsymbol{\varpi}}^{\dagger}{\boldsymbol{\eta}}=1}.\tag{11}$$

$$p(\vec{a}|\vec{x})=\sum_{\lambda}p(\lambda)\,q^{\lambda}_{t_{1}}(a_{1}|x_{1})\cdots q^{\lambda}_{t_{n}}(a_{n}|x_{n}),\tag{12}$$

$$p(\vec{a}|\vec{x})=\operatorname{tr}[\rho\,{\mathcal{I}}_{\vec{a}|\vec{x}}(\openone)].\tag{13}$$

$${\mathcal{S}}=\set{X\mapsto\operatorname{tr}(\rho X)}{\rho\geq 0\text{ and }\operatorname{tr}(\rho)=1}.\tag{14}$$

$$S=p(01|00)+p(10|10)+p(10|11).$$

$$p(ab|xy)=\sum_{\lambda}p(\lambda)\,\omega^{\lambda}{\mathcal{I}}^{\lambda}_{a|x}f^{\lambda}_{b|y},$$

$$\begin{aligned}S&=p(0|0)+p(1|1)+p(10|10)-p(00|00)-p(11|11)\\&=p(0|0)\,[1-p(0|0;00)]+p(1|1)\,[1+p(0|1;10)-p(1|1;11)]\\&=\omega(f_{0|0})\,[1-\sigma_{0|0}(f_{0|0})]+\omega(f_{1|1})\,[1+\sigma_{1|1}(f_{0|0}-f_{1|1})],\end{aligned}$$

$$S=a_{0}\,[1-s_{0}a_{0}-(1-s_{0})b_{0}]+a_{1}\,[1+s_{1}(a_{0}-a_{1})+(1-s_{1})(b_{0}-b_{1})].$$

$$\Omega_{\mathrm{cbit}}=\frac{9}{4}.$$

$$T(0|0)=\begin{pmatrix}\frac{1}{2}&0\\1&0\end{pmatrix},\quad T(1|0)=\begin{pmatrix}\frac{1}{2}&0\\0&0\end{pmatrix},\quad T(1|1)=\begin{pmatrix}0&1\\0&0\end{pmatrix}=T(0|1)^{\dagger}.$$

$$S=\langle 0|E_{0|0}|0\rangle\left[1-\operatorname{tr}[\sigma_{0}E_{0|0}]\right]+\langle 0|E_{1|1}|0\rangle\left[1+\operatorname{tr}[\sigma_{1}(E_{0|0}-E_{1|1})]\right],$$

$$\Omega_{\mathrm{qubit}}=\max_{|\psi_{0}\rangle,|\psi_{1}\rangle,E_{0|0},E_{1|1}}\left[\langle 0|E_{0|0}|0\rangle\left(1-\langle\psi_{0}|E_{0|0}|\psi_{0}\rangle\right)+\langle 0|E_{1|1}|0\rangle\left(1+\langle\psi_{1}|E_{0|0}-E_{1|1}|\psi_{1}\rangle\right)\right],$$

$$\Omega_{\mathrm{qubit}}^{\mathrm{feas}}\leq\Omega_{\mathrm{qubit}}\leq\Omega_{\mathrm{qubit}}^{\mathrm{Lass}}.$$

$$\Omega_{\mathrm{qubit}}^{\mathrm{feas}}\approx\Omega_{\mathrm{qubit}}^{\mathrm{Lass}}\approx 2.35570,$$

$$|\psi_{0}\rangle\approx 0.408\,|0\rangle-0.913\,|1\rangle,\quad|\psi_{1}\rangle\approx 0.640\,|0\rangle+0.768\,|1\rangle,$$

$$E_{0|0}=\openone-|\psi_{0}\rangle\langle\psi_{0}|,\quad\text{and}\quad E_{1|1}=|\phi\rangle\langle\phi|,\quad\text{where }|\phi\rangle\approx 0.971\,|0\rangle-0.238\,|1\rangle.$$

$$S=(t_{0}+{\boldsymbol{w}}^{\dagger}{\boldsymbol{f}}_{0})\,[1-t_{0}-{\boldsymbol{w}}_{0}^{\dagger}{\boldsymbol{f}}_{0}]+(t_{1}+{\boldsymbol{w}}^{\dagger}{\boldsymbol{f}}_{1})\,[1+t_{0}-t_{1}+{\boldsymbol{w}}_{1}^{\dagger}({\boldsymbol{f}}_{0}-{\boldsymbol{f}}_{1})].$$

$$\Omega_{\mathrm{dnc}}=\max_{{\boldsymbol{w}},t_{0},{\boldsymbol{f}}_{0},t_{1},{\boldsymbol{f}}_{1}}\left\{(t_{0}+{\boldsymbol{w}}^{\dagger}{\boldsymbol{f}}_{0})\,[1-t_{0}+{\lvert{{\boldsymbol{f}}_{0}}\rvert}]+(t_{1}+{\boldsymbol{w}}^{\dagger}{\boldsymbol{f}}_{1})\,[1+t_{0}-t_{1}+{\lvert{{\boldsymbol{f}}_{0}-{\boldsymbol{f}}_{1}}\rvert}]\right\},$$

$$\Omega_{\mathrm{hbit}}\approx 2.35570,$$

$${\boldsymbol{w}}^{\dagger}=(1,-1),\quad{\boldsymbol{w}}_{0}^{\dagger}=(-1,1),\quad{\boldsymbol{w}}_{1}^{\dagger}=(1,1)$$

$${\boldsymbol{f}}_{0}={\boldsymbol{e}}_{1},\quad{\boldsymbol{f}}_{1}=-{\boldsymbol{e}}_{2}$$

Full text

Costantino Budroni

Institute for Quantum Optics and Quantum Information (IQOQI), Austrian Academy of Sciences, Boltzmanngasse 3, 1090 Vienna, Austria

Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090 Vienna, Austria

Gabriel Fagundes

Departamento de Física, Universidade Federal de Minas Gerais UFMG, P.O. Box 702, 30123–970, Belo Horizonte, MG, Brazil

Matthias Kleinmann

Naturwissenschaftlich–Technische Fakultät, Universität Siegen, Walter-Flex-Straße 3, 57068 Siegen, Germany

I Introduction

Given a single quantum system, in what sense can we say that it has some nonclassical properties? The most celebrated phenomena where quantum systems depart from their classical counterparts involve notions such as entanglement Horodecki et al. (2009); Gühne and Tóth (2009) and nonlocality Bell (1964); Brunner et al. (2014), which can be defined only in terms of multipartite systems. What if we are able to perform experiments only on a single, indivisible system? Can we still say that the observed statistics have some “nonclassical properties”? Some notions of nonclassicality have been proposed for single systems, such as contextuality Kochen and Specker (1967) and nonmacrorealism Leggett and Garg (1985); Emary et al. (2014). One may argue that such notions are limited to specific measurement procedures and hence are not fully satisfactory. Contextuality restricts the set of possible operations to compatible measurements, which in many cases need to be (approximately) projective or at least satisfy some analogous notion of repeatability and nondisturbance Gühne et al. (2010); Kujala et al. (2015), in order to avoid the so-called “compatibility loophole” Gühne et al. (2010) or other similar classical explanations. Macrorealism places similarly strong restrictions on the set of allowed measurements, namely, they must be noninvasive to avoid the clumsiness loophole or other forms of classical interpretation of the results Wilde and Mizel (2012).

A strong motivation for developing such a notion of nonclassicality for single systems also arises from quantum information theory. Notions such as entanglement and nonlocality have been proven to play a role in quantum information tasks related to communication, such as, e.g., device-independent quantum key distribution Acín et al. (2007). That such notions should also play a role for tasks involving only single systems, such as, e.g., quantum computation, is less evident. Several recent results connected quantum contextuality with models of quantum computation such as, e.g., quantum computation via magic state injection or measurement-based quantum computation Howard et al. (2014); Delfosse et al. (2015); Raussendorf (2013); Bermejo-Vega et al. (2017); Raussendorf et al. (2017); Abramsky et al. (2017); Oestereich and Galvão (2017). However, a natural question arises of whether this connection is fundamental or just related to the particular model used for quantum computation Markiewicz et al. (2014). If one moves from compatible projective measurements to general instruments, it is no longer clear whether the notion of quantum contextuality makes sense at all, due to the compatibility loophole mentioned above Gühne et al. (2010).

In this paper, we go beyond such notions and introduce a notion of nonclassicality for the measurement statistics of a single system which is not restricted to specific measurement operations. The main tool of this investigation is the notion of memory cost of simulating temporal correlations. By temporal correlations we mean the observed statistics arising from sequences of measurements on a single system, and memory roughly refers to the amount of classical information that can be stored in the physical system.

The notion of memory cost has been explored in connection with classical simulations of quantum contextuality Kleinmann et al. (2011); Fagundes and Kleinmann (2017), quantum simulation of classical stochastic processes Garner et al. (2017), memory asymmetry between prediction and retrodiction Thompson et al. (2018), and in relation to the accuracy of classical and quantum clocks Woods et al. (2018). A related notion, i.e., that of communication cost, has been explored in relation to both Bell nonlocality Pironio (2003); Montina and Wolf (2016) and temporal correlations Brierley et al. (2015); Żukowski (2014). Similar notions have also been explored in the prepare-and-measure scenario Gallego et al. (2010); Brunner et al. (2013); Dall’Arno et al. (2017a, b); Rosset et al. (2018) and in connection with quantum information tasks such as random access codes Bowles et al. (2015); Tavakoli et al. (2016); Aguilar et al. (2018); Miklin et al. (2019).

In our approach, we go beyond the prepare-and-measure scenario by exploring arbitrarily long sequences of measurements and we remove any restriction on the type of measurement by considering arbitrary quantum instruments. Our analysis is not restricted to the differences between classical and quantum theory, but extends to general probabilistic theories (GPTs) Ludwig (1985); Mittelstaedt (1998); Chiribella et al. (2010); Acín et al. (2010), which also include the former theories. In particular, we derive inequalities on the observed probabilities that are able to discriminate between classical, quantum, and genuine GPT correlations. Moreover, as a further development of the ideas presented in Refs. Kleinmann et al. (2011); Fagundes and Kleinmann (2017), we show that in the framework of finite-state machines it is impossible to simulate contextual correlations on a qubit system, for a fixed initial state and arbitrary instruments.

The paper is organized as follows. In Sec. II, we will introduce the basic notions and tools necessary for our analysis, namely, temporal correlations and the arrow of time polytope. In Sec. III, we will introduce finite-state machines in GPTs, in particular, also in classical and quantum theory. In Sec. IV, we will discuss the existence of nontrivial temporal bounds for such theories and the impossibility of simulating contextual correlations on a qubit. Finally, we present the conclusions and an outlook of the paper.

II Temporal correlations

We consider a box that accepts certain inputs from an input alphabet ${\mathcal{X}}$ and produces outputs from an output alphabet ${\mathcal{A}}$ . The box is operated in a sequential fashion, see Fig. 1(a), such that, for instance, it first receives an input labeled by $x\in{\mathcal{X}}$ yielding an output labeled by $a\in{\mathcal{A}}$ , subsequently it receives $y$ yielding $b$ , and finally it receives $z$ yielding $c$ . Prior to this sequence the box is initialized, such that its behavior is independent of anything except the input sequence $xyz$ . Consequently, for a fixed input sequence $xyz\in{\mathcal{X}}^{3}$ , the admissible output sequences $abc\in{\mathcal{A}}^{3}$ are governed by a probability distribution. If we now consider all possible inputs, we obtain the correlations $p(abc|xyz)$ . Due to the time ordering of the inputs and outputs, these correlations must satisfy the arrow of time constraints Clemente and Kofler (2016),

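$$\sum_{c}p(abc|xyz)=\sum_{c}p(abc|xyz^{\prime}),\quad\text{for all }a,b\in{\mathcal{A}}\text{ and all }x,y,z,z^{\prime}\in{\mathcal{X}},\tag{1}$$

$$\sum_{b,c}p(abc|xyz)=\sum_{b,c}p(abc|xy^{\prime}z^{\prime}),\quad\text{for all }a\in{\mathcal{A}}\text{ and all }x,y,y^{\prime},z,z^{\prime}\in{\mathcal{X}}.\tag{2}$$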

These constraints encode the fact that a future choice of an input, e.g., $z$ or $z^{\prime}$ in Eq. (1), must not influence previous outputs of the box, e.g., $a$ or $b$ . This is in analogy to the nonsignaling conditions in the usual Bell scenario Popescu and Rohrlich (1994). The arrow of time constraints come solely from causality and hence, they must be satisfied not only in classical and quantum theory, but in any GPT.
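To make the arrow of time constraints concrete, the following minimal Python sketch checks them for length-three correlations stored as a dictionary. The function names, the dictionary layout, and the two example boxes are assumptions introduced for this illustration; they are not part of the original text.

```python
from itertools import product

def satisfies_arrow_of_time(p, outputs, inputs, tol=1e-12):
    """Check the arrow of time constraints for p[(a, b, c), (x, y, z)]."""
    # The marginal over c must not depend on the last input z.
    for a, b in product(outputs, repeat=2):
        for x, y in product(inputs, repeat=2):
            marginals = [sum(p[(a, b, c), (x, y, z)] for c in outputs)
                         for z in inputs]
            if max(marginals) - min(marginals) > tol:
                return False
    # The marginal over b and c must not depend on the later inputs y, z.
    for a in outputs:
        for x in inputs:
            marginals = [sum(p[(a, b, c), (x, y, z)]
                             for b, c in product(outputs, repeat=2))
                         for y, z in product(inputs, repeat=2)]
            if max(marginals) - min(marginals) > tol:
                return False
    return True

def deterministic_box(rule, outputs, inputs):
    """Build 0/1-valued correlations from a rule (x, y, z) -> (a, b, c)."""
    return {(abc, xyz): 1.0 if abc == rule(*xyz) else 0.0
            for abc in product(outputs, repeat=3)
            for xyz in product(inputs, repeat=3)}

outputs = inputs = (0, 1)
# Hypothetical box that echoes the previous input: respects time ordering.
causal = deterministic_box(lambda x, y, z: (0, x, y), outputs, inputs)
# Hypothetical box whose first output copies the second input: violates it.
acausal = deterministic_box(lambda x, y, z: (y, 0, 0), outputs, inputs)
```

In this sketch the first box passes the check, whereas the second fails because its first output depends on an input that is chosen later.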

We can represent the correlations $p(abc|xyz)$ as a vector with coordinates labeled by the possible sequences $abc$ and $xyz$. Due to the linearity of the arrow of time constraints, the set of correlations satisfying them forms a polytope. Its extremal points have been recently characterized Abbott et al. (2016); Hoffmann (2016); Hoffmann et al. (2018). It is instructive to briefly sketch the central steps for the simple case of sequences of length three. All correlations in the corresponding polytope can be decomposed as

$$p(abc|xyz)=p(a|x)\,p(b|a;xy)\,p(c|ab;xyz),\tag{3}$$

since the marginals on the right hand side are well defined (for the pathological cases where $p(ab|xy)=0$ we define the right hand side to be zero). Vice versa, taking valid probability distributions $p(a|x)$, $p(b|a;xy)$, $p(c|ab;xyz)$ over $a,b,c$, respectively, one always obtains an element of the polytope. Its extremal points are obtained by deterministic strategies, i.e., where each of the probability distributions on the right hand side of Eq. (3) consists only of probabilities $0$ or $1$. It easily follows that classical and quantum models can reach extremal points if enough memory is available. In more precise terms, each deterministic strategy can be reached if the box internally keeps a record of all previous inputs and outputs. Storing this record then requires the box to have memory. Of course, the notion of memory needs clarification, in particular if the box is described using quantum theory or a GPT; for details see Sec. III. Clearly, storing the full record of previous inputs and outputs is not necessarily memory optimal and gives rise to the questions: What is the minimal number of states necessary to obtain certain correlations? How does such a number depend on the specific theory we use to describe the internals of the box?
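The role of deterministic strategies can be illustrated with a short Python sketch. In a deterministic strategy the output at step $k$ is a function of the first $k$ inputs only, so the number of distinct strategies is $\prod_{k}|{\mathcal{A}}|^{|{\mathcal{X}}|^{k}}$. The function names below are assumptions made for this illustration, and the brute-force enumeration is only meant as a consistency check for small cases.

```python
from itertools import product

def count_deterministic_strategies(n_inputs, n_outputs, length):
    """Closed-form count: one output choice per input prefix of each length."""
    count = 1
    for k in range(1, length + 1):
        count *= n_outputs ** (n_inputs ** k)
    return count

def enumerate_deterministic_strategies(inputs, outputs, length):
    """Brute-force set of distinct deterministic input -> output tables."""
    prefixes = [pre for k in range(1, length + 1)
                for pre in product(inputs, repeat=k)]
    tables = set()
    for choice in product(outputs, repeat=len(prefixes)):
        rule = dict(zip(prefixes, choice))  # input prefix -> output at that step
        # The output at step k uses only xs[:k], so time ordering is built in.
        table = tuple(
            tuple(rule[xs[:k]] for k in range(1, length + 1))
            for xs in product(inputs, repeat=length)
        )
        tables.add(table)
    return tables

n_length_two = len(enumerate_deterministic_strategies((0, 1), (0, 1), 2))
n_length_three = count_deterministic_strategies(2, 2, 3)
```

For binary inputs and outputs this gives $2^{2+4}=64$ strategies for sequences of length two and $2^{2+4+8}=16384$ for length three.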

An important element, in order to be able to speak about the memory cost of temporal correlations, is the requirement that all time-dependent information used to produce the outputs must be stored within the physical system used to implement the box. This implies that the physical operations performed to produce an output must be time-independent, e.g., the experimenter is not allowed to look at the wall clock and decide to implement the operation associated with a certain input $x$ in a different way, as this would result in an additional source of memory, i.e., the clock keeping track of time. It is interesting to notice that the case where such time-dependent operations are admissible is equivalent to the case of quantum communication scenarios such as quantum random access codes or the scenario described by Brierley et al. (2015). In fact, the latter scenario can be modeled as a network with ordered nodes, where a single physical system is transmitted through the nodes, and at each time step one of the nodes receives the system, performs a local operation, and transmits the system to the subsequent node. Since for each node it is known in advance in which part of the sequence it is situated, its local operations can be adapted to maximize a certain figure of merit defined in terms of probabilities of outcomes. This scenario covers the notion of “communication cost” and it must be distinguished from the notion of “memory cost” that is considered here. Moreover, even though in the memory cost scenario we are not allowed to change the operations throughout the sequence, it still makes sense to use classical randomness at the beginning of a sequence: at each experimental run, the experimenter can flip a coin and decide to perform the whole sequence with one box or another. The resulting correlations will be a convex combination of the correlations obtained from either box. A graphical representation of the above ideas is presented in Fig. 1. These intuitive notions are made more rigorous in the next section.

III Finite-state machines

In this section, we formally define the classical, quantum, and GPT models for the box used in the previous section. In this model we assume that the box is implemented as a machine which acts on an internal state. Upon receiving an input $x$ , the box operates on the internal state and produces the output $a$ . The internal state is the specific model of the memory from the previous section. More precisely, we use the finite number of perfectly distinguishable states as a measure for the memory and for this reason we call this model a finite-state machine.

In a first step we need to describe the internal state $\omega$ and the operations ${\mathcal{I}}_{a|x}$ of the machine. We choose ordered vector spaces to describe the machine, which is an appropriate framework for a wide range of GPTs. In Appendix A we give a brief summary of this mathematical formalism. In brief, a GPT is then described by a real vector space $V$ with partial order “ $\leq$ ” and an order unit $e\in V$ . In quantum theory $V$ would be the set of Hermitian operators, $A\leq B$ would correspond to $B-A$ being positive semidefinite, and $e$ to the identity operator. Measurement outcomes are represented by effects $f\in V$ with $0\leq f\leq e$ and a measurement ${\mathsf{M}}_{x}$ is represented by a collection of effects ${\mathsf{M}}_{x}=(f_{a|x})_{a}$ with $\sum_{a}f_{a|x}=e$ . The set of states ${\mathcal{S}}$ is a subset of the dual space of $V$ such that the probability of outcome $a$ in the measurement ${\mathsf{M}}_{x}$ is given by $p(a|x)=\omega f_{a|x}$ . Therefore $\omega e=1$ and $\omega f\geq 0$ for all $f\in V$ with $f\geq 0$ . The operations ${\mathcal{I}}_{a|x}$ represent a specific way to implement a measurement, taking into account the change of the internal state $\omega$ . More precisely, the linear map ${\mathcal{I}}_{a|x}\colon V\rightarrow V$ is such that $f_{a|x}={\mathcal{I}}_{a|x}e$ is the effect describing the output $a$ . In addition the positivity condition ${\mathcal{I}}_{a|x}f\geq 0$ for any $f\in V$ with $f\geq 0$ needs to be satisfied and further restrictions to ${\mathcal{I}}_{a|x}$ may apply depending on the specific GPT. If we group together the transformations ${\mathcal{I}}_{x}=({\mathcal{I}}_{a|x})_{a\in{\mathcal{A}}}$ for a fixed input $x$ , then ${\mathcal{I}}_{x}$ is called an instrument. If we ignore the outcome $a$ , then the instrument maps states to states, in the sense that $\omega\sum_{a}{\mathcal{I}}_{a|x}\in{\mathcal{S}}$ for any state $\omega$ .

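Given the initial internal state $\omega$ of the finite-state machine and the instrument ${\mathcal{I}}_{x}=({\mathcal{I}}_{a|x})_{a}$, the probabilities associated with a sequence of measurements are given by

$$p(a|x)=\omega{\mathcal{I}}_{a|x}e,\quad p(ab|xy)=\omega{\mathcal{I}}_{a|x}{\mathcal{I}}_{b|y}e,\quad\text{etc.}\tag{4}$$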

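Note that we write the transformations in the Heisenberg picture, so that the time ordering proceeds from left to right. For a general sequence of inputs $x_{1}x_{2}\cdots x_{n}=\vec{x}$ and outputs $a_{1}a_{2}\ldots a_{n}=\vec{a}$ we write

$$p(\vec{a}|\vec{x})\equiv p(a_{1}\ldots a_{n}|x_{1}\ldots x_{n})=\omega{\mathcal{I}}_{a_{1}|x_{1}}\cdots{\mathcal{I}}_{a_{n}|x_{n}}e\equiv\omega{\mathcal{I}}_{\vec{a}|\vec{x}}e,\tag{5}$$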

We exemplify in the next sections how this expression is specialized to the classical and quantum case.

As we discussed previously, we exclude any external source of memory, such as a clock keeping track of time. This is formalized by the fact that all instruments solely depend on the input and in particular by the fact that all transformations are time-independent. In general, for a fixed GPT this requirement makes the set of achievable correlations nonconvex. Nevertheless, we can recover convexity by allowing the use of convex mixtures as follows. Before starting the experiment we use a random variable $\lambda$ , distributed according to some probability distribution $q(\lambda)$ , to decide which finite-state machine to use subsequently. Since the machine is characterized by the initial state $\omega_{\lambda}$ and the instruments ${\mathcal{I}}^{\lambda}_{x}$ , this yields the correlations

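$$p(\vec{a}|\vec{x})=\sum_{\lambda}q(\lambda)\,\omega_{\lambda}{\mathcal{I}}^{\lambda}_{\vec{a}|\vec{x}}e.\tag{6}$$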

The above procedure allows us to generate all correlations from the convex hull of correlations obtainable from a family of finite-state machines parametrized by $\lambda$ .

Finally, we define the memory of the system using the GPT notion of capacity (cf. Ref. Masanes and Müller (2011)), i.e., the size of the maximal set of perfectly distinguishable states. More precisely, we say that a GPT defines a $d$ -state machine if $d$ is the maximal integer such that there exists a collection of $d$ states $(\omega_{k})_{k}$ and $d$ effects $(f_{k})_{k}$ such that

$$\sum_{k}f_{k}\leq e\quad\text{and}\quad\omega_{i}f_{j}=\delta_{ij}\text{ for all }i,j.\tag{7}$$

Namely, all effects are part of the same measurement, which is able to perfectly (i.e., with probability one) discriminate among the states. This notion of capacity corresponds to the dimension of the Hilbert space in quantum mechanics and to the number of extremal points of the state simplex in classical probability theory (see, e.g., Ref. Masanes and Müller (2011)).

It is instructive to discuss in more detail the classical and quantum case, which may be more familiar to the reader. We subsequently introduce a particular class of capacity-2 GPTs, the dichotomic norm cones Kleinmann (2014).

III.1 Classical finite-state machines

A classical finite-state machine Paz (2003) is described by its internal rules for state transitions and output probabilities. Given the classical state space ${\mathcal{C}}=\set{1,2,\dotsc,d}$, the observed probability distribution $p(\vec{a}|\vec{x})$ for an input sequence $\vec{x}$ of length $n$ can be written as

$$p(\vec{a}|\vec{x})=\sum_{s_{0},\ldots,s_{n}\in{\mathcal{C}}}r(s_{0})\,q(a_{1},s_{1}|s_{0},x_{1})\cdots q(a_{n},s_{n}|s_{n-1},x_{n}).\tag{8}$$

Here, $r(s_{0})$ describes the probability of preparing the initial state $s_{0}$ of the machine and $q(a,s^{\prime}|s,x)$ describes the probability that the machine yields the output $a$ and transitions to the state $s^{\prime}$, given that the internal state is $s$ and the input is $x$. (Footnote 1: Without loss of generality, we could assume a fixed pure initial state $s_{0}$, since we allow for convex mixtures of different machines. Nevertheless, we keep the notation with an initial distribution $r(s_{0})$ over all pure states ${\mathcal{C}}$, i.e., a mixed state, to keep the analogy with the standard notation for GPT states ($\omega$) and quantum states ($\rho$).) As in Eq. (6), those machines can depend on a random variable $\lambda$ generated at the beginning of each sequence, i.e.,

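$$p(\vec{a}|\vec{x})=\sum_{s_{0},\ldots,s_{n}\in{\mathcal{C}},\,\lambda}p(\lambda)\,r^{\lambda}(s_{0})\,q^{\lambda}(a_{1},s_{1}|s_{0},x_{1})\cdots q^{\lambda}(a_{n},s_{n}|s_{n-1},x_{n}).\tag{9}$$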

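For clarity, we use only Eq. (8) in the following. The correlations $p(\vec{a}|\vec{x})$ can be rewritten as

$$p(\vec{a}|\vec{x})={\boldsymbol{\pi}}^{\dagger}T(a_{1}|x_{1})\cdots T(a_{n}|x_{n}){\boldsymbol{\eta}}\equiv{\boldsymbol{\pi}}^{\dagger}T(\vec{a}|\vec{x}){\boldsymbol{\eta}},\tag{10}$$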

where ${\boldsymbol{\eta}}=(1,1,\ldots,1)^{\dagger}$ is the $d$ -dimensional vector of ones, ${\boldsymbol{\pi}}$ is the vector representing the initial state, and $T(a|x)$ is the $d\times d$ transition matrix. Hence, $\pi_{s}=r(s)$ and $[T(a|x)]_{s,s^{\prime}}=q(a,s^{\prime}|s,x)$ . The rules for probabilities that constrain $q(a,s^{\prime}|s,x)$ translate to $[T(a|x)]_{s,s^{\prime}}\geq 0$ for all $s,s^{\prime},a,x$ , and $\sum_{a}[T(a|x){\boldsymbol{\eta}}]_{s}=1$ for all $s,x$ .

Translating the above into the language of GPTs, we let $V={\mathds{R}}^{d}$ and set the order unit $e$ to ${\boldsymbol{\eta}}$. The partial order is such that ${\boldsymbol{v}}\leq{\boldsymbol{w}}$ if $v_{s}\leq w_{s}$ for all $s$. Then the set of states is given by the canonical $(d-1)$-dimensional simplex,

$${\mathcal{S}}=\set{{\boldsymbol{\varpi}}\in{\mathds{R}}^{d}}{{\boldsymbol{\varpi}}\geq 0\text{ and }{\boldsymbol{\varpi}}^{\dagger}{\boldsymbol{\eta}}=1}.\tag{11}$$

In particular ${\boldsymbol{\pi}}$ is a state. Analogously, the transition matrix $T(a|x)$ corresponds to the instruments ${\mathcal{I}}_{a|x}$, whereas the effects can be obtained as $f_{a|x}:=T(a|x){\boldsymbol{\eta}}$. It can be easily seen that $d$ corresponds exactly to the capacity defined according to Eq. (7).
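The matrix form ${\boldsymbol{\pi}}^{\dagger}T(a_{1}|x_{1})\cdots T(a_{n}|x_{n}){\boldsymbol{\eta}}$ can be evaluated with a few lines of Python. The sketch below uses exact rational arithmetic and one particular two-state machine with binary inputs and outputs. The machine, the function names, and the dictionary layout are assumptions chosen for illustration; whether this machine coincides entry by entry with any machine used by the authors is not claimed here.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# T[(a, x)][s][s_next] = q(a, s_next | s, x) for an illustrative
# two-state machine (an assumption of this sketch).
T = {
    (0, 0): [[HALF, 0], [1, 0]],
    (1, 0): [[HALF, 0], [0, 0]],
    (0, 1): [[0, 0], [1, 0]],
    (1, 1): [[0, 1], [0, 0]],
}
pi = [Fraction(1), Fraction(0)]  # the machine starts in the first state

def mat_vec(matrix, vector):
    """Multiply a square matrix by a column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def sequence_probability(pi, T, outputs, inputs):
    """Evaluate pi^T T(a_1|x_1) ... T(a_n|x_n) eta."""
    vec = [Fraction(1)] * len(pi)  # eta, the vector of ones
    # Apply the transition matrices to eta from right to left.
    for a, x in reversed(list(zip(outputs, inputs))):
        vec = mat_vec(T[(a, x)], vec)
    return sum(p * v for p, v in zip(pi, vec))

def is_normalised(T, inputs, outputs, d):
    """Check sum_a [T(a|x) eta]_s = 1 for all states s and inputs x."""
    eta = [1] * d
    return all(
        sum(mat_vec(T[(a, x)], eta)[s] for a in outputs) == 1
        for x in inputs for s in range(d)
    )

p_01_00 = sequence_probability(pi, T, (0, 1), (0, 0))
p_10_10 = sequence_probability(pi, T, (1, 0), (1, 0))
p_10_11 = sequence_probability(pi, T, (1, 0), (1, 1))
```

For this machine the three probabilities are $1/4$, $1$, and $1$, so the combination $p(01|00)+p(10|10)+p(10|11)$ evaluates to $9/4$.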

III.1.1 Classical finite-state machines and Leggett-Garg’s macrorealist models

It is interesting at this point to briefly compare the model in Eq. (9) with the macrorealist model of Leggett and Garg Leggett and Garg (1985). A macrorealist model can be simply obtained by reducing the set of possible internal states to a single one, i.e., $d=1$, and re-introducing the time-dependence of the operations, which gives

$$p(\vec{a}|\vec{x})=\sum_{\lambda}p(\lambda)\,q^{\lambda}_{t_{1}}(a_{1}|x_{1})\cdots q^{\lambda}_{t_{n}}(a_{n}|x_{n}),\tag{12}$$

where the dependency on $s_{0},\ldots,s_{n}$ becomes trivial and is then removed. We recall that macrorealist models are based on two assumptions: macrorealism per se, i.e., the existence of a classical probability, and noninvasive measurability, i.e., the assumption that the measurement has no effect on the subsequent evolution of the system. The finite-state machine model can be seen as arising from the macrorealist model via a relaxation of the assumption of a noninvasive measurement: the measurement can be invasive up to a certain amount quantified by the internal memory of the system, e.g., for a two-state machine the measurement can encode at most one bit of information in the system. Notice, however, that the Leggett-Garg assumptions usually allow the operations to be time-dependent.

It is interesting to remark that similar ideas have been already employed in Leggett-Garg tests to tighten the clumsiness loophole. Under the assumption of a classical model with two internal states, Knee et al. Knee et al. (2016) were able to quantify the measurement invasivity via a control experiment, and consequently modify the classical bound for the Leggett-Garg inequality. In agreement with our argument above, the work of Knee et al. shows how the notion of finite memory can be used as a relaxation of the assumption of a noninvasive measurement.

III.2 Quantum finite-state machines

The quantum case is perhaps the most familiar to readers from quantum information. The probability distribution is obtained by sequences of generalized measurements ${\mathsf{M}}_{x}=(E_{a|x})_{a}$ on a single system described by a Hilbert space of fixed dimension $d$ . The outcomes of the measurement are described by positive semidefinite operators $E_{a|x}\geq 0$ with $\sum_{a}E_{a|x}=\openone$ .

In order to discuss sequential measurements, however, we need to know the post-measurement state, or, better, the transformation induced by the measurements. This information is provided by a quantum instrument ${\mathcal{I}}_{x}$ , defined as a collection of completely positive maps ${\mathcal{I}}_{x}=({\mathcal{I}}_{a|x})_{a}$ , from the space of linear operators into itself, that sum up to a unital map, i.e., $\sum_{a}{\mathcal{I}}_{a|x}(\openone)=\openone$ , corresponding to the rule of preservation of probability in the Heisenberg picture, see, e.g., Heinosaari and Ziman (2011). Each instrument defines a generalized measurement through the formula $E_{a|x}={\mathcal{I}}_{a|x}(\openone)$ . Similarly to the previous cases, we can shorten the notation by defining ${\mathcal{I}}_{\vec{a}|\vec{x}}:={\mathcal{I}}_{a_{1}|x_{1}}\circ\ldots\circ{\mathcal{I}}_{a_{n}|x_{n}},$ where $\circ$ denotes the composition of maps and write

$$p(\vec{a}|\vec{x})=\operatorname{tr}[\rho\,{\mathcal{I}}_{\vec{a}|\vec{x}}(\openone)].\tag{13}$$

As mentioned before, quantum theory is a particular case of a GPT, where the vector space $V$ is the set of Hermitian operators, the partial order is defined through positive semidefiniteness and the order unit $e$ is given by $\openone$. The set of states is given by the density operators, identified by the Hilbert–Schmidt inner product with the elements of the dual space of $V$,

$${\mathcal{S}}=\set{X\mapsto\operatorname{tr}(\rho X)}{\rho\geq 0\text{ and }\operatorname{tr}(\rho)=1}.\tag{14}$$

Hence Eq. (13) and Eq. (5) are equivalent. It is then clear that the capacity of the system, defined as the number of perfectly distinguishable states Fritz (2010); Hoffmann et al. (2018), precisely corresponds to the dimension of the Hilbert space. It is important to remark that we need to consider the general formalism of quantum instruments, since if the measurement devices merely acted projectively, there would be nontrivial limitations on the achievable correlations that are valid for arbitrary dimensions Budroni et al. (2013); Budroni and Emary (2014).
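As a small illustration of the quantum expression $\operatorname{tr}[\rho\,{\mathcal{I}}_{\vec{a}|\vec{x}}(\openone)]$, the following Python sketch evaluates sequence probabilities for a qubit in the Heisenberg picture. It assumes that each map has a single Kraus operator, ${\mathcal{I}}_{a|x}(X)=K_{a|x}^{\dagger}XK_{a|x}$, which is a restriction made only for brevity (general instruments may need several Kraus operators per outcome). The two projective measurements, the initial state, and all names are likewise assumptions of this example.

```python
def dagger(m):
    """Conjugate transpose of a square matrix given as nested lists."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]

# One Kraus operator K[(a, x)] per outcome (assumption of this sketch).
# Input 0: projective measurement in the computational basis.
# Input 1: projective measurement in the |+>, |-> basis.
K = {
    (0, 0): [[1 + 0j, 0j], [0j, 0j]],
    (1, 0): [[0j, 0j], [0j, 1 + 0j]],
    (0, 1): [[0.5 + 0j, 0.5 + 0j], [0.5 + 0j, 0.5 + 0j]],
    (1, 1): [[0.5 + 0j, -0.5 + 0j], [-0.5 + 0j, 0.5 + 0j]],
}
rho = [[1 + 0j, 0j], [0j, 0j]]  # initial state |0><0|

def sequence_probability(rho, K, outputs, inputs):
    """Evaluate tr[rho I_{a_1|x_1} o ... o I_{a_n|x_n}(identity)]."""
    effect = IDENTITY
    # The last map acts on the identity first, then the earlier ones.
    for a, x in reversed(list(zip(outputs, inputs))):
        k = K[(a, x)]
        effect = matmul(dagger(k), matmul(effect, k))
    return trace(matmul(rho, effect)).real

p_repeat = sequence_probability(rho, K, (0, 0), (0, 0))
p_mixed = sequence_probability(rho, K, (0, 0), (1, 0))
```

With these choices, repeating the computational-basis measurement on $|0\rangle$ yields the outcome sequence $00$ with probability $1$, while measuring first in the $|\pm\rangle$ basis and then in the computational basis yields each of the four outcome pairs with probability $1/4$.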

III.3 GPT two-state machines

We already provided a definition of GPT finite-state machines at the beginning of Sec. III. In this section, we specialize this definition by considering a class of GPTs where the effects belong to a dichotomic norm cone. These theories are a generalization of the classical bit (cbit) and quantum bit (qubit), in the sense that they have capacity two, i.e., they allow for a set of perfectly distinguishable states, in the sense of Eq. (7), of at most size two. We then specialize our discussion to the case of hyperbits (hbits) Pawłowski and Winter (2012) and generalized bits (gbits) Barrett (2007). The former are a generalization of the Bloch sphere to dimension higher than three, whereas the latter are the local part of a Popescu–Rohrlich box Popescu and Rohrlich (1994). We also provide a more detailed discussion of GPTs in Appendix A.

Consider the vector space $V:={\mathds{R}}\times{\mathds{R}}^{n}$, and the partial order where $(t,{\boldsymbol{x}})\geq 0$ if $t\geq{\lvert{{\boldsymbol{x}}}\rvert}$. Here, ${\lvert{{\boldsymbol{x}}}\rvert}$ is any norm in ${\mathds{R}}^{n}$. We define the order unit $e:=(1,{\boldsymbol{0}})$. This implies that effects are vectors $f=(t,{\boldsymbol{x}})$ such that ${\lvert{{\boldsymbol{x}}}\rvert}\leq\min\set{t,1-t}$. The states for a dichotomic norm cone are the maps $\omega\colon(t,{\boldsymbol{x}})\mapsto t+{\boldsymbol{w}}^{\dagger}{\boldsymbol{x}}$ with the condition ${\lvert{{\boldsymbol{w}}}\rvert}_{*}\leq 1$, where ${{\lvert{{\boldsymbol{w}}}\rvert}_{*}:=\sup\set{{\boldsymbol{w}}^{\dagger}{\boldsymbol{y}}}{{\lvert{{\boldsymbol{y}}}\rvert}\leq 1}}$ is the dual norm of ${\lvert{\,\cdot\,}\rvert}$. A peculiarity of this GPT is that it has exactly capacity two, independent of $n$ or the choice of the norm ${\lvert{\,\cdot\,}\rvert}$. We provide a proof of this fact in Appendix C.

Depending on the norm chosen and on $n$ we have different GPTs. If we take ${\lvert{{\boldsymbol{x}}}\rvert}$ to be the Euclidean (or $\ell_{2}$ ) norm, i.e., ${\lvert{{\boldsymbol{x}}}\rvert}^{2}=\sum_{i}x_{i}^{2}$ , we obtain hbits, and specifically cbits for $n=1$ , qubits for $n=3$ and more general hbits for $n>3$ . If we take $n=2$ and the Manhattan (or $\ell_{1}$ ) norm, i.e., ${\lvert{{\boldsymbol{x}}}\rvert}=\sum_{i}|x_{i}|$ , we obtain a gbit. For the case of the Euclidean norm, the dual norm is also the Euclidean norm itself, whereas the dual of the Manhattan norm is the supremum (or $\ell_{\infty}$ ) norm, i.e., ${\lvert{{\boldsymbol{w}}}\rvert}_{*}=\max_{i}|w_{i}|$ .
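The membership conditions for effects and states, and the duality between the $\ell_{1}$ and $\ell_{\infty}$ norms, can be checked with a short sketch. The helper names (`is_effect`, `is_state`, `dual_of_l1`) are hypothetical; the dual norm of $\ell_{1}$ is evaluated by brute force over the vertices $\pm{\boldsymbol{e}}_{i}$ of the $\ell_{1}$ unit ball, where a linear functional attains its supremum.

```python
import math

def l1(v):
    return sum(abs(c) for c in v)

def l2(v):
    return math.sqrt(sum(c * c for c in v))

def linf(v):
    return max(abs(c) for c in v)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def is_effect(t, x, norm, tol=1e-12):
    """f = (t, x) is an effect iff |x| <= min(t, 1 - t)."""
    return norm(x) <= min(t, 1 - t) + tol

def is_state(w, dual_norm, tol=1e-12):
    """omega: (t, x) -> t + w.x is a state iff |w|_* <= 1."""
    return dual_norm(w) <= 1 + tol

def dual_of_l1(w):
    """sup of w.y over the l1 unit ball, taken over its vertices +-e_i."""
    n = len(w)
    vertices = []
    for i in range(n):
        for sign in (1.0, -1.0):
            e = [0.0] * n
            e[i] = sign
            vertices.append(e)
    return max(dot(w, y) for y in vertices)

# The same vector can be an effect for one norm and not for the other.
print(is_effect(0.5, [0.3, 0.3], l2), is_effect(0.5, [0.3, 0.3], l1))
# The brute-force dual of l1 agrees with the supremum norm.
print(dual_of_l1([0.3, -0.7]), linf([0.3, -0.7]))
```

The first line of output illustrates that the effect space for $n=2$ depends on the norm: $(0.5,(0.3,0.3))$ satisfies the Euclidean condition but violates the Manhattan one.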

IV Bounds on temporal correlations

In this section, we consider the simplest nontrivial scenario, a sequence of two measurements, with inputs $x,y$ and outputs $a,b$ , with $a,b,x,y=0,1$ . We are interested in bounds on the sum of correlations

[TABLE]

Similar expressions have been considered in Refs. Hoffmann (2016); Hoffmann et al. (2018); Spee et al. (2018). Clearly, the trivial bound ${S}\leq 3$ holds. For hbits the value ${S}=3$ cannot be reached and therefore there must exist a nontrivial bound ${S}\leq\Omega_{\mathrm{hbit},n}$ for any dimension $n$ of the hbit, in particular for the cbit ($n=1$) and the qubit ($n=3$). A simple analytical proof of $\Omega_{\mathrm{hbit},n}<3$ is presented in Appendix B.

IV.1 Measure-and-prepare strategies

The analysis of the case of sequences of length two can be greatly simplified using measure-and-prepare instruments. These are instruments of the form ${\mathcal{T}}_{x}=(f_{a|x}\sigma_{a|x})_{a}$ , where ${\mathsf{M}}_{x}=(f_{a|x})_{a}$ is a measurement and $(\sigma_{a|x})_{a}$ is a collection of states. Hence ${\mathcal{T}}_{x}$ can be implemented by first measuring ${\mathsf{M}}_{x}$ and then, depending on the outcome $a$ , preparing the state $\sigma_{a|x}$ .

Now, for a sequence of length two, the correlations are given by

[TABLE]

where $\omega^{\lambda}$ is given by the initialization procedure of the individual finite-state machines participating in the mixture of machines. Clearly, the extremal values of ${S}$ can be achieved by a single finite-state machine and hence in the following we will omit the index $\lambda$ and the summation over $\lambda$.

The instruments ${\mathcal{I}}_{x}$ can be replaced by measure-and-prepare instruments by letting $f_{a|x}={\mathcal{I}}_{a|x}e$ and $\sigma_{a|x}=\omega{\mathcal{I}}_{a|x}/\omega(f_{a|x})$ if the denominator is nonzero, or $\sigma_{a|x}=\omega$ otherwise. Then $p(ab|xy)=\omega f_{a|x}\sigma_{a|x}f_{b|y}$. Hence we can equivalently replace ${\mathcal{I}}_{a|x}$ by the measure-and-prepare map $\mathcal{T}_{a|x}=f_{a|x}\sigma_{a|x}$. Using this simplification, we obtain

[TABLE]

where we used the notation $p(b|a;xy)$ for the probabilities conditioned on previous outputs.
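For orientation, the sum of correlations can be spelled out in terms of the measure-and-prepare data. The following is a sketch under two assumptions: that ${S}=p(01|00)+p(10|10)+p(10|11)$, which is the combination whose three terms are set to one in Appendix B, and that $f_{1|x}=e-f_{0|x}$ for the two-outcome measurements considered here.

$$
\begin{aligned}
S &= \omega(f_{0|0})\,\sigma_{0|0}(e-f_{0|0})
   + \omega(f_{1|1})\,\sigma_{1|1}(f_{0|0})
   + \omega(f_{1|1})\,\sigma_{1|1}(e-f_{1|1}) \\
  &= p(0|0)\,p(1|0;00) + p(1|1)\,\bigl[\,p(0|1;10)+p(0|1;11)\,\bigr].
\end{aligned}
$$

Under this reading only the effects $f_{0|0}$, $f_{1|1}$ and the states $\omega$, $\sigma_{0|0}$, $\sigma_{1|1}$ enter, which is consistent with the parametrizations used in the following subsections.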

IV.2 Analytical and numerical bounds

Since ${S}=3$ cannot be reached with hbits, there must be a finite gap between the algebraic maximum $3$ and the actual bound for cbits, qubits, and hbits with a Bloch sphere of fixed dimension. In fact, the sets of states and effects are compact, and the expression ${S}$ can be written as a continuous function from the set of states and effects into the interval $[0,3]$, so its image must be compact and the maximum is attained. In this section, we explore in more detail the bounds for cbits, qubits, and hbits via numerical methods.

IV.2.1 Classical bit

For the cbit case, we use the representation from Sec. III.1, specifically, $\omega$ is represented by $(1,0)$ , $\sigma_{i|i}$ by $(s_{i},1-s_{i})$ , and $f_{i|i}$ by $(a_{i},b_{i})^{\dagger}$ , where $s_{i},a_{i},b_{i}\in[0,1]$ . Then Eq. (17) reads

[TABLE]

Only $a_{0}$ and $a_{1}$ appear nonlinearly in this expression. Therefore, the maximum of ${S}$ is attained when all remaining parameters are either $0$ or $1$. This leaves us with a two-dimensional, at most quadratic optimization, which can be performed at once. For the maximal value $\Omega_{\mathrm{cbit}}$ of ${S}$ using classical bits we then obtain

[TABLE]

This maximum occurs at a unique point, where $s_{1}=b_{1}=0$ , $b_{0}=s_{0}=a_{1}=1$ , and $a_{0}=\frac{1}{2}$ . Hence, an optimal machine is given by the initial state ${\boldsymbol{\pi}}^{\dagger}=(1,0)$ and the transition matrices

[TABLE]

Note that, while the solution for the chosen parametrization is unique, the transition matrices are not unique.
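The optimization described in this subsection can be reproduced with a small brute-force sketch. It assumes ${S}=p(01|00)+p(10|10)+p(10|11)$ (the combination used in Appendix B) together with $p(ab|xy)=\omega(f_{a|x})\,\sigma_{a|x}(f_{b|y})$ and the parametrization $\omega=(1,0)$, $\sigma_{i|i}=(s_{i},1-s_{i})$, $f_{i|i}=(a_{i},b_{i})^{\dagger}$. The function names and the grid resolution are choices made for this example only.

```python
from itertools import product

def cbit_S(a0, b0, a1, b1, s0, s1):
    """S for a classical bit in the measure-and-prepare parametrization."""
    p_first0 = a0                                  # omega(f_{0|0})
    p_first1 = a1                                  # omega(f_{1|1})
    after0 = s0 * (1 - a0) + (1 - s0) * (1 - b0)   # sigma_{0|0}(e - f_{0|0})
    after1 = (s1 * (a0 + 1 - a1)                   # sigma_{1|1}(f_{0|0} + e - f_{1|1})
              + (1 - s1) * (b0 + 1 - b1))
    return p_first0 * after0 + p_first1 * after1

def maximize_cbit(steps=100):
    """Grid over a0, a1; the parameters entering linearly are set to 0 or 1."""
    grid = [k / steps for k in range(steps + 1)]
    best_value, best_point = -1.0, None
    for b0, b1, s0, s1 in product((0.0, 1.0), repeat=4):
        for a0 in grid:
            for a1 in grid:
                value = cbit_S(a0, b0, a1, b1, s0, s1)
                if value > best_value:
                    best_value = value
                    best_point = dict(a0=a0, b0=b0, a1=a1, b1=b1, s0=s0, s1=s1)
    return best_value, best_point

best_value, best_point = maximize_cbit()
print(best_value, best_point)
```

Under these assumptions the search returns the value $2.25$ at $a_{0}=\frac{1}{2}$, $a_{1}=b_{0}=s_{0}=1$, $b_{1}=s_{1}=0$, the same point quoted above as the unique maximizer.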

IV.2.2 Quantum bit

For the qubit case, we can proceed similarly to Ref. Hoffmann et al. (2018). First we note that in Eq. (17), the initial state $\omega$ can be replaced by a pure state, so that $\omega\colon X\mapsto\braket{0}{X}{0}$ . The expression ${S}$ can then be written as

[TABLE]

where $0\leq E_{i|i}\leq\openone$ are effects and $\sigma_{0}$ and $\sigma_{1}$ are density operators. Since the latter occur only linearly in ${S}$ , we can substitute them with pure states $\ket{\psi_{0}}$ and $\ket{\psi_{1}}$ , respectively. The maximum of ${S}$ for qubits is hence given by

[TABLE]

By parametrizing $E_{0|0}$, $E_{1|1}$, $\ket{\psi_{0}}$, $\ket{\psi_{1}}$ with real parameters, one can write the expression in Eq. (22) as a fourth-degree polynomial. This can be further simplified by taking $E_{0|0}$, $E_{1|1}$, $\ket{\psi_{0}}$, $\ket{\psi_{1}}$ to be real, which lowers the number of parameters to ten. The reduction to the real part of a qubit does not affect the optimality, as we show in the next section, see Eq. (28).

Footnote 2: Since the upper bound is calculated by polynomial optimization methods, it is more convenient to keep the expression and constraints in polynomial form, rather than minimizing the number of variables. For example, a parametrization of a pure state as ${\cos\theta\ket{0}+\sin\theta\ket{1}}$ removes one variable and one constraint, but it is no longer a polynomial in the parameters.

It is always possible to obtain a lower bound $\Omega^{\mathrm{feas}}_{\mathrm{qubit}}$ on $\Omega_{\mathrm{qubit}}$ by guessing appropriate values for the free parameters. An upper bound, $\Omega^{\mathrm{Lass}}_{\mathrm{qubit}}$ , can be obtained via Lasserre’s method Lasserre (2001) of polynomial optimization based on moment matrices and semidefinite programming (SDP) Vandenberghe and Boyd (1996), which provides analytical upper bounds up to the numerical precision. That is,

[TABLE]

With the simplifications used above, the upper and lower bounds coincide up to the numerical precision of $10^{-5}$ . We have,

[TABLE]

showing a gap between the cbit and qubit case. A feasible solution is given by the post-measurement states

[TABLE]

and the effects

[TABLE]

IV.2.3 Hyperbit

For the case of hbits, and also the more general dichotomic norm cones, we use the parametrization $\omega\colon(t,{\boldsymbol{x}})\mapsto t+{\boldsymbol{w}}^{\dagger}{\boldsymbol{x}}$ and $\sigma_{i|i}\colon(t,{\boldsymbol{x}})\mapsto t+{\boldsymbol{w}}_{i}^{\dagger}{\boldsymbol{x}}$ for the states and $f_{i|i}=(t_{i},{\boldsymbol{f}}_{i})$ for the effects. Then Eq. (17) reads

[TABLE]

When maximizing ${S}$, we can eliminate the maximization over ${\boldsymbol{w}}_{0}$ and ${\boldsymbol{w}}_{1}$ by choosing appropriate vectors with ${\lvert{{\boldsymbol{w}}_{i}}\rvert}_{*}=1$ such that ${\boldsymbol{w}}_{0}^{\dagger}{\boldsymbol{f}}_{0}=-{\lvert{{\boldsymbol{f}}_{0}}\rvert}$ and ${\boldsymbol{w}}_{1}^{\dagger}({\boldsymbol{f}}_{0}-{\boldsymbol{f}}_{1})={\lvert{{\boldsymbol{f}}_{0}-{\boldsymbol{f}}_{1}}\rvert}$. The maximal value of ${S}$ for a given dichotomic norm cone is hence

[TABLE]

where the constraints of the optimization are ${\lvert{{\boldsymbol{w}}}\rvert}_{*}\leq 1$ and ${\lvert{{\boldsymbol{f}}_{i}}\rvert}\leq\min\{t_{i},1-t_{i}\}$. For the case of hbits, both ${\lvert{\cdot}\rvert}_{*}$ and ${\lvert{\cdot}\rvert}$ correspond to the $\ell_{2}$ norm; hence the constraints are invariant under orthogonal transformations, as is the function to be maximized, which depends only on norms and scalar products of the vectors ${\boldsymbol{w}}$, ${\boldsymbol{f}}_{0}$, and ${\boldsymbol{f}}_{1}$. Since ${\boldsymbol{w}}$ contributes only through its component in the span of ${\boldsymbol{f}}_{0}$ and ${\boldsymbol{f}}_{1}$, the problem reduces to a two-dimensional one. This is equivalent to the qubit case with the Bloch ball restricted to the $xz$-plane, both for states and effects. This implies that the bound for hbits coincides with the bound for qubits. We thus have

[TABLE]

as in Eq. (24).
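The two-dimensional reduction makes it easy to produce a feasible lower bound numerically. The sketch below assumes that, after eliminating ${\boldsymbol{w}}_{0}$ and ${\boldsymbol{w}}_{1}$, the objective reads $(t_{0}+{\boldsymbol{w}}^{\dagger}{\boldsymbol{f}}_{0})(1-t_{0}+{\lvert{{\boldsymbol{f}}_{0}}\rvert})+(t_{1}+{\boldsymbol{w}}^{\dagger}{\boldsymbol{f}}_{1})(1+t_{0}-t_{1}+{\lvert{{\boldsymbol{f}}_{0}-{\boldsymbol{f}}_{1}}\rvert})$, which follows from ${S}=p(01|00)+p(10|10)+p(10|11)$ and the parametrization above. It further restricts the search to a hypothetical family with $t_{i}=\frac{1}{2}$ and ${\lvert{{\boldsymbol{f}}_{i}}\rvert}=\frac{1}{2}$ in a plane, so it can only give a lower bound and says nothing about the upper bound obtained by Lasserre's method.

```python
import math

def hbit_S(t0, f0, t1, f1, w):
    """Objective after the optimal choice of w_0 and w_1 (Euclidean norm, 2D)."""
    dot = lambda u, v: u[0] * v[0] + u[1] * v[1]
    norm = lambda v: math.hypot(v[0], v[1])
    diff = (f0[0] - f1[0], f0[1] - f1[1])
    first = (t0 + dot(w, f0)) * (1 - t0 + norm(f0))
    second = (t1 + dot(w, f1)) * (1 + t0 - t1 + norm(diff))
    return first + second

def planar_lower_bound(steps=180):
    """Scan the angles of f_0 and f_1 relative to w = (1, 0) in one-degree steps."""
    w = (1.0, 0.0)
    best_value, best_angles = -1.0, None
    for i in range(steps + 1):
        theta = math.pi * i / steps
        f0 = (0.5 * math.cos(theta), 0.5 * math.sin(theta))
        for j in range(steps + 1):
            beta = math.pi * j / steps
            # f_1 is placed on the opposite side of w from f_0.
            f1 = (0.5 * math.cos(beta), -0.5 * math.sin(beta))
            value = hbit_S(0.5, f0, 0.5, f1, w)
            if value > best_value:
                best_value = value
                best_angles = (math.degrees(theta), math.degrees(beta))
    return best_value, best_angles

best_value, best_angles = planar_lower_bound()
print(best_value, best_angles)
```

Within this restricted family the scan already exceeds the classical value $9/4$ found for the chosen parametrization of the cbit, with the best grid point slightly above $2.355$.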

IV.2.4 Generalized bit

The case of gbits differs from the previous one because we can actually reach ${S}=3$ already for a two-state machine, namely the dichotomic norm cone with $n=2$ and the $\ell_{1}$ norm. This model corresponds to the local part of a Popescu–Rohrlich box Popescu and Rohrlich (1994); Barrett (2007). The space of effects is a polytope with extremal effects given by the extremal points of the unit ball of the two-dimensional $\ell_{1}$ norm, i.e., $a_{\pm i}=\frac{1}{2}(1,\pm{\boldsymbol{e}}_{i})$, with ${\boldsymbol{e}}_{i}$ the canonical vectors in ${\mathds{R}}^{2}$. The states are then $\omega=(1,{\boldsymbol{w}})$ with ${\boldsymbol{w}}$ in the square $[-1,1]\times[-1,1]$, i.e., the unit ball with respect to the $\ell_{\infty}$ norm. The choices

[TABLE]

and

[TABLE]

yield, according to Eq. (28), the algebraic maximum for ${S}$ , i.e., ${S}=3$ . We thus have

[TABLE]

for gbits and hence also for the set of all dichotomic norm cones with the same norm and arbitrary $n$ .
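That the algebraic maximum is attainable for a gbit can be verified directly. The sketch below uses one possible choice of states and effects, constructed for this example and not necessarily identical to the choice made in the text; it assumes ${S}=p(01|00)+p(10|10)+p(10|11)$ and $p(ab|xy)=\omega(f_{a|x})\,\sigma_{a|x}(f_{b|y})$.

```python
def l1(v):
    return sum(abs(c) for c in v)

def linf(v):
    return max(abs(c) for c in v)

def apply_state(w, effect):
    """omega: (t, x) -> t + w.x"""
    t, x = effect
    return t + sum(wi * xi for wi, xi in zip(w, x))

def complement(effect):
    """e - f for the order unit e = (1, 0)."""
    t, x = effect
    return (1 - t, tuple(-xi for xi in x))

# Hypothetical choice of extremal effects and states (illustration only).
f00 = (0.5, (0.5, 0.0))    # effect f_{0|0}
f11 = (0.5, (0.0, 0.5))    # effect f_{1|1}
omega = (1.0, 1.0)         # initial state
sigma00 = (-1.0, 0.0)      # state prepared after outcome 0 of input 0
sigma11 = (1.0, -1.0)      # state prepared after outcome 1 of input 1

# Validity: l1 condition for effects, l-infinity condition for states.
effects_ok = all(l1(x) <= min(t, 1 - t) for t, x in (f00, f11))
states_ok = all(linf(w) <= 1 for w in (omega, sigma00, sigma11))

p_01_00 = apply_state(omega, f00) * apply_state(sigma00, complement(f00))
p_10_10 = apply_state(omega, f11) * apply_state(sigma11, f00)
p_10_11 = apply_state(omega, f11) * apply_state(sigma11, complement(f11))
S = p_01_00 + p_10_10 + p_10_11
print(effects_ok, states_ok, S)
```

Each of the three probabilities equals one for this choice, so ${S}=3$. The key feature is that the $\ell_{\infty}$ unit ball contains states such as $(1,1)$ and $(1,-1)$ that assign probability one to effects along both coordinate axes simultaneously, which is impossible for the Euclidean ball.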

IV.3 Impossibility of simulating contextual correlations with general instruments on a qubit

In this section, we investigate whether qubit machines are able to simulate some contextual correlations that arise in higher dimensional quantum systems. In Ref. Kleinmann et al. (2011) it was proved that in order to simulate all deterministic predictions associated with the observables of the Peres–Mermin square Peres (1990); Mermin (1990), a classical machine with at least $4$ states is necessary. This result was obtained in the framework of tests of contextuality involving sequential measurements Gühne et al. (2010), in which the relevant compatibility notion is given by the nondisturbance among compatible measurements and repeatability of outcomes, e.g., if ${\mathsf{M}}_{x}$ and ${\mathsf{M}}_{y}$ are compatible measurements in the measurement sequence ${\mathsf{M}}_{x}{\mathsf{M}}_{y}{\mathsf{M}}_{x}$ , the outcome for the first measurement of ${\mathsf{M}}_{x}$ will be repeated in the second measurement of ${\mathsf{M}}_{x}$ .

We derive here a related result by showing that even a qubit is not sufficient to exhibit contextual correlations. For this we use a rather broad notion of contextuality. Consider a box with inputs from an alphabet ${\mathcal{X}}$ and outputs from an alphabet ${\mathcal{A}}$ as before. The input sequences are restricted such that a sequence $\vec{x}$ is admissible if and only if all inputs are from the same context ${\mathcal{C}}\subset{\mathcal{X}}$, i.e., $\set{x_{i}}{i}\subset{\mathcal{C}}$. A context ${\mathcal{C}}$ is a set of inputs such that $p(\vec{a}|\vec{x})=p[\pi(\vec{a})|\pi(\vec{x})]$ for any input sequence $\vec{x}$ from ${\mathcal{C}}$, any output sequence $\vec{a}$, and any permutation $\pi$. In addition we assume that any input is repeatable, i.e., $p(\vec{a}b|\vec{x}x_{i})=p(\vec{a}|\vec{x})\delta_{b,a_{i}}$ for any position $i$ in any admissible sequence.

Such a box is noncontextual, if all correlations of the box (using only admissible input sequences) can be reproduced by a box without memory, i.e., by a noncontextual model. We claim that any such box implemented on a qubit is noncontextual.

We start the proof of this statement by determining those inputs, which cannot require the use of memory. First, if an input $z$ ever produces only the output $c$ , within all admissible input sequences, then we can eliminate this input from our considerations. This is the case, because in any sequence we can permute $z$ to the end of the sequence. Then

[TABLE]

where the first equality is due to Eq. (3) and the second due to the assumption that only the output $c$ ever occurs. Second, assume that for a certain input $z$, whenever it occurs in an admissible sequence, the internal state of the machine before the input $z$ is only ever the state $\rho$. Again we can eliminate this input from our considerations, because the output for $z$ and the state after the output can be determined without considering the state. Third, we can ignore the pathological cases of inputs that are not a member of any context. In the following we assume without loss of generality that the box does not have any input falling under the three cases just discussed.

Next, we show that for any input $z$ the instrument $({\mathcal{I}}_{c|z})_{c}$ must be a measure-and-prepare instrument of the form

[TABLE]

This can be seen as follows. According to the assumptions, there are two input sequences $\vec{x}z$ and $\vec{y}z$ and corresponding output sequences $\vec{a}c$ and $\vec{b}c$ , so that the state before the input $z$ is $\rho$ and $\rho^{\prime}$ , respectively, with $\rho\neq\rho^{\prime}$ . Using Eq. (3) and Eq. (13) we have

[TABLE]

where $p(\vec{a}|\vec{x})>0$ and $p(\vec{b}|\vec{y})>0$ . Therefore for $c\neq c^{\prime}$ ,

[TABLE]

with $\bar{\rho}=(\rho+\rho^{\prime})/2$ . Since $\rho\neq\rho^{\prime}$ and we assume a qubit system, the mixture $\bar{\rho}$ has necessarily rank two, i.e., $\bar{\rho}\geq\epsilon\openone$ for some $\epsilon>0$ . We arrive at the condition

[TABLE]

where $K_{i}$ and $Q_{j}$ are the Kraus operators associated, respectively, with the instruments ${\mathcal{I}}_{c^{\prime}|z}$ and ${\mathcal{I}}_{c|z}$ , e.g., ${\mathcal{I}}_{c^{\prime}|z}X=\sum_{j}K_{j}^{\dagger}XK_{j}$ . Then $K_{i}Q_{j}=0$ for all $i,j$ . Similarly, exchanging $c$ with $c^{\prime}$ , we obtain $Q_{j}K_{i}=0$ for all $i,j$ . This implies that $K_{i}$ and $Q_{j}$ are of rank one and that $K_{i}$ is proportional to $K_{i^{\prime}}$ as well as $Q_{j}$ being proportional to $Q_{j^{\prime}}$ , for all $i,i^{\prime}$ and $j,j^{\prime}$ . Hence we can omit the indices $i,j$ and consider simply $K$ and $Q$ . Note that from $\sum_{c}{\mathcal{I}}_{c|z}\openone=\openone$ , the condition $Q^{\dagger}Q\leq\openone$ follows which allows us to write $Q={\ket{\alpha}\!\bra{\beta}}$ with $\braket{\alpha}{\alpha}=1$ and $\braket{\beta}{\beta}\leq 1$ . Now, for $c=c^{\prime}$ we obtain

[TABLE]

which implies $(Q^{\dagger})^{2}Q^{2}=Q^{\dagger}Q$ . It follows that either $\ket{\beta}=0$ or $\ket{\alpha}$ and $\ket{\beta}$ are equal up to a phase and hence ${\mathcal{I}}_{c|z}$ is as stated in Eq. (34).

As a final step we need to show that there is no contextuality for projective qubit instruments. Given an admissible input sequence $\vec{x}yz$ and an output sequence $\vec{a}bc$ such that $p(\vec{a}b|\vec{x}y)>0$, we have

[TABLE]

The left-hand sides of both expressions have to be equal, yielding ${\lvert{\braket{\psi_{b,y}}{\psi_{c,z}}}\rvert}\in\set{0,1}$.

Consequently, any two inputs within a context are realized by the same projective instrument, except for some relabeling of the outcomes. We choose a specific measurement within one context, say $y$ , so that ${\mathcal{I}}_{a|x}=\sum_{b}{\mathcal{I}}_{b|y}f^{b}(a|x)$ with some coefficients $f^{b}(a|x)\in\set{0,1}$ . This way we can write for any correlations of this context

[TABLE]

which is exactly the formula for a one-state machine, i.e., a noncontextual model.

This concludes the proof of our statement, due to the following observation. If two contexts share an observable, then our argument already applies and the union of both contexts must admit a noncontextual model; hence the union of both contexts is again a context. Eventually, we can join contexts until all contexts are mutually disjoint. For each disjoint set we can construct a noncontextual model, and since there is no admissible sequence involving two different contexts, we have constructed a noncontextual model for all admissible input sequences.

V Conclusions and outlook

We introduced the notion of memory cost of simulating temporal correlations based on the notion of finite-state machine, i.e., a physical system accepting an input at each time instant and generating an outcome and an internal state transition according to probabilistic rules. We investigated the correlations obtainable via such finite-state machines operating according to different probability theories, i.e., classical, quantum, or GPT. Our framework allows us to derive inequalities able to discriminate among different theories for the simplest nontrivial case, i.e., two-state machines, two inputs, two outputs, and sequences of length two. Moreover, we investigated, from the perspective of quantum finite-state machines, the possibility of simulating contextual correlations with a qubit and answered this question in the negative.

Our framework provides a notion of nonclassicality for single systems, which is based solely on observed correlations and does not make any assumption on the type of measurements involved, e.g., compatibility or noninvasiveness. We believe that several problems in quantum foundations and quantum information could be studied in this framework. For instance, a notion of nonclassicality for single systems, i.e., quantum contextuality, has recently been suggested as a resource for quantum computation. On the other hand, memory has been identified as a resource needed to simulate contextual correlations classically Kleinmann et al. (2011); Fagundes and Kleinmann (2017). In addition, a different notion of contextuality for sequential operations has been defined and connected to speed-up in quantum computation Mansfield and Kashefi (2018). Our work could provide a general framework to discuss such different results and understand better the connection between memory cost of (classical) simulations, contextual correlations, and advantages in computation. Moreover, the idea of computation in GPTs, such as Spekkens’ toy model Spekkens (2007), that are intermediate between classical and quantum probability has been recently investigated Johansson and Larsson (2017a, b). In particular, this GPT can be exactly simulated with two classical bits.

Acknowledgements.

The authors would like to thank Rafael Chaves, Andrew Garner, Otfried Gühne, Jannik Hoffmann, Niklas Johansson, Jan-Åke Larsson, Nikolai Miklin, Miguel Navascués, Cornelia Spee, Giuseppe Vitagliano, and Mischa Woods for discussions. This work has been supported by the Austrian Science Fund (FWF): M 2107 (Meitner-Programm) and ZK 3 (Zukunftskolleg), FQXi Large Grant “The Observer Observed: A Bayesian Route to the Reconstruction of Quantum Theory”, FIS2015-67161-P (MINECO/FEDER), Basque Government (project IT986-16), and ERC (Consolidator Grant 683107/TempoQ).

Appendix A Brief introduction to GPTs

In quantum theory the set of effects is represented by Hermitian operators $F$ with $0\leq F\leq\openone$ . This convex set has three characteristic properties. (i) It is a subset of the real vector space of Hermitian operators. (ii) There exists the special operator $\openone$ representing the all-embracing effect. (iii) Its shape is given by the partial order $A\leq B$ which is defined by the condition that $B-A$ is positive semidefinite.

In a GPT, the notion of an effect is generalized by considering a straightforward generalization of those properties. We start with an arbitrary real vector space $V$ with a partial order $a\leq b$ . This partial order has to be linear in the sense that $a\leq b$ implies $\lambda a\leq\lambda b$ for any $\lambda\in{\mathds{R}}^{+}$ and $a\leq b$ implies $a+c\leq b+d$ if also $c\leq d$ . This turns $(V,\leq)$ into an ordered vector space.

The all-embracing effect is a distinct element $e\in V$. It is required to dominate all of $V$, i.e., for any $x\in V$ there is a positive number $\lambda$ such that $x\leq\lambda e$. This property makes $e$ an order unit and $(V,\leq,e)$ an order unit vector space. In addition, it is convenient to assume that the order unit is Archimedean, i.e., if $x\leq\lambda e$ holds for all $\lambda>0$, then already $x\leq 0$. In our paper we implicitly assume that any order unit is Archimedean.

It is sometimes convenient to let $V^{+}=\set{x\in V}{0\leq x}$. Since $a\leq b$ is equivalent to $b-a\in V^{+}$, we then equivalently describe an Archimedean order unit (AOU) space by the tuple $(V,V^{+},e)$. The effects in a GPT are now given by the set $V_{e}^{+}=V^{+}\cap(e-V^{+})$. A measurement ${\mathsf{M}}$ in a GPT is represented by a collection of elements ${\mathsf{M}}=(f_{k})_{k}\subset V_{e}^{+}$ with $\sum f_{k}=e$, where $f_{k}$ represent the outcomes of the measurement.

For the set of states, we note that in quantum theory one can represent a state $\rho$ equivalently by the linear map $\omega\colon X\mapsto\operatorname{tr}(\rho X)$ . Then the normalization of $\rho$ becomes $\omega(\openone)=1$ and the condition $\rho\geq 0$ reads $\omega(X)\geq 0$ for all $X\geq 0$ . By analogy, the set of states in a GPT is given by

[TABLE]

where $V^{*}=\set{\varphi\colon V\rightarrow{\mathds{R}}}{\varphi\text{ is linear}}$ is the dual space of $V$ . With this definition, the probability for outcome $k$ of a measurement ${\mathsf{M}}=(f_{k})_{k}$ is given by $p_{k}=\omega(f_{k})$ .

Appendix B Bound on ${S}$ for hbits

The proof is by contradiction. Let us assume $\Omega_{\mathrm{hbit}}=3$. We then have $p(01|00)=p(10|10)=p(10|11)=1$, and $p(0|0)=p(1|1)=1$. From $p(0|0)=1$, we have $\omega(f_{0|0})=t_{0|0}+{\boldsymbol{w}}^{\dagger}{\boldsymbol{f}}_{0|0}=1$, where $f_{0|0}=(t_{0|0},{\boldsymbol{f}}_{0|0})$. On the other hand, by the definition of effects and states, we have ${\lvert{{\boldsymbol{f}}_{0|0}}\rvert}\leq\min(t_{0|0},1-t_{0|0})$ and ${\lvert{{\boldsymbol{w}}}\rvert}_{*}\leq 1$. We then have

[TABLE]

From $f_{0|0}\neq e$ (because $p(00|00)=0$), we have ${\boldsymbol{f}}_{0|0}\neq{\boldsymbol{0}}$ and hence $t_{0|0}<1$. Then, using again ${\lvert{{\boldsymbol{w}}}\rvert}_{*}\leq 1$ and ${\lvert{\cdot}\rvert}={\lvert{\cdot}\rvert}_{*}$, together with the Cauchy–Schwarz inequality, we have

[TABLE]

Similarly, we obtain

[TABLE]

and, again, $t_{1|1}<1$ .

We need now to characterize the terms of the form $\omega({\mathcal{I}}_{a|x}f_{b|y})$ , corresponding to sequences of length two. We use the constraints that arise from the condition that the transformation must map effects to effects. Then, we use that ${\mathcal{I}}_{a|x}$ is a linear transformation that maps the identity element to $f_{a|x}$ , i.e., ${\mathcal{I}}_{a|x}e=f_{a|x}$ . We, thus, have

[TABLE]

where ${\boldsymbol{\alpha}}_{a|x}$ is an $n$-dimensional vector and $B_{a|x}$ an $n\times n$ matrix. The expectation value can then be written as

[TABLE]

We can see the transformation ${\mathcal{I}}_{a|x}$ , applied to the left, as a state transformation, i.e., Schrödinger picture and with normalization corresponding to outcome probability. Then, we have that ${{\lvert{B^{\dagger}_{a|x}{\boldsymbol{w}}+{\boldsymbol{\alpha}}_{a|x}}\rvert}_{*}\leq t_{a|x}+{\boldsymbol{w}}\cdot{\boldsymbol{f}}_{a|x}}$ . Notice that such a condition also guarantees that $p(ab|xy)\geq 0$ and ${p(ab|xy)\leq p(a|x)}$ .

This translates to ${{\lvert{B^{\dagger}_{a|x}{\boldsymbol{w}}+{\boldsymbol{\alpha}}_{a|x}}\rvert}_{*}\leq 1}$ for the case $(a,x)=(0,0)$ or $(1,1)$. In fact, in those cases we have $t_{a|x}+{\boldsymbol{w}}\cdot{\boldsymbol{f}}_{a|x}=1$, so the dual norm condition guarantees that $\omega({\mathcal{I}}_{a|x}f_{b|y})\leq 1$ for all $f_{b|y}$.

From the condition $p(11|11)=0$, we obtain

[TABLE]

which implies, together with Eq. (45), ${{\lvert{B^{\dagger}_{1|1}{\boldsymbol{w}}+{\boldsymbol{\alpha}}_{1|1}}\rvert}_{*}\leq 1}$, and the Cauchy–Schwarz inequality, that

[TABLE]

On the other hand, we have $p(10|10)=1$ that, by Eq. (44) and Eq. (52), implies

[TABLE]

which implies $t_{0|0}=1$ , i.e., a contradiction with $t_{0|0}<1$ , which concludes the proof.

Appendix C Capacity of dichotomic norm cones

In the following we prove that dichotomic norm cones describe systems of capacity two. For convenience, we repeat Eq. (7) from the main text.

[TABLE]

We first show that a capacity of two is an upper bound.

Lemma 1.

In a dichotomic norm cone, let $(\omega_{k})_{k}$ be a collection of $d$ states and $(f_{k})_{k}$ a collection of $d$ effects, such that Eq. (7) is satisfied. Then $d\leq 2$ .

Proof.

Any effect $f=(t,{\boldsymbol{x}})$ must satisfy $0\leq f\leq e$ , i.e., $t\geq{\lvert{{\boldsymbol{x}}}\rvert}$ and $1-t\geq{\lvert{{\boldsymbol{x}}}\rvert}$ . Furthermore, a state $\omega\colon(s,{\boldsymbol{y}})\mapsto s+{\boldsymbol{w}}^{\dagger}{\boldsymbol{y}}$ must obey ${\lvert{{\boldsymbol{w}}}\rvert}_{*}\leq 1$ . It follows that ${\boldsymbol{w}}^{\dagger}{\boldsymbol{x}}\leq{\lvert{{\boldsymbol{x}}}\rvert}$ and hence $\omega f=1$ requires $t\geq\frac{1}{2}$ . Thus $\sum_{k}f_{k}\leq e$ implies for $f_{k}=(t_{k},{\boldsymbol{x}}_{k})$ the inequalities

[TABLE]

which yields the assertion at once. ∎

In addition, if the dimension of the underlying vector space is finite, we can always find vectors ${\boldsymbol{x}}$ and ${\boldsymbol{w}}$ , such that ${\lvert{{\boldsymbol{x}}}\rvert}=1$ , ${\lvert{{\boldsymbol{w}}}\rvert}_{*}=1$ , and ${\boldsymbol{w}}^{\dagger}{\boldsymbol{x}}=1$ . Hence, the states $\omega_{1,2}=(1,\pm{\boldsymbol{w}})$ and effects $f_{1,2}=(1,\pm{\boldsymbol{x}})/2$ obey Eq. (7). It follows that the capacity of a dichotomic norm cone is always exactly two.
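The construction in the last paragraph can be checked numerically. The sketch below assumes that Eq. (7) is the perfect-distinguishability condition $\omega_{k}(f_{l})=\delta_{kl}$, and it uses two example pairs $({\boldsymbol{x}},{\boldsymbol{w}})$ chosen for illustration: one for the Euclidean norm with $n=3$ and one for the Manhattan norm with $n=2$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def state_on_effect(w, effect):
    """omega: (t, x) -> t + w.x"""
    t, x = effect
    return t + dot(w, x)

def distinguishability_table(x, w):
    """Table omega_k(f_l) for states (1, +-w) and effects (1, +-x)/2."""
    states = [tuple(w), tuple(-c for c in w)]
    effects = [(0.5, tuple(0.5 * c for c in x)),
               (0.5, tuple(-0.5 * c for c in x))]
    return [[state_on_effect(s, f) for f in effects] for s in states]

# Euclidean norm, n = 3 (qubit): |x| = |w|_* = 1 and w.x = 1.
qubit_table = distinguishability_table(x=(0.0, 0.0, 1.0), w=(0.0, 0.0, 1.0))
# Manhattan norm, n = 2 (gbit): |x|_1 = 1, |w|_inf = 1, and w.x = 1.
gbit_table = distinguishability_table(x=(1.0, 0.0), w=(1.0, 1.0))
print(qubit_table, gbit_table)
```

In both cases the table is the $2\times 2$ identity, so the two states are perfectly distinguished by the two-outcome measurement $(f_{1},f_{2})$, in line with the lower bound on the capacity stated above.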

Bibliography

Horodecki et al. (2009) Ryszard Horodecki, Paweł Horodecki, Michał Horodecki, and Karol Horodecki, “Quantum entanglement,” Rev. Mod. Phys. 81, 865–942 (2009).
Gühne and Tóth (2009) Otfried Gühne and Géza Tóth, “Entanglement detection,” Physics Reports 474, 1–75 (2009).
Bell (1964) John Stewart Bell, “On the Einstein-Podolsky-Rosen paradox,” Physics 1, 195–200 (1964).
Brunner et al. (2014) Nicolas Brunner, Daniel Cavalcanti, Stefano Pironio, Valerio Scarani, and Stephanie Wehner, “Bell nonlocality,” Rev. Mod. Phys. 86, 419–478 (2014).
Kochen and Specker (1967) Simon Kochen and Ernst P. Specker, “The problem of hidden variables in quantum mechanics,” J. Math. Mech. 17, 59 (1967).
Leggett and Garg (1985) Anthony J. Leggett and Anupam Garg, “Quantum mechanics versus macroscopic realism: Is the flux there when nobody looks?” Phys. Rev. Lett. 54, 857–860 (1985).
Emary et al. (2014) Clive Emary, Neill Lambert, and Franco Nori, “Leggett–Garg inequalities,” Reports on Progress in Physics 77, 016001 (2014).
Gühne et al. (2010) Otfried Gühne, Matthias Kleinmann, Adán Cabello, Jan-Åke Larsson, Gerhard Kirchmair, Florian Zähringer, Rene Gerritsma, and Christian F. Roos, “Compatibility and noncontextuality for sequential measurements,” Phys. Rev. A 81, 022121 (2010).
