Gentle Measurement of Quantum States and Differential Privacy
Scott Aaronson, Guy N. Rothblum

TL;DR
This paper establishes a fundamental link between gentle quantum measurements and differential privacy, enabling improved protocols for shadow tomography and other quantum information tasks with enhanced efficiency and safety.
Contribution
It proves a general connection between gentle measurement and differential privacy, leading to new quantum protocols and bounds for shadow tomography.
Findings
Any alpha-gentle measurement is O(alpha)-DP on product states.
Any epsilon-DP product measurement is O(epsilon*sqrt(n))-gentle.
New efficient protocol for shadow tomography with fewer copies of quantum states.
Abstract
In differential privacy (DP), we want to query a database about n users, in a way that "leaks at most eps about any individual user," even conditioned on any outcome of the query. Meanwhile, in gentle measurement, we want to measure n quantum states, in a way that "damages the states by at most alpha," even conditioned on any outcome of the measurement. In both cases, we can achieve the goal by techniques like deliberately adding noise to the outcome before returning it. This paper proves a new and general connection between the two subjects. Specifically, we show that on products of n quantum states, any measurement that is alpha-gentle for small alpha is also O(alpha)-DP, and any product measurement that is eps-DP is also O(eps*sqrt(n))-gentle. Illustrating the power of this connection, we apply it to the recently studied problem of shadow tomography. Given an unknown d-dimensional…
Scott Aaronson, University of Texas at Austin. Supported by a Vannevar Bush Fellowship from the US Department of Defense, a Simons Investigator Award, and the Simons "It from Qubit" collaboration.
Guy N. Rothblum, Weizmann Institute of Science. Supported by ISF grant no. 5219/17.
Abstract
In differential privacy (DP), we want to query a database about $n$ users, in a way that "leaks at most $\varepsilon$ about any individual user," even conditioned on any outcome of the query. Meanwhile, in gentle measurement, we want to measure $n$ quantum states, in a way that "damages the states by at most $\alpha$," even conditioned on any outcome of the measurement. In both cases, we can achieve the goal by techniques like deliberately adding noise to the outcome before returning it. This paper proves a new and general connection between the two subjects. Specifically, we show that on products of $n$ quantum states, any measurement that is $\alpha$-gentle for small $\alpha$ is also $O(\alpha)$-DP, and any product measurement that is $\varepsilon$-DP is also $O(\varepsilon\sqrt{n})$-gentle.
Illustrating the power of this connection, we apply it to the recently studied problem of shadow tomography. Given an unknown $d$-dimensional quantum state $\rho$, as well as known two-outcome measurements $E_1, \ldots, E_m$, shadow tomography asks us to estimate $\Pr[E_i \text{ accepts } \rho]$, for every $i \in [m]$, by measuring few copies of $\rho$. Using our connection theorem, together with a quantum analog of the so-called private multiplicative weights algorithm of Hardt and Rothblum, we give a protocol to solve this problem using $O\left((\log m)^2 (\log d)^2 / \varepsilon^8\right)$ copies of $\rho$, compared to Aaronson's previous bound of $\widetilde{O}\left((\log m)^4 (\log d) / \varepsilon^4\right)$. Our protocol has the advantages of being online (that is, the $E_i$'s are processed one at a time), gentle, and conceptually simple.
Other applications of our connection include new lower bounds for shadow tomography from lower bounds on DP, and a result on the safe use of estimation algorithms as subroutines inside larger quantum algorithms.
1 Introduction
This paper is about a new mathematical connection between two concepts (gentle measurement in quantum mechanics, and differential privacy in classical computer science) and the applications of this connection to the design of new quantum measurement procedures and algorithms. Since the paper is meant to be accessible to researchers in both fields (and beyond), we begin by saying a few words about each of the concepts separately.
1.1 Gentle Measurement
In quantum mechanics, measurement is, famously, an inherently destructive process. For example, if we measure a qubit $a|0\rangle + b|1\rangle$ in the $\{|0\rangle, |1\rangle\}$ basis, we "force the qubit to decide" whether to be $|0\rangle$ (with probability $|a|^2$) or $|1\rangle$ (with probability $|b|^2$). The qubit's state then "collapses" to whichever choice it made. There's no way to measure again, unless of course we happen to have (or know how to prepare) a second qubit in the same state. [Footnote 1: This destructiveness is not unique to quantum mechanics: it has a close analogue in classical Bayesian conditioning, where a probability distribution can "collapse" to a single point when we make an observation. But in classical probability, the "collapse" is purely internal and mental, in the sense that we could undo it by simply forgetting the observation. In quantum mechanics, by contrast, collapse causes an objective change to the measured system, one that could also be detected by someone else who later measured the system.]
Even in quantum mechanics, though, not all measurements on all states are destructive. For example, if a qubit happens to be in the $|0\rangle$ state already, then measuring in the $\{|0\rangle, |1\rangle\}$ basis causes no damage at all. And if the qubit is in a state like $\sqrt{1-\varepsilon^2}\,|0\rangle + \varepsilon|1\rangle$ for small $\varepsilon > 0$, then measuring in the $\{|0\rangle, |1\rangle\}$ basis causes only minimal damage, since the result is almost always that the qubit "snaps" to $|0\rangle$. More generally, the principle is this:
A measurement $M$ applied to a state $\rho$ necessarily severely damages $\rho$ if, and only if, the outcome of $M$ is highly unpredictable even to someone who already knows $\rho$.
This principle, which can be quantified in various ways, is called the information/disturbance tradeoff: if $M$ creates lots of new (random) information, then it must also cause lots of disturbance to $\rho$, and vice versa.
A corollary is that, if someone who knew $\rho$ could usually predict the measurement outcome in advance, then applying $M$ need not damage $\rho$ by much. Note that this corollary does not describe only trivial or uninteresting measurements, since in general the measurer does not know $\rho$ in advance: that's why she's measuring it! [Footnote 2: And also, even if she did know a description of $\rho$, she might still find predicting the outcome of a measurement on $\rho$ to be computationally intractable.]
Indeed, so-called gentle measurements, which can be limited in how much damage they cause, have found numerous applications in experimental physics, the foundations of quantum mechanics, and quantum computing theory. [Footnote 3: Physicists more often refer to "weak measurement," a related but not identical concept, which typically means that the measurement returns very little information about the state (in this paper, we'll call such measurements "$\varepsilon$-trivial"). All weak measurements can be implemented gently, and we'll show in Lemma 23 that the only measurements that are gentle on all states are weak. But measurements that are gentle on large sets of interesting states (such as product states) can be far from weak, a point that will be crucial for us.] Experimentalists, for example, know how to perform a measurement on a large number of identically prepared particles, in a way that reveals the particles' quantum state to high accuracy while causing very little damage. [Footnote 4: With a single particle, this is of course impossible.] More theoretically, gentle measurement has also played a central role in proposals for publicly-verifiable quantum money that can be verified many times, quantum software that can be evaluated on many inputs, and so forth (see [4, 6]). Gentle measurement is also needed in work on the nonabelian hidden subgroup problem [24], and on quantum advice complexity classes like $\mathsf{BQP/qpoly}$ (see [2, 3]).
Let's now define a bit more formally what we'll mean, in this paper, by a quantum measurement being "gentle."
Definition 1 (Gentle Measurements)
Given a set $S$ of quantum mixed states in some Hilbert space, an implementation of a measurement $M$, and a parameter $\alpha \in [0,1]$, we define $M$ to be $\alpha$-gentle on $S$ if for all states $\rho \in S$, and all possible outcomes $y$ of applying $M$ to $\rho$, we have
$$\left\| \rho_{M \to y} - \rho \right\|_{\mathrm{tr}} \le \alpha. \qquad (1)$$
[Footnote 5: In this paper, by an "implementation" of a measurement $M$, we mean a specification from which, given a state $\rho$, one can calculate not only the probabilities of the various outcomes $y$, but also the post-measurement states $\rho_{M \to y}$.]
Here $\|\cdot\|_{\mathrm{tr}}$ represents trace distance, the standard distance metric on quantum states, while $\rho_{M \to y}$ represents the new, "collapsed" state assuming that the measurement outcome was $y$. (For a review of these and other quantum information concepts, see Sections 2.2 and 2.5.)
More generally, we say $M$ is $(\alpha, \delta)$-gentle on $S$ if for all states $\rho \in S$, inequality (1) holds with probability at least $1 - \delta$ over the possible outcomes $y$ of applying $M$ to $\rho$. We recover $\alpha$-gentleness by setting $\delta = 0$.
The most common choices for $S$ will be the set of product states $\rho = \rho_1 \otimes \cdots \otimes \rho_n$, and the set of all states.
If a measurement $M$ is specified by its output probabilities only (technically, as a "POVM"), then we say that $M$ is $\alpha$-gentle if and only if there exists an $\alpha$-gentle implementation of it.
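To make Definition 1 concrete, here is a minimal numerical sketch for a single qubit measured in the computational basis. It assumes the trace-distance convention $\frac{1}{2}\|\rho - \sigma\|_1$ and a projective implementation of the measurement; the helper names (`trace_distance`, `gentleness_profile`) and the example state $\sqrt{1-\varepsilon^2}\,|0\rangle + \varepsilon|1\rangle$ with $\varepsilon = 0.1$ are illustrative choices, not taken from the paper.

```python
import numpy as np

def trace_distance(rho, sigma):
    """Trace distance (1/2) * ||rho - sigma||_1 between two density matrices."""
    eigenvalues = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * float(np.sum(np.abs(eigenvalues)))

def gentleness_profile(psi):
    """For a pure state psi measured projectively in the standard basis,
    return a list of (outcome probability, damage) pairs, where damage is
    the trace distance between the collapsed state and the original."""
    rho = np.outer(psi, np.conj(psi))
    profile = []
    for y in range(len(psi)):
        prob = float(abs(psi[y]) ** 2)
        if prob == 0.0:
            continue  # outcome never occurs
        collapsed = np.zeros_like(rho)
        collapsed[y, y] = 1.0  # post-measurement state |y><y|
        profile.append((prob, trace_distance(collapsed, rho)))
    return profile

eps = 0.1  # illustrative amplitude on |1>
psi = np.array([np.sqrt(1 - eps**2), eps], dtype=complex)
profile = gentleness_profile(psi)
for prob, damage in profile:
    print(f"probability {prob:.4f}, damage {damage:.4f}")
```

In this toy case the likely outcome damages the state by only $\varepsilon$, while the rare outcome (probability $\varepsilon^2$) damages it almost maximally, so the measurement is $(\varepsilon, \varepsilon^2)$-gentle on this one state without being $\varepsilon$-gentle in the worst-case sense.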
As an example, suppose we have $n$ qubits in a pure product state:
$$|\psi\rangle = (a_1|0\rangle + b_1|1\rangle) \otimes \cdots \otimes (a_n|0\rangle + b_n|1\rangle).$$
Then consider the measurement $M$ that simply returns the total Hamming weight. This measurement is not $\alpha$-gentle for any nontrivial $\alpha$. So for example, if we apply $M$ to the equal superposition $|+\rangle^{\otimes n}$, we'll collapse the superposition over possible Hamming weights: from a Gaussian wavepacket (as the physicists might call it) of width $\sim\sqrt{n}$ centered at $n/2$, all the way down to a single random Hamming weight.
By contrast, now consider a measurement $L_\sigma$ that returns the Hamming weight, plus a random noise term $\eta$ of average magnitude $\sigma$. As an example, we could take this noise to follow a Laplace distribution:
$$\Pr[\eta] = \frac{e^{-|\eta|/\sigma}}{Z}, \qquad (2)$$
where
$$Z = \sum_{\eta=-\infty}^{\infty} e^{-|\eta|/\sigma} \approx 2\sigma$$
for large $\sigma$. We can implement the measurement $L_\sigma$ as follows. Given $|\psi\rangle$, which we now think of as a superposition over $n$-bit strings $X$, first prepare alongside $|\psi\rangle$ the state
$$\frac{1}{\sqrt{Z}} \sum_{\eta=-\infty}^{\infty} e^{-|\eta|/(2\sigma)}\, |\eta\rangle.$$
(In practice, we would of course impose a cutoff on $\eta$.) Next, perform the unitary transformation
$$|X\rangle |\eta\rangle \to |X\rangle \big|\, |X| + \eta \big\rangle.$$
Finally, measure the second register in the standard basis and output the result.
It turns out that this noisy measurement is $O(\sqrt{n}/\sigma)$-gentle. [Footnote 6: While there are other ways to prove that $L_\sigma$ is $O(\sqrt{n}/\sigma)$-gentle, the nicest proof we know will deduce it as an immediate corollary of this paper's main results.] Intuitively, this is because the various Hamming weights that are well-represented in the "Gaussian wavepacket" (e.g., in the example $|+\rangle^{\otimes n}$, those Hamming weights $w$ such that $|w - n/2| = O(\sqrt{n})$) lead to probability distributions over measurement outcomes that mostly overlap. In other words, when we observe an outcome of the form $|X| + \eta$, the intrinsic variation in $|X|$ within the superposition is dominated by the variation in $\eta$.
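The contrast between the sharp and the noisy Hamming-weight measurement can be checked numerically on the single input $|+\rangle^{\otimes n}$. The sketch below is only an illustration under stated assumptions: it works in the basis of Hamming-weight classes (valid here because both the state and the measurement are symmetric under permuting qubits), it uses unnormalised discrete Laplace weights $e^{-|\eta|/\sigma}$ with no cutoff (the normalisation cancels after renormalising the post-measurement state), and it uses the pure-state formula $\sqrt{1 - |\langle\psi|\psi_y\rangle|^2}$ for trace distance. The function names and the parameters $n = 100$, $\sigma = 100$ are hypothetical.

```python
import math
import numpy as np

def plus_state_weight_amplitudes(n):
    """Amplitudes of |+>^n on each Hamming-weight class w: sqrt(C(n, w) / 2^n)."""
    return np.array([math.sqrt(math.comb(n, w) / 2**n) for w in range(n + 1)])

def damage_noisy(n, sigma, y):
    """Trace distance between |+>^n and its post-measurement state when the
    noisy Hamming-weight measurement returns outcome y."""
    amps = plus_state_weight_amplitudes(n)
    weights = np.arange(n + 1)
    # Amplitude for noise eta = y - w is proportional to exp(-|eta| / (2 sigma)).
    kraus_diag = np.exp(-np.abs(y - weights) / (2.0 * sigma))
    post = amps * kraus_diag
    post /= np.linalg.norm(post)
    overlap = float(np.dot(amps, post))
    return math.sqrt(max(0.0, 1.0 - overlap**2))

def damage_sharp(n, w):
    """Same quantity for the noiseless measurement that returns weight w exactly."""
    return math.sqrt(1.0 - math.comb(n, w) / 2**n)

n, sigma = 100, 100.0
worst_noisy = max(damage_noisy(n, sigma, y) for y in range(-300, n + 301))
least_sharp = min(damage_sharp(n, w) for w in range(n + 1))
print(f"noisy measurement, worst outcome: {worst_noisy:.4f}")
print(f"sharp measurement, best outcome:  {least_sharp:.4f}")
print(f"sqrt(n) / sigma:                  {math.sqrt(n) / sigma:.4f}")
```

Even the least damaging outcome of the sharp measurement moves the state most of the way to orthogonality, while every outcome of the noisy measurement in the scanned range stays below $\sqrt{n}/\sigma$. This checks one input state only; it is not a proof of gentleness on all product states.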
1.2 Differential Privacy
Differential privacy (DP) is a young subfield of computer science (younger than quantum computing, actually) that's seen tremendous growth since its beginnings around 2006 [20, 21, 44]. Though as we'll see, DP's concepts turn out to have much broader applicability, the original motivating problem is as follows. Suppose that a hospital (or bank, or social media site) has a database of sensitive personal records. The hospital wants to let medical researchers query its database in such a way that
(1) the researchers can learn as much accurate statistical information as possible about the patient population (e.g., how many of them have colon cancer), but
(2) each patient has a mathematical guarantee that, by opting to participate in the database, she's exposing to the researchers "only a negligible amount" of data about herself that would otherwise be private.
The question is, how should we design the queries to balance these two apparently conflicting demands?
More formally, call two databases $X = (x_1, \ldots, x_n)$ and $X' = (x'_1, \ldots, x'_n)$ neighbors if they differ only in a single entry $x_i$. Then here is the key definition:
Definition 2 (Differential Privacy [20])
Given a randomized algorithm $A$ that queries a database $X$, as well as a parameter $\varepsilon \ge 0$, we define $A$ to be $\varepsilon$-DP if for all databases $X, X'$ that are neighbors, and all possible outputs $y$ of $A$, we have
$$\Pr[A(X) = y] \le e^{\varepsilon} \Pr[A(X') = y].$$
Here the probabilities are over the internal randomness used by $A$.
In place of $e^{\varepsilon}$, one could also use the more intuitive $1 + \varepsilon$. However, the choice of $e^{\varepsilon}$ has the advantages that it composes nicely and is symmetric under inversions.
As an example (which should look familiar!), suppose the databases are $n$-bit strings $X$, and consider the algorithm that simply returns the Hamming weight $|X|$. This algorithm is not $\varepsilon$-DP for any finite $\varepsilon$, since flipping just a single bit of $X$ can change the probability of an output (namely, the new Hamming weight) from $0$ to $1$. By contrast, now consider the algorithm $L_\sigma$ that returns the Hamming weight $|X|$, plus a Laplace noise term $\eta$ that's distributed according to equation (2). For any two neighboring databases $X, X'$, and any possible output $y$, we have
$$\frac{\Pr[L_\sigma(X) = y]}{\Pr[L_\sigma(X') = y]} = \frac{\Pr[\eta = y - |X|]}{\Pr[\eta = y - |X'|]} = \exp\left(\frac{|y - |X'|| - |y - |X||}{\sigma}\right) \le e^{1/\sigma}. \qquad (3)$$
So we see that $L_\sigma$ is $\frac{1}{\sigma}$-DP. Yet, as long as $\sigma$ is not too enormous, the output $|X| + \eta$ still provides a useful estimate of $|X|$.
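The privacy bound for the Laplace mechanism can be checked by brute force on a small instance. The sketch below assumes integer-valued noise with weight proportional to $e^{-|\eta|/\sigma}$ (the normalising constant cancels in every ratio) and scans a finite window of outputs; the function names and the parameters $n = 20$, $\sigma = 5$ are illustrative assumptions.

```python
def laplace_log_weight(eta, sigma):
    """Log of the unnormalised discrete Laplace weight exp(-|eta| / sigma)."""
    return -abs(eta) / sigma

def worst_case_log_ratio(n, sigma, outputs):
    """Largest |ln Pr[output y | weight w] - ln Pr[output y | weight w + 1]|
    over all adjacent Hamming weights w, w + 1 and all scanned outputs y.
    Neighboring n-bit databases differ in one bit, so their weights differ by 1."""
    worst = 0.0
    for weight in range(n):
        for y in outputs:
            log_ratio = (laplace_log_weight(y - weight, sigma)
                         - laplace_log_weight(y - (weight + 1), sigma))
            worst = max(worst, abs(log_ratio))
    return worst

n, sigma = 20, 5.0
worst = worst_case_log_ratio(n, sigma, range(-50, n + 51))
print(f"worst log-ratio: {worst:.6f}   1/sigma: {1.0 / sigma:.6f}")
```

The worst log-ratio found matches $1/\sigma$, which is the privacy parameter the calculation above assigns to the mechanism.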
Requiring multiplicative closeness in the probabilities of every output might seem overly strong. But if we relaxed the definition to an additive one, we'd need to admit the algorithm that simply chooses a user $i$ uniformly at random and publishes all of her data. This algorithm is manifestly not "private," and yet it satisfies a strong additive guarantee: if user $i$ changes her data, that will affect the probability distribution over outputs by at most $1/n$ in variation distance. On the other hand, one can check that this algorithm is not $\varepsilon$-DP for any finite $\varepsilon$.
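The "publish a random user" algorithm can be worked out exactly on a toy database. The sketch below uses exact rational arithmetic; the five-record bit database and the helper names are hypothetical, chosen only to show the gap between the additive and the multiplicative guarantee.

```python
from fractions import Fraction

def publish_random_user(database):
    """Exact output distribution of: pick an index i uniformly at random,
    then output the pair (i, database[i])."""
    n = len(database)
    return {(i, record): Fraction(1, n) for i, record in enumerate(database)}

def variation_distance(p, q):
    """Total variation distance between two finite distributions given as dicts."""
    outcomes = set(p) | set(q)
    return sum(abs(p.get(o, Fraction(0)) - q.get(o, Fraction(0))) for o in outcomes) / 2

def is_dp_for_some_finite_eps(p, q):
    """A finite multiplicative bound exists iff the two supports coincide."""
    return set(p) == set(q)

X = (0, 1, 1, 0, 1)
X_neighbor = (0, 1, 0, 0, 1)  # differs from X only in the record at index 2
p, q = publish_random_user(X), publish_random_user(X_neighbor)
print("variation distance:", variation_distance(p, q))
print("finite-epsilon DP: ", is_dp_for_some_finite_eps(p, q))
```

The variation distance is exactly $1/n$, yet the output $(2, 1)$ has positive probability on one database and zero on its neighbor, so no finite $\varepsilon$ works.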
DP has been applied in deployed systems, for example at Apple and Google; see for example [42] for discussion. The concept has also found application to other problems not obviously related to privacy, for example adaptive data analysis (for more see Section 1.7). But what does DP have to do with quantum information in general, or gentle measurement in particular?
1.3 The Connection
Given two quantum mixed states $\rho, \sigma$ on $n$ registers each, call them neighbors if it's possible to reach either $\sigma$ from $\rho$, or $\rho$ from $\sigma$, by performing a general quantum operation (a so-called superoperator) on a single register only. In the special case where $\rho = \rho_1 \otimes \cdots \otimes \rho_n$ and $\sigma = \sigma_1 \otimes \cdots \otimes \sigma_n$ are both product states, this reduces to saying: $\rho$ and $\sigma$ are neighbors if and only if $\rho_i \neq \sigma_i$ for at most one $i$.
Using this notion, we can easily generalize the definition of differential privacy from Section 1.2 to the quantum setting:
Definition 3 (Quantum Differential Privacy)
Given a set $S$ of quantum mixed states each on $n$ registers, a measurement $M$, and a parameter $\varepsilon \ge 0$, we define $M$ to be $\varepsilon$-DP on $S$ if for all states $\rho, \sigma \in S$ that are neighbors, and all possible outputs $y$ of $M$, we have
$$\Pr[M(\rho) = y] \le e^{\varepsilon} \Pr[M(\sigma) = y]. \qquad (4)$$
Here the probabilities are over the intrinsic randomness of the measurement outcome.
More generally, we say $M$ is $(\varepsilon, \delta)$-DP on $S$ if for all neighboring states $\rho, \sigma \in S$, inequality (4) holds with probability at least $1 - \delta$ over the possible outcomes $y$ of applying $M$ to $\rho$. We recover $\varepsilon$-DP by setting $\delta = 0$. [Footnote 7: This is a slightly nonstandard definition of $(\varepsilon, \delta)$-DP, but can be related to the standard definition by a nontrivial result. See e.g. Vadhan [44, Lemma 1.5].]
The most common choices for $S$ will be the set of product states $\rho = \rho_1 \otimes \cdots \otimes \rho_n$, and the set of all states.
Note that unlike with gentleness, the property of being $\varepsilon$-DP depends only on the output probabilities, and not at all on the post-measurement states (i.e., on the "implementation" of the measurement).
Perhaps the first question we should ask is: are there any nontrivial quantum measurements that satisfy the above definition? Â Indeed there are.
Recall the DP algorithm $L_\sigma$ from Section 1.2, which returns the Hamming weight $|X|$ of an $n$-bit input database $X$, plus Laplace noise $\eta$ of average magnitude $\sigma$. We can promote $L_\sigma$ to a quantum measurement on $n$-qubit states, by implementing it using the procedure described in Section 1.1. We then have the following:
Proposition 4
*$L_\sigma$ is $\frac{1}{\sigma}$-DP on the set of all $n$-qubit states.*
Proof. Since $L_\sigma$ only involves measuring the Hamming weight in the computational basis, for any $n$-qubit state $\rho$ we can write
$$\Pr[L_\sigma(\rho) = y] = \sum_{X \in \{0,1\}^n} \langle X | \rho | X \rangle \, \Pr[L_\sigma(X) = y].$$
Also, if we act on a single register of $\rho$, and then measure in the computational basis (which by the above, we can do without loss of generality), we map each database $X$ to a distribution over neighbors $X'$ of $X$. The proposition now follows from convexity together with equation (3).
Stepping back, we've seen that simply measuring the Hamming weight of an $n$-qubit state is neither gentle nor private. And yet the same fix (namely, adding random noise to the measurement outcome before returning it) makes the measurement both gentle and private. Is this convergence, between gentle quantum measurement and differential privacy, just a coincidence?
Our main result asserts that it's not a coincidence: there's a strong two-way connection between the two notions.
Theorem 5 (Main Result)
For all quantum measurements $M$ on $n$ registers:
(1) If $M$ is $\alpha$-gentle on product states for sufficiently small $\alpha$, then $M$ is $O(\alpha)$-DP on product states. [Footnote 8: Indeed, it suffices for $\alpha$ to be bounded below a fixed threshold by any constant margin (which then affects the multiplier in the $O(\alpha)$). Similar remarks apply wherever explicit numerical constants appear in this paper.]
(2) If $M$ is $\varepsilon$-DP on product states, and is a product measurement, then $M$ is $O(\varepsilon\sqrt{n})$-gentle on product states. [Footnote 9: That is, if $M$ can be implemented by first applying a classical algorithm to the outcomes of separate POVM measurements on the $n$ registers, and then uncomputing the outcomes of those measurements.] [Footnote 10: On non-product states, $M$ will still produce the correct output probabilities, but it need not be gentle.]
Again, here a "measurement" corresponds to a specification of output probabilities; for $M$ to be $\alpha$-gentle means that there exists an $\alpha$-gentle implementation of $M$.
Intuitively, it's far from obvious that gentleness and differential privacy should be connected in this way. After all, the definition of $\alpha$-gentleness makes no reference to the notion of "neighboring" states. Conversely, the definition of $\varepsilon$-DP is exclusively concerned with output probabilities, and makes no reference to post-measurement states. Our goal is to explain why gentleness and DP are connected in this way, and to explore the consequences of the connection.
We'll see some applications of Theorem 5 shortly, in Sections 1.4 and 1.5. Before we do so, however, let's make a few comments about the theorem statement.
At first glance, part (2) of the theorem seems weaker than part (1), especially because of the $\sqrt{n}$ blowup in parameters, but it's the part that carries many of the interesting implications. In Section 5, we'll show that the $\sqrt{n}$ blowup is unavoidable. Indeed the measurement $L_\sigma$, for a suitable choice of $\sigma$, already demonstrates this.
By contrast, the condition that $M$ is a product measurement is not clearly necessary; one of the central open problems we leave is whether that condition can be removed. In Appendix 12, we'll give examples of quantum DP measurements that can't be approximated by any product or (we conjecture) even LOCC measurements. However, all the examples we currently know of such measurements are extremely artificial.
The restriction to product states might seem strange, but it's provably unavoidable if we want Theorem 5 to say anything about nontrivial measurements. As we'll show in Section 3, there is a counterpart of Theorem 5 for states that could have arbitrary correlation or entanglement among the $n$ registers. It turns out, however, that if a measurement $M$ is $\alpha$-gentle on all states for sufficiently small $\alpha$, then $M$ is close to trivial (i.e., it barely depends on the input state at all). And conversely, if $M$ is $\varepsilon$-DP, then the best we can deduce is that $M$ is $O(\varepsilon n)$-gentle on all states, rather than $O(\varepsilon\sqrt{n})$-gentle. While that might sound like a merely quantitative gap, the trouble is, again, that the only measurements that are $\varepsilon$-DP for $\varepsilon \ll 1/n$ are close to trivial. By contrast, plenty of interesting measurements are $\varepsilon$-DP for $\varepsilon \ll 1/\sqrt{n}$.
One might wonder whether our reductions between privacy and gentleness preserve computational efficiency. In one direction (turning gentleness into privacy) the answer is clearly yes, since an $\alpha$-gentle measurement is $O(\alpha)$-DP; nothing further needs to be done. However, in the other direction (turning privacy into gentleness) we can implement a gentle measurement efficiently only if we have an efficient algorithm to "QSample" $M$'s output distribution on a given input. QSampling is a term coined in 2003 by Aharonov and Ta-Shma [9], which just means that we can efficiently prepare a superposition over outputs of the form
$$\sum_y \sqrt{\Pr[M(X) = y]} \, |y\rangle,$$
which is not entangled with any "garbage" dependent on $X$. In practice, most DP algorithms that we know about do give rise to efficient QSampling procedures, but this property doesn't follow automatically from a DP algorithm's being efficient. In Section 7, we'll explore the issue of computational efficiency further, and give nontrivial conditions under which efficiency is preserved.
1.4 Applications
Can we exploit the connection between gentle measurement and differential privacy to port results from one field to the other, as was done with the connections between communication complexity and circuit lower bounds, cryptography and learning, etc.? The second main contribution of this paper is to use Theorem 5, together with previous work in DP, to derive new results in quantum measurement theory and quantum algorithms. [Footnote 11: Some of these applications could also have been obtained by "brute force" (e.g., directly designing and analyzing the desired gentle measurements), but the connection to DP will both guide us to the correct statements, and enable the simplest proofs of them that we know.] Meanwhile, our applications to so-called shadow tomography of quantum states, described in Section 1.5, will make essential use of sophisticated algorithms and lower bounds from the DP literature.
As a tiny warmup application, notice that $L_\sigma$, the Laplace noise measurement from Section 1.2, is a "product measurement," in the sense that it can be implemented via an algorithm that measures each register separately. And thus, by combining part (2) of Theorem 5 with Proposition 4, we immediately obtain the following:
Corollary 6 (Gentleness of Laplace Noise Measurement)
*$L_\sigma$ is $O(\sqrt{n}/\sigma)$-gentle on product states.*
As far as we know, proving Corollary 6 directly would require a laborious calculation.
Here is another application. In the early days of quantum computing, Bennett et al. [11] observed that a quantum algorithm can safely invoke other quantum algorithms for decision problems as subroutines inside of a superposition, or in terms of complexity classes, that $\mathsf{BQP}^{\mathsf{BQP}} = \mathsf{BQP}$. The proof uses amplification, to push down the subroutine's error probability, combined with uncomputing, to eliminate any "garbage" that the subroutine leaves entangled with its input. However, this straightforward uncomputing strategy no longer works for subroutines whose purpose is to estimate an expectation value to within a given additive error (say, the acceptance probability of a quantum circuit).
In Section 7, we'll point out one simple solution to this problem: namely, run the subroutine many times in parallel, then estimate the desired expectation values by simulating gentle measurements on the resulting states. If we implement this idea using the Laplace noise measurement $L_\sigma$, then Corollary 6 yields the following:
Theorem 7
Without loss of generality, a  algorithm can at any point estimate  to within , on any superposition containing descriptions of quantum circuits , while maintaining the superpositionâs coherence.
While it's possible to prove Theorem 7 "bare-handedly," without knowing about the connection between gentleness and DP, the point is that the floodgates are now open. Given a quantum algorithm that is run as a subroutine inside a larger quantum algorithm, there are many things that the larger algorithm might want to know about the subroutine's output behavior, beyond just additive estimates for specific probabilities. Whatever the details, Theorem 5 reduces the task to designing a suitable efficient DP algorithm, or finding such an algorithm in the literature. Gentleness then follows automatically.
1.5 Shadow Tomography
In Section 6, we present our "flagship" application for the connection between gentleness and DP: a new quantum measurement procedure, called Quantum Private Multiplicative Weights (QPMW), which achieves parameters and properties that weren't previously known.
QPMW addresses a task that Aaronson [5], in 2016, called shadow tomography. Here we're given $n$ copies of an unknown $d$-dimensional mixed state $\rho$. We're also given known two-outcome measurements $E_1, \ldots, E_m$. Our goal is to learn $\Pr[E_i \text{ accepts } \rho]$ to within an additive error of $\pm\varepsilon$, for every $i \in [m]$, with high success probability, by carefully measuring the copies of $\rho$. Setting aside computational difficulty, how many copies of $\rho$ are information-theoretically necessary for this?
At one extreme of parameters, and suppressing the dependence on $\varepsilon$, it's clear that $\widetilde{O}(m)$ copies of $\rho$ suffice, since we could just apply each $E_i$ to different copies. At a different extreme, it's also clear that $\widetilde{O}(d^2)$ copies suffice; or not "clear," but it follows from celebrated recent work by O'Donnell and Wright [36] and (independently) Haah et al. [26], who showed that $\Theta(d^2)$ copies of $\rho$ are necessary and sufficient for full quantum state tomography: that is, reconstructing the entire state $\rho$ to suitable precision.
But what if we only want to learn the "shadow" that $\rho$ casts on the measurements $E_1, \ldots, E_m$? Aaronson [5] raised the question of whether shadow tomography might be possible using a number of copies that scales only polylogarithmically in both $m$ and $d$; so in particular, that's polynomial even if $m$ and $d$ are both exponential. While this seemed overly ambitious, Aaronson was unable to rule it out; and indeed, last year he showed:
Theorem 8 (Aaronson [6])
There exists an explicit procedure to perform shadow tomography using
$$\widetilde{O}\left(\frac{\log^4 m \cdot \log d}{\varepsilon^4}\right)$$
copies of $\rho$. Here the $\widetilde{O}$ hides factors of $\log\log m$, $\log\log d$, and $\log\frac{1}{\varepsilon}$.
Shortly afterward, Brandão et al. [14] gave a different shadow tomography procedure, based on semidefinite programming, which achieved the same sample complexity as Aaronson's but was more efficient computationally.
However, these developments left several questions open:
(1) What is the true sample complexity of shadow tomography? The best lower bound in [6] is that $\Omega\left(\min\{d^2, \log m\} / \varepsilon^2\right)$ copies are needed.
(2) The procedures of [6, 14] destroy the copies of $\rho$ in the process of measuring them. Is there a shadow tomography procedure that's also gentle?
(3) The procedures of [6, 14] require the full list $E_1, \ldots, E_m$ to be known in advance. Is there a shadow tomography procedure that's online, i.e., that receives the measurements $E_1, \ldots, E_m$ one by one, and estimates each $\Pr[E_t \text{ accepts } \rho]$ immediately after receiving $E_t$?
In Section 6, by exploiting our connection between gentleness and DP, and by quantizing a known classical DP algorithm called Private Multiplicative Weights [27], we prove a new shadow tomography theorem that addresses all of the above questions.
Theorem 9 (Quantum PMW)
There exists an explicit procedure, Quantum Private Multiplicative Weights (QPMW), that performs shadow tomography with success probability  using
[TABLE]
copies of , and which is also online and -gentle.
Most notably, QPMW is both online and gentle; the previous procedures [6, 14] were neither. Because of its simplicity and its online nature, QPMW seems better suited than its predecessors to potential experimental realization.
Meanwhile, compared to Theorem 8, Theorem 9 improves the dependence on $m$ from $\log^4 m$ to $\log^2 m$. The dependence on $d$ and $\varepsilon$ is worse, but we conjecture that this is an artifact of our analysis, and that porting so-called "advanced composition" [22] to the quantum setting would ameliorate the situation. The running time of QPMW also improves on that of Aaronson's procedure, matching an improvement obtained by Brandão et al. [14].
It's hard to give a simple intuition for the improvement in $m$-dependence from $\log^4 m$ to $\log^2 m$. Loosely, though, gentleness (derived from DP) lets QPMW be online, and being online lets QPMW avoid the "gentle search procedure," a key subroutine in Aaronson's earlier procedure [6] that was responsible for the worse dependence on $m$. In any case, we wish to stress that quantitative improvements in sample complexity are not the main point here. The point, rather, is that the connection between DP and gentleness leads to an entirely new approach to shadow tomography.
The DP/gentleness connection turns out to be useful not just for upper bounds on the sample complexity of shadow tomography, but also lower bounds. In Section 6.3, we'll combine a recent lower bound on DP algorithms [16] with part (1) of Theorem 5 (i.e., the gentleness implies DP direction), to deduce a new lower bound on the sample complexity of gentle shadow tomography, where here "gentle" means "gentle on all product state inputs." We'll also use recent work from adaptive data analysis [35] to observe a lower bound on the sample complexity of online shadow tomography, showing that, for the latter task, QPMW's sample complexity is optimal up to polynomial factors.
Finally, in Section 7.2, we prove lower bounds on the computational complexity of gentle and online shadow tomography, by deducing them as corollaries of recent cryptographic lower bounds for differential privacy and adaptive data analysis [43, 28, 40]. Assuming the existence of sufficiently hard-to-invert one-way functions, these lower bounds limit how fast any algorithm for online or gentle shadow tomography can run, and in that respect the QPMW procedure is optimal for those tasks.
We stress that all our lower bounds for shadow tomography, both information-theoretic and computational, are obtained by using this paper's machinery to port known classical results to our setting. Thus, all of the lower bounds apply equally well to the "classical special case" of shadow tomography, where we are trying to learn properties of a probability distribution $\mathcal{D}$ given independent samples from $\mathcal{D}$, and none of them yet say anything specific to quantum mechanics.
1.6 Techniques
Relating Gentleness to DP. In the proof of our main result (i.e., the connection between gentleness and differential privacy), the easy direction is that gentleness implies DP. This direction produces only constant loss in parameters, and does not even have much to do with quantum mechanics. We consider the contrapositive: if a measurement $M$ is not DP, then there are two neighboring states, call them $\rho$ and $\sigma$, as well as a measurement outcome $y$, such that $\Pr[M(\rho) = y]$ and $\Pr[M(\sigma) = y]$ differ by a large multiplicative factor. But in that case, we can study what happens if we apply $M$ to the equal mixture $\frac{\rho + \sigma}{2}$, and then condition on outcome $y$. Here we can use Bayes' theorem to show that the post-measurement state will not be close to $\frac{\rho + \sigma}{2}$: intuitively, because it will have "more $\rho$ than $\sigma$" or vice versa. Therefore $M$ is not gentle on product states (for if $\rho$ and $\sigma$ are neighbors and are themselves product states, then $\frac{\rho + \sigma}{2}$ is a product state).
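The Bayesian step in this argument can be illustrated with a purely classical toy calculation. The sketch below assumes two perfectly distinguishable neighboring inputs with a 50/50 prior and an outcome whose likelihood ratio is $e^{\varepsilon}$; it ignores any additional disturbance to the individual states, so it captures only the "more of one than the other" effect described above. The function names are hypothetical, and the closed form $\tanh(\varepsilon/2)/2$ is a property of this toy model, not a constant from the paper.

```python
import math

def posterior_shift(eps):
    """Variation distance between the prior (1/2, 1/2) and the posterior over
    two neighboring inputs, given an outcome with likelihood ratio e^eps.
    Equals tanh(eps / 2) / 2."""
    posterior = math.exp(eps) / (1.0 + math.exp(eps))
    return abs(posterior - 0.5)

def max_eps_for_gentleness(alpha):
    """Largest eps whose posterior shift is at most alpha (requires alpha < 1/2).
    Inverting tanh(eps / 2) / 2 = alpha gives eps = 2 * artanh(2 * alpha)."""
    return 2.0 * math.atanh(2.0 * alpha)

for alpha in (0.01, 0.05, 0.1, 0.2):
    eps = max_eps_for_gentleness(alpha)
    print(f"alpha = {alpha:.2f}  ->  eps <= {eps:.4f}  (eps / alpha = {eps / alpha:.3f})")
```

In this toy model, a damage bound of $\alpha$ forces a privacy parameter of about $4\alpha$ when $\alpha$ is small, and the bound degrades as $\alpha$ approaches $1/2$, consistent with gentleness implying DP at only a constant-factor loss when $\alpha$ is small.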
The harder direction is to show that $\varepsilon$-DP implies $O(\varepsilon\sqrt{n})$-gentleness (for product states, and at least for a restricted class of measurements). We work up to this result in a sequence of steps. The first is to prove a purely classical analogue: namely, any $\varepsilon$-DP classical algorithm $A$ is $O(\varepsilon\sqrt{n})$-gentle on product distributions $\mathcal{D} = \mathcal{D}_1 \times \cdots \times \mathcal{D}_n$, and indeed, the posterior distribution $\mathcal{D}_y$, conditioned on some output $y$, has KL-divergence at most $O(\varepsilon^2 n)$ from $\mathcal{D}$. While this step has echoes in earlier work on adaptive data analysis [19, 18, 38] (see Section 1.7), we provide our own proof for completeness. Our proof uses the $\varepsilon$-DP property of $A$, together with the fact that $\mathcal{D}$ is a product distribution, to show that, if we reveal a sample from $\mathcal{D}_y$ a single register at a time, from the $1^{\text{st}}$ to the $n^{\text{th}}$, then the expected KL-divergence from $\mathcal{D}$ increases by at most $O(\varepsilon^2)$ per register, and is therefore at most $O(\varepsilon^2 n)$ overall.
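The classical statement can be sanity-checked on one small instance: uniformly random $n$-bit databases queried by the Laplace-noised Hamming weight, which is $\varepsilon$-DP with $\varepsilon = 1/\sigma$. The sketch below is a numerical check under stated assumptions, not the paper's proof: it uses unnormalised discrete Laplace weights, scans a finite window of outputs, and computes the KL-divergence over Hamming weights, which equals the KL-divergence over bit strings here because the output depends on the database only through its weight. The names and the parameters $n = 20$, $\sigma = 10$ are illustrative.

```python
import math

def binomial_pmf(n):
    """Distribution of the Hamming weight of a uniformly random n-bit string."""
    return [math.comb(n, w) / 2**n for w in range(n + 1)]

def posterior_over_weights(n, sigma, y):
    """Posterior over Hamming weights after observing noisy output y, by Bayes'
    rule with likelihood proportional to exp(-|y - w| / sigma)."""
    prior = binomial_pmf(n)
    unnormalised = [prior[w] * math.exp(-abs(y - w) / sigma) for w in range(n + 1)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

def kl_divergence(q, p):
    """KL(q || p) in nats for two distributions on the same finite set."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

n, sigma = 20, 10.0
eps = 1.0 / sigma
prior = binomial_pmf(n)
worst_kl = max(kl_divergence(posterior_over_weights(n, sigma, y), prior)
               for y in range(-40, n + 41))
print(f"worst KL over scanned outputs: {worst_kl:.5f}")
print(f"eps^2 * n:                     {eps**2 * n:.5f}")
print(f"Pinsker bound on variation distance: {math.sqrt(worst_kl / 2):.5f}")
```

On this instance the posterior stays within KL-divergence $\varepsilon^2 n$ of the prior for every scanned output, and Pinsker's inequality then converts that into a variation-distance bound of order $\varepsilon\sqrt{n}$.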
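The second step is to prove an analogous result if the classical algorithm is applied, not to a sample from the product distribution $\mathcal{D} = \mathcal{D}_1 \times \cdots \times \mathcal{D}_n$, but in superposition to each component of the "QSampling" state
$$|\psi\rangle = \bigotimes_{i=1}^{n} \sum_{x} \sqrt{\Pr_{\mathcal{D}_i}[x]}\;|x\rangle.$$
To prove this, we let $\rho_y$ be the post-measurement state conditioned on outcome $y$, and then upper-bound the trace distance,
$$\left\| \rho_y - |\psi\rangle\langle\psi| \right\|_{\mathrm{tr}},$$
in terms of the square root of the KL-divergence between the posterior distribution $\mathcal{D}_y$ and $\mathcal{D}$, which we previously showed was $O(\varepsilon^2 n)$. (To do that, in turn, we use the Hellinger distance between $\mathcal{D}_y$ and $\mathcal{D}$ as an intermediate measure.)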
The last step is to generalize from algorithms that act separately on each computational basis state to measurements that can apply a separate POVM to each register, and also from pure product states to mixed product states. We achieve these generalizations using standard manipulations in quantum information. We expect that further generalizations are possible with more work.
Shadow Tomography. The analysis of Quantum Private Multiplicative Weights (QPMW), our new online, gentle procedure for shadow tomography, is our technically most demanding result. The QPMW procedure itself is relatively simple, and is directly inspired by an analogous procedure from classical differential privacy, the so-called Private Multiplicative Weights (PMW) algorithm of Hardt and Rothblum [27] from 2010. (Indeed, QPMW is arguably simpler than previous shadow tomography procedures, especially because it completely avoids the use of the so-called Quantum OR Bound of Harrow, Lin, and Montanaro [31]. QPMW could, in fact, be used to give an independent proof of the OR Bound, one where the procedure would moreover be gentle, albeit possibly with worse sample complexity.)
Given a database , of  records  drawn independently from some underlying probability distribution , the goal of PMW is to answer an enormous number of statistical queries about , possibly as many as  of them, in a way that preserves the overall differential privacy of .  Here the queries need to be answered one by one, as they arrive, and could be chosen by an adaptive adversary.
PMW achieves this by maintaining, at all times, a current hypothesis about the underlying distribution. Whenever a new query arrives, the first thing PMW does is to check whether the hypothesis and the database lead to approximately the same answer for that query. If the answers are equal to within some threshold, then PMW simply answers the query using the hypothesis, without looking further at the database. Only if the hypothesis and the database disagree substantially does PMW query the database a second time, both to learn the correct answer to the current query, and to use that answer to update the hypothesis. For both types of queries, PMW uses the standard DP trick of adding a small amount of Laplace noise to any statistics gathered from the database, before using those statistics for anything else.
It's clear, by construction, that this strange two-pronged approach will always return approximately correct answers, with high probability. But why does it help in preserving privacy? The privacy analysis depends on proving three facts:
- (1) Each query leads to only a negligible loss in privacy, unless it has an appreciably large probability of triggering an update.
- (2) Even when an update is triggered, the loss in privacy is still modest.
- (3) The number of updates is always extremely small. This is true for "the usual multiplicative weights reasons."
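The two-pronged structure of PMW described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the algorithm as specified by Hardt and Rothblum: the function name `pmw_sketch`, the parameters `threshold`, `noise_scale`, and `eta`, and their default values are all hypothetical choices made for the example, and no privacy guarantee is claimed for these particular settings.

```python
import numpy as np

def pmw_sketch(data, queries, universe_size, threshold=0.2,
               noise_scale=0.02, eta=0.3, seed=0):
    """Toy Private Multiplicative Weights loop (all parameters illustrative).

    data:    integer array of records, each in range(universe_size)
    queries: list of 0/1 vectors of length universe_size (counting queries)
    Returns the array of answers and the number of update rounds.
    """
    rng = np.random.default_rng(seed)
    empirical = np.bincount(data, minlength=universe_size) / len(data)
    hypothesis = np.full(universe_size, 1.0 / universe_size)  # start uniform
    answers, updates = [], 0
    for q in queries:
        guess = float(q @ hypothesis)
        # Noisy check: do the hypothesis and the database roughly agree?
        noisy_gap = float(q @ empirical) - guess + rng.laplace(scale=noise_scale)
        if abs(noisy_gap) <= threshold:
            answers.append(guess)  # answer from the hypothesis alone
            continue
        # Disagreement: query the database again, with fresh Laplace noise.
        noisy_answer = float(q @ empirical) + rng.laplace(scale=noise_scale)
        answers.append(noisy_answer)
        # Multiplicative-weights update toward the noisy answer.
        sign = 1.0 if noisy_answer > guess else -1.0
        hypothesis = hypothesis * np.exp(eta * sign * q)
        hypothesis /= hypothesis.sum()
        updates += 1
    return np.array(answers), updates
```

Most rounds take the first branch and never touch the database beyond the noisy check, which is the behavior that fact (1) in the list above concerns; the second branch corresponds to the rare update rounds of facts (2) and (3).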
Once one understands the connection between privacy and gentleness, it's natural to wonder whether a quantum analogue of PMW might let one apply a huge sequence of measurements $E_1,\ldots,E_m$, one at a time, to a small collection of identical quantum states $\rho^{\otimes n}$, in a way that yields accurate estimates of $\operatorname{Tr}(E_i\rho)$ for every $i$, without destroying the states in the process or even damaging them too much. This, of course, is precisely the problem of (gentle, online) shadow tomography.
In Section 6, we prove that indeed this is possible. Our QPMW algorithm is just the "obvious" quantum generalization of PMW. That is, QPMW at all times maintains a current hypothesis, $\sigma$, about the unknown quantum state $\rho$. Initially $\sigma$ is the maximally mixed state $\frac{\mathbb{I}}{d}$. Whenever a new measurement $E_t$ arrives, QPMW first checks whether
$$\operatorname{Tr}(E_t\sigma) \approx \operatorname{Tr}(E_t\rho),$$
with the check being done using a thresholded version of the Laplace noise measurement from Section 1.1. If the answer is yes, then QPMW simply returns $\operatorname{Tr}(E_t\sigma)$ as its estimate for $\operatorname{Tr}(E_t\rho)$, without measuring the actual quantum states any further. Only if the answer is no does QPMW measure a second time, both to learn an accurate estimate for $\operatorname{Tr}(E_t\rho)$, and to use that estimate to update its hypothesis $\sigma$. This second measurement also involves the deliberate addition of Laplace noise.
Intuitively, the reason why we might expect this to work is that each round of PMW leaks very little privacy, and by our central connection between DP and gentleness, that suggests that we can implement each round of QPMW in a way that damages the states very little. However, formalizing this requires substantial new ideas, which are not contained in the classical analyses of PMW.
Of course, if we had a sufficiently general theorem about privacy implying gentleness, then perhaps everything we needed would follow immediately from that theorem, combined with the privacy of PMW. However, our existing implication (applying, as it does, only to product measurements on product states, and saying nothing about adaptively chosen sequences of measurements) will force us to work harder.
The core difficulty concerns what, before, we called step (1) in the analysis of PMW: namely, the connection between loss in privacy and the probability of triggering an update. We note that while, by construction, the answer in each round is close to the answer on the current state in the registers, we need the answer to be accurate with respect to the original state $\rho$. The algorithm's gentleness plays an essential role in proving accuracy: it's only because of gentleness that we know that the state in the registers hasn't been corrupted, and that the algorithm's answers are accurate with respect to the original state. We further note that, since we want to handle many measurements, and the damage from these measurements will accumulate, we truly need to show that the overwhelming majority of measurements result in only negligible damage.
The original analysis [27] conditioned on so-called "borderline rounds," which are rounds that have a reasonable probability of triggering an update, and argued that the privacy loss in other rounds was zero. In the quantum setting, however, this is a non-starter: so long as there is some probability of an update, the damage is never zero. Instead, we show how to bound the damage each no-update round would cause to the original state as a function of the probability that it could have triggered an update. Thus, rounds that are likely to trigger an update (of which there are few) can cause damage, but rounds that are unlikely to trigger an update (of which there are many) each cause very little damage once we condition on "no update." Since the number of updates is bounded, this is a promising start. Bounding the damage as a function of the probability of an update requires a delicate analysis, leveraging the differential privacy of the Laplace measurement and the fact that we have a product state in the registers, which induces a Gaussian distribution on the answers before noise is added to each measurement (see Claim 41).
In the classical setting, once we bound the privacy loss per round, we can apply composition theorems to bound the loss across rounds. Crucially, this composition maintains multiplicative guarantees on the closeness of probabilities. But damage to quantum states (in the trace distance metric, for example) is additive, not multiplicative. Indeed, even if the amplitudes in a quantum state $|\psi\rangle$ were to change by only small multiplicative amounts, that could easily turn into an additive change when we rotate $|\psi\rangle$ to a different basis, a phenomenon with no classical analog. So once the state becomes even slightly corrupted, why doesn't this sever the multiplicative connection between damage and the probability of an update, thereby preventing the necessary updates from happening, and allowing the state to become corrupted even further, and so forth, until inaccurate answers are returned?
We address these worries using several tools. The first is a "Damage Lemma," Lemma 17, which tightly connects the probability of an update being triggered in the "real" world, where the state is slightly damaged by each measurement round, to its probability of being triggered in the "ideal" world, where the algorithm gets a fresh copy of the state at each round. This lemma is quite general and might find other uses. With this lemma in place, we divide the execution of the QPMW algorithm into epochs, where each epoch has a constant probability of triggering an update. By the connection between damage and update probabilities, this means that the sum of the damage incurred by an "ideal" execution would be bounded, and by the Damage Lemma the total damage in the "real" execution remains bounded as well. Since, moreover, each epoch triggers an update with constant probability, and the number of updates is bounded, the number of epochs will be bounded too. This gives us a bound on the total damage to the state, and is crucial both for proving gentleness and for proving accuracy.
Other Results. The paper's other results are proved using a variety of techniques. In Appendix 10, for example, we show that any measurement that's $0$-DP on product states (i.e., accepts all product states with the same probability) is actually $0$-DP on all states, and hence trivial. Though simple, this result makes essential use of the fact that the separable mixed states have positive density within the set of all mixed states, and would be false if amplitudes were reals rather than complex numbers. Since most results in quantum information are insensitive to the distinction between real and complex quantum mechanics, it's noteworthy to find an exception.
To prove, in Appendix 13, a weak form of composition for quantum DP algorithms, we use the same "Damage Lemma" (Lemma 17) that we used for the analysis of QPMW. In that appendix, however, we also construct an example, involving DP measurements in two incompatible bases, that shows why any composition theorem for quantum DP will come with caveats that weren't needed classically.
To prove, in Section 5, that our "DP implies gentleness" implications are asymptotically optimal, we use the Laplace noise measurement  as a separating example.  When , we get a measurement that's -DP, but not -gentle on arbitrary states for any .  When , we get a measurement that's -DP, but not -gentle on product states for any .
1.7 Related Work
To our knowledge, this paper is the first to make the connection between gentle measurement of quantum states and differential privacy. Nevertheless, there were two previous papers that tried to combine quantum information and differential privacy in other ways; there was a previous study of gentle tomography; and there was a celebrated (purely classical) connection between differential privacy and so-called adaptive data analysis, which in some ways foreshadowed our connection between DP and gentle measurement.
Quantum information and DP. Senekane et al. [39] discuss first applying a classical DP algorithm to classical data, and then encoding the output into a quantum state for use in a quantum machine learning algorithm. Naturally this composition preserves DP, but the DP and quantum aspects don't seem to interact much.
Zhou and Ying [46] define and study an interesting notion of "quantum DP," which however is very different from ours. Given an algorithm that takes a quantum state as input and produces another quantum state as output, they call the algorithm DP if, for all pairs of input states within a bounded trace distance of each other and all measurements of the output, the outcome probabilities on the two outputs are close in the multiplicative sense familiar from differential privacy.
In other words: unlike us, Zhou and Ying don't consider the algorithm's behavior on two databases that differ in a single entry (but which could have arbitrarily large trace distance), only on two states that are actually close as quantum states. For them, essentially, a DP algorithm is a quantum channel that converts "mere" closeness in trace distance into a stronger, multiplicative kind of closeness between quantum states. Zhou and Ying's main results are that
- (1) the standard depolarizing and amplitude-damping channels (i.e., just adding noise to a quantum state, like in the simplest models of decoherence) are DP in their sense, and
- (2) their notion of quantum DP satisfies many composition theorems, including advanced composition.
These results are interesting and non-obvious, but only tangentially related to what we do.
Gentle tomography. Bennett, Harrow, and Lloyd [12] studied the task of "gentle quantum state tomography," that is, recovering a full description of a quantum state $\rho$ from identical copies $\rho^{\otimes n}$, without appreciably damaging the copies. Their notion of "gentleness" was very similar to ours. To achieve the task, they gave a protocol that, like many of our protocols, deliberately adds noise to the measurement outcomes before returning them (although they used a randomized binning strategy rather than Laplace noise). They did not make a connection to differential privacy, and also did not consider shadow tomography, or any other tasks besides full tomography.
DP and adaptive data analysis.  Perhaps the work that most clearly anticipated ours, at a technical level, had nothing to do with quantum information at all.  Dwork et al. [19] studied the problem of adaptive data analysis: given a dataset, drawn i.i.d. from an underlying distribution, the goal is accurately to answer a long sequence of adaptively chosen statistical queries or analyses.  Each query can be chosen as a function of the answers to all previous queries.  Accuracy is measured with respect to the underlying distribution, rather than the specific dataset drawn, and the goal is to avoid overfitting.  A sequence of works [19, 18, 10] showed that differentially private mechanisms are particularly well-suited to this application, and can be used to guarantee adaptive accuracy automatically.
Let $X$ be the dataset, with $n$ entries drawn i.i.d. from a distribution $\mathcal{D}$. A priori, before any queries are answered, an observer's view of the dataset is that it is a draw from the distribution $\mathcal{D}^n$. As queries are answered, this view might change. One way to prevent overfitting is to guarantee that the query answers do not change the observer's view much: i.e., that the a-posteriori view of $X$'s distribution, conditioned on the observed answers, is almost unchanged. This can be interpreted as "classical gentleness." At a technical level, our results use the fact that in the above scenario, if we run a classical DP algorithm on the database $X$, then conditioning on outputting any particular value results in a bounded change to the prior (see Lemma 30). We note that a similar result follows from the work of Dwork et al. [18] and Rogers et al. [38] (their results are phrased in terms of the so-called "max information").
While there are technical and conceptual connections, the setting of quantum measurement or shadow tomography (even without gentleness) presents altogether different challenges from the adaptive data analysis setting. Most notably, as we discussed in Section 1.1, running an algorithm on a quantum state can collapse the state. This is a physical phenomenon, not just a change in a particular observer's prior and posterior views as was the case classically. In particular, quantum measurements that collapse the state cannot be forgotten or undone. Restricting our attention to computing the average of two-outcome measurements over registers, this difference is best illustrated by the fact that, in the quantum setting, computing accurate answers to a large collection of non-adaptive measurements is already a challenging task (even without requiring gentleness). In the classical setting, on the other hand, if the measurements are specified non-adaptively then the naïve algorithm that simply outputs the empirical mean for each measurement performs quite well; the only challenge is answering an adaptively specified sequence of measurements.
2 Preliminaries
2.1 Classical Probability Theory
Given two probability distributions, we'll use all three of the following measures of distance between them (variation distance, Hellinger distance, and KL-divergence):
[TABLE]
Interestingly, Hellinger distance was invented in 1909, prior to the discovery of quantum mechanics, and is used for purely classical purposes in probability theory. But, as it involves the square roots of probabilities, it might be said to have a "secret affinity" for quantum mechanics that occasionally reveals itself, as it will in this paper.
Proposition 10 (Pinsker's Inequality)
.
The following is less well-known, but we'll need it as well:
Proposition 11 (e.g. [37, p. 99])
.
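For readers who want to experiment with the three distance measures, here is a minimal numerical sketch. The normalizations are assumptions made for this example (a factor of $1/2$ in variation distance, natural logarithms in KL-divergence, and squared Hellinger distance written as $1-\sum_x\sqrt{p_x q_x}$); they are common textbook conventions and may differ by constants from the ones used in this paper. The function names are likewise hypothetical.

```python
import numpy as np

def total_variation(p, q):
    """Variation distance with the 1/2 normalization (assumed convention)."""
    return 0.5 * np.abs(p - q).sum()

def hellinger_sq(p, q):
    """Squared Hellinger distance, written as 1 - sum_x sqrt(p_x q_x)."""
    return 1.0 - np.sqrt(p * q).sum()

def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_x = 0 contribute zero."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
tv, h2, kl = total_variation(p, q), hellinger_sq(p, q), kl_divergence(p, q)

# Under these conventions, Pinsker's inequality reads tv <= sqrt(kl / 2),
# and the squared Hellinger distance is also bounded by the KL-divergence.
print("TV:", tv, "H^2:", h2, "KL:", kl)
print("Pinsker holds:", tv <= np.sqrt(kl / 2))
print("H^2 <= KL:", h2 <= kl)
```

Both comparisons are instances of general inequalities, so they should hold for any pair of distributions with full support on which the script is run.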
2.2 Quantum Information Basics
In the following sections, we'll briefly review some standard notation and definitions from quantum information. More details can be found, for example, in Nielsen and Chuang [34].
A $d$-dimensional pure state is a unit vector in $\mathbb{C}^d$, which we write in ket notation as
$$|\psi\rangle = \sum_{i=1}^{d} \alpha_i |i\rangle.$$
Here $|1\rangle,\ldots,|d\rangle$ is an orthonormal basis for $\mathbb{C}^d$, and the $\alpha_i$'s are complex numbers called amplitudes satisfying $|\alpha_1|^2+\cdots+|\alpha_d|^2=1$. The state $|\psi\rangle$ is also called a superposition over the basis states $|1\rangle,\ldots,|d\rangle$, which we can think of as the possible classical states of the system. (Note that any linear combination of the basis states, and not just the basis states themselves, is called a "pure state.") We also denote by $\langle\psi|$ the conjugate transpose of $|\psi\rangle$ (thus, $|\psi\rangle$ is a column vector while $\langle\psi|$ is a row vector). The unit-norm condition can then be written succinctly as $\langle\psi|\psi\rangle=1$; and more generally, the complex inner product between $|\psi\rangle$ and $|\varphi\rangle$ can be written $\langle\psi|\varphi\rangle$.
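In the special case $d=2$, we call $|\psi\rangle$ a qubit, and typically label the orthonormal basis vectors by $|0\rangle$ and $|1\rangle$. It's also convenient to give standard names to the following two superpositions of $|0\rangle$ and $|1\rangle$:
$$|+\rangle = \frac{|0\rangle+|1\rangle}{\sqrt{2}}, \qquad |-\rangle = \frac{|0\rangle-|1\rangle}{\sqrt{2}}.$$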
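The reader might be familiar with two types of operations that we can apply to pure states. First, given any unitary matrix $U$, we can map $|\psi\rangle$ to $U|\psi\rangle$. Second, we can measure $|\psi\rangle$ in the $\{|1\rangle,\ldots,|d\rangle\}$ basis. Doing so returns the outcome $i$ with probability $|\langle i|\psi\rangle|^2$, the squared magnitude of the $i$th amplitude. Furthermore, the state $|\psi\rangle$ then "collapses" to $|i\rangle$.
More generally, we could measure $|\psi\rangle$ with respect to any orthonormal basis $\{|v_1\rangle,\ldots,|v_d\rangle\}$, which is equivalent to first applying a unitary $U$ that maps each $|v_i\rangle$ to $|i\rangle$, then measuring in the $\{|1\rangle,\ldots,|d\rangle\}$ basis, and finally applying $U^{\dagger}$, where $\dagger$ denotes conjugate transpose. This returns the outcome $i$ with probability $|\langle v_i|\psi\rangle|^2$, whereupon the state collapses to $|v_i\rangle$. A measurement of this type is called a projective measurement.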
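As a small illustration of the measurement rule just described, the sketch below computes outcome probabilities for a pure state in the standard basis and in another orthonormal basis. The helper name `measurement_probs` and the convention that basis vectors are stored as matrix columns are assumptions made for this example.

```python
import numpy as np

def measurement_probs(psi, basis=None):
    """Outcome probabilities |<v_i|psi>|^2 for a projective measurement.

    psi:   state vector (assumed normalized)
    basis: matrix whose columns are the orthonormal basis vectors |v_i>;
           None means the standard (computational) basis.
    """
    psi = np.asarray(psi, dtype=complex)
    if basis is None:
        basis = np.eye(len(psi), dtype=complex)
    amplitudes = basis.conj().T @ psi  # entry i is <v_i|psi>
    return np.abs(amplitudes) ** 2

plus = np.array([1, 1]) / np.sqrt(2)                    # the state |+>
plus_minus = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # columns |+>, |->

print(measurement_probs(plus))              # standard-basis measurement
print(measurement_probs(plus, plus_minus))  # measurement in the |+>, |-> basis
```

Measuring $|+\rangle$ in the standard basis gives both outcomes with equal probability, while measuring it in the $\{|+\rangle,|-\rangle\}$ basis returns the first outcome with certainty, so the same state can look random or deterministic depending on the basis.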
2.3 Mixed States, Superoperators, Quantum Operations, and POVMs
In general, we may have ordinary probabilistic uncertainty about which quantum superposition we have. This leads us to mixed states, the most general kind of state in quantum mechanics. Formally, a $d$-dimensional mixed state $\rho$ is a $d\times d$ positive semidefinite matrix that satisfies $\operatorname{Tr}(\rho)=1$. Equivalently, $\rho$ is a convex combination of outer products of pure states with themselves (without loss of generality, at most $d$ pure states):
$$\rho = \sum_{i=1}^{d} p_i |\psi_i\rangle\langle\psi_i|,$$
where $p_i \ge 0$ and $p_1+\cdots+p_d=1$. This can be interpreted as a probability distribution wherein each $|\psi_i\rangle$ occurs with probability $p_i$, though note that different distributions can give rise to the same $\rho$. In the special case where $\rho$ has rank $1$, it represents a pure state (i.e., a superposition). In the special case where $\rho$ is diagonal, it represents a classical probability distribution over $|1\rangle,\ldots,|d\rangle$. The maximally mixed state $\frac{\mathbb{I}}{d}$, where $\mathbb{I}$ is the identity matrix, corresponds to the uniform distribution over $|1\rangle,\ldots,|d\rangle$, and has the unique property of being unaffected by unitary transformations.
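We can restate the basic rules of quantum mechanics in terms of mixed states, as follows. First, a unitary transformation $U$ maps $\rho$ to $U\rho U^{\dagger}$. Second, a measurement of $\rho$ in the $\{|1\rangle,\ldots,|d\rangle\}$ basis returns the outcome $i$ with probability $\langle i|\rho|i\rangle$, whereupon $\rho$ collapses to $|i\rangle\langle i|$. Likewise, a measurement in the $\{|v_1\rangle,\ldots,|v_d\rangle\}$ basis returns $i$ with probability $\langle v_i|\rho|v_i\rangle$, whereupon $\rho$ collapses to $|v_i\rangle\langle v_i|$.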
More generally, a superoperator $S$, the most general (norm-preserving) mapping from mixed states to mixed states allowed by quantum mechanics, maps $\rho$ to the mixed state
$$S(\rho) = \sum_{j=1}^{k} B_j \rho B_j^{\dagger},$$
where $B_1,\ldots,B_k$ can be any matrices satisfying
$$\sum_{j=1}^{k} B_j^{\dagger} B_j = \mathbb{I}.$$
Here $\rho$ and $S(\rho)$ do not even need to have the same dimension. Superoperators encompass unitary transformations, measurements, and other interactions with an external environment in a single formalism.
Even more generally still, if we have $S$ as above where $B_1,\ldots,B_k$ only satisfy
$$\sum_{j=1}^{k} B_j^{\dagger} B_j \preceq \mathbb{I},$$
then we call $S$ a quantum operation. (In the literature, these are also called "non-trace-increasing completely positive maps.") If $S$ is a quantum operation, then $S(\rho)$ is Hermitian and positive semidefinite, but it might not be a normalized mixed state, because its trace might be less than $1$. Quantum operations are useful for capturing the effects of superoperators when we additionally condition on some event happening (e.g., a measurement outcome being "accept"). The event's probability is then $\operatorname{Tr}(S(\rho))$, and the final mixed state conditioned on the event is $\frac{S(\rho)}{\operatorname{Tr}(S(\rho))}$.
Quantum operations act linearly on mixed states, in the sense that
$$S(a\rho + b\sigma) = a\,S(\rho) + b\,S(\sigma).$$
Although any measurement can be represented by a superoperator, when discussing measurements it's convenient to use a related formalism called "POVMs" (Positive Operator Valued Measures). POVMs capture all measurements allowed by quantum mechanics, including those whose implementations might involve ancillary systems besides the ones being measured. In this formalism, a measurement $M$ is given by a list of $d\times d$ positive semidefinite matrices $E_1,\ldots,E_k$, which satisfy $E_1+\cdots+E_k=\mathbb{I}$. The rule is:
$$\Pr[M(\rho)\text{ returns outcome } i] = \operatorname{Tr}(E_i\rho).$$
Importantly, specifying the $E_i$'s doesn't uniquely determine the post-measurement states (i.e., what happens to $\rho$ if the outcome is $i$). Thus, by an implementation of the measurement $M$, in this paper we'll mean a list of matrices $B_1,\ldots,B_k$, which satisfy $B_i^{\dagger}B_i=E_i$. For a given implementation, if the measurement outcome is $i$, then the post-measurement state is
$$\frac{B_i\rho B_i^{\dagger}}{\operatorname{Tr}(B_i\rho B_i^{\dagger})}.$$
Note that the mapping
$$\rho \to \sum_{i=1}^{k} B_i\rho B_i^{\dagger}$$
is a superoperator, that each individual mapping $\rho\to B_i\rho B_i^{\dagger}$ is a quantum operation, and that $\operatorname{Tr}(B_i\rho B_i^{\dagger})=\operatorname{Tr}(E_i\rho)$ is the probability of outcome $i$.
In the special case of two-outcome POVMs $M=(E,\mathbb{I}-E)$, we'll sometimes identify the POVM itself with the "accept" outcome $E$, treating the "reject" outcome $\mathbb{I}-E$ as implied.
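The POVM formalism can be tried out numerically. The sketch below picks one particular implementation, the positive semidefinite square root of each POVM element, which is only one of many valid choices; the helper names `psd_sqrt` and `apply_povm` and the example matrices are assumptions made for illustration.

```python
import numpy as np

def psd_sqrt(E):
    """Positive semidefinite square root of a Hermitian PSD matrix."""
    w, v = np.linalg.eigh(E)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T

def apply_povm(rho, povm):
    """Return (probability, post-measurement state) for each POVM element.

    Uses the implementation B = sqrt(E), so that B^dagger B = E.
    """
    results = []
    for E in povm:
        B = psd_sqrt(E)
        unnormalized = B @ rho @ B.conj().T
        prob = float(np.real(np.trace(unnormalized)))
        post = unnormalized / prob if prob > 1e-12 else None
        results.append((prob, post))
    return results

rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)   # a mixed state
E_accept = np.array([[0.9, 0.0], [0.0, 0.1]], dtype=complex)  # "accept" element
povm = [E_accept, np.eye(2) - E_accept]

for prob, post in apply_povm(rho, povm):
    print("probability:", round(prob, 4), "trace of post-state:",
          round(float(np.real(np.trace(post))), 4))
```

A different implementation with the same $B^{\dagger}B$ (for instance, the square root followed by a unitary) would give the same outcome probabilities but different post-measurement states, which is why gentleness is a property of the implementation rather than of the POVM alone.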
2.4 Separable and Entangled
A pure state $|\psi\rangle$ on $n$ registers is called a product state if it can be written as a tensor product,
$$|\psi\rangle = |\psi_1\rangle \otimes \cdots \otimes |\psi_n\rangle.$$
Any pure state that cannot be so written is called entangled. A famous example of an entangled pure state is the Bell pair, $\frac{|00\rangle+|11\rangle}{\sqrt{2}}$.
A mixed state $\rho$ is likewise called a product state if it can be written as a tensor product
$$\rho = \rho_1 \otimes \cdots \otimes \rho_n.$$
Also, $\rho$ is called separable if it can be written as a convex combination of product states, and entangled if it can't be. Unlike a pure state, a mixed state can be separable but non-product, meaning that it has classical correlations but no entanglement, as with the example $\frac{|00\rangle\langle 00|+|11\rangle\langle 11|}{2}$ (i.e., $|00\rangle$ and $|11\rangle$ with equal probabilities).
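To make the product versus non-product distinction concrete, the following sketch compares the Bell pair with the equal mixture of $|00\rangle$ and $|11\rangle$. It checks only whether each two-qubit state equals the tensor product of its own reduced states, and it does not test separability in general; the helper name `reduced_state` is hypothetical.

```python
import numpy as np

def reduced_state(rho, keep):
    """Reduced single-qubit state of a two-qubit density matrix.

    keep = 0 keeps the first qubit, keep = 1 keeps the second.
    """
    r = rho.reshape(2, 2, 2, 2)  # indices: (a, b, a', b')
    if keep == 0:
        return np.trace(r, axis1=1, axis2=3)
    return np.trace(r, axis1=0, axis2=2)

ket00 = np.array([1, 0, 0, 0], dtype=complex)
ket11 = np.array([0, 0, 0, 1], dtype=complex)
bell = (ket00 + ket11) / np.sqrt(2)

rho_bell = np.outer(bell, bell.conj())
rho_classical = 0.5 * (np.outer(ket00, ket00.conj())
                       + np.outer(ket11, ket11.conj()))

for name, rho in [("Bell pair", rho_bell), ("classical mixture", rho_classical)]:
    a, b = reduced_state(rho, 0), reduced_state(rho, 1)
    is_product = np.allclose(rho, np.kron(a, b))
    purity = float(np.real(np.trace(rho @ rho)))
    print(name, "| product:", is_product, "| purity:", round(purity, 3))
```

Both states have maximally mixed single-qubit marginals and neither is a product state. The purity distinguishes them: the Bell pair is pure (and entangled), while the classical mixture is a mixed state that is a convex combination of the product states $|00\rangle\langle 00|$ and $|11\rangle\langle 11|$.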
A measurement $M$ on an $n$-register state is called product if there exist POVMs $M_1,\ldots,M_n$ such that $M$ can be implemented as follows:
- For each $i\in[n]$, apply $M_i$ to the $i$th register.
- Return some function of the $n$ classical measurement outcomes, possibly together with auxiliary randomness.
In the special case where $M_1,\ldots,M_n$ are all projective measurements, we call $M$ a product-of-projectives.
More generally, we call $M$ mixture-of-products if the POVMs $M_1,\ldots,M_n$ can be chosen randomly, from some correlated probability distribution, in advance of applying them.
More generally still, we call $M$ LOCC (the acronym stands for Local Operations and Classical Communication) if $M$ can be implemented by applying a POVM to some register $i_1$, then (depending on the outcome) applying another POVM to some register $i_2$, and so on, then finally returning some function of the classical measurement outcomes, possibly together with auxiliary randomness. Here we allow any finite, adaptively chosen sequence of POVMs, which could include repeated POVMs applied to the same register.
Let us stress that, even if a measurement happens to be product, or mixture-of-products, or LOCC, if we want to implement the measurement gently, we might need to apply a quantum circuit that acts on all registers coherently. This is because, if we measure the registers separately, we'll generate garbage (i.e., information about the state besides the final measurement outcome) that might destroy gentleness. Only if we've taken care to do everything in coherent superposition, simulating the "measurements" on the individual registers (and the computations on the outcomes of those measurements) using ancilla qubits, can we later uncompute the garbage. This is likely to be a significant challenge for experimental implementation of gentle measurements like the ones discussed in this paper, since coherent measurements across registers are much harder than incoherent ones to realize in practice. On the other hand, this issue makes no difference for DP, since even if the garbage isn't uncomputed, it need not be revealed to the end user. (Or to say it another way, the definition of quantum DP talks only about the probabilities of outcomes, not about the post-measurement states.)
2.5 Distance Between Quantum States
Given a Hermitian matrix $A$, its trace norm is defined as
$$\|A\|_{\mathrm{tr}} := \frac{1}{2}\sum_{i=1}^{d} |\lambda_i|,$$
where $\lambda_1,\ldots,\lambda_d$ are the eigenvalues of $A$. In particular, given two mixed states $\rho$ and $\sigma$, their trace distance is defined as $\|\rho-\sigma\|_{\mathrm{tr}}$.
Trace distance is a metric on mixed states, i.e., it's reflexive, symmetric, and satisfies the triangle inequality. It's equal to
$$\max_{E}\,\bigl(\operatorname{Tr}(E\rho)-\operatorname{Tr}(E\sigma)\bigr),$$
where the maximum is taken over all possible two-outcome measurements $E$. As such, trace distance generalizes the total variation distance between classical probability distributions, reducing to the latter when $\rho$ and $\sigma$ are both diagonal matrices.
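The two characterizations of trace distance can be compared numerically. The sketch below assumes the normalization in which trace distance is half the sum of the absolute eigenvalues of the difference, which is the convention under which it matches the best distinguishing probability; the function names are hypothetical.

```python
import numpy as np

def trace_distance(rho, sigma):
    """Half the sum of absolute eigenvalues of the Hermitian matrix rho - sigma."""
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

def best_two_outcome_gap(rho, sigma):
    """Tr(E rho) - Tr(E sigma), with E the projector onto the
    positive eigenspace of rho - sigma."""
    w, v = np.linalg.eigh(rho - sigma)
    pos = v[:, w > 0]
    E = pos @ pos.conj().T
    return float(np.real(np.trace(E @ (rho - sigma))))

# Diagonal states: trace distance reduces to total variation distance.
p = np.diag([0.5, 0.3, 0.2]).astype(complex)
q = np.diag([0.4, 0.4, 0.2]).astype(complex)
print(trace_distance(p, q), best_two_outcome_gap(p, q))

# Two pure qubit states, |0> and |+>.
zero = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho0, rho_plus = np.outer(zero, zero.conj()), np.outer(plus, plus.conj())
print(trace_distance(rho0, rho_plus), best_two_outcome_gap(rho0, rho_plus))
```

In both examples the eigenvalue formula and the gap achieved by the projector onto the positive part of $\rho-\sigma$ agree, and for the diagonal pair the value coincides with the total variation distance between the two diagonals.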
We'll find the following facts useful.
Proposition 12 (Contractivity of Trace Norm [34, p. 406])
Let $S$ be any quantum operation, and let $A$ be a Hermitian matrix. Then
$$\|S(A)\|_{\mathrm{tr}} \le \|A\|_{\mathrm{tr}}.$$
So in particular, for any two mixed states $\rho$ and $\sigma$ and any quantum operation $S$, we have
$$\|S(\rho)-S(\sigma)\|_{\mathrm{tr}} \le \|\rho-\sigma\|_{\mathrm{tr}}.$$
As an especially useful example, a superoperator that "traces out" (discards) part of its input state can never increase trace distance.
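Proposition 13 (Convexity of Trace Norm)
For all Hermitian matrices $A$ and $B$,
$$\|A+B\|_{\mathrm{tr}} \le \|A\|_{\mathrm{tr}} + \|B\|_{\mathrm{tr}}.$$
The triangle inequality for trace distance is just a special case of the above. As another useful special case, for all mixed states $\rho_1,\ldots,\rho_k$ and $\sigma_1,\ldots,\sigma_k$ and probabilities $p_1,\ldots,p_k$,
$$\Bigl\|\sum_{i} p_i\rho_i - \sum_{i} p_i\sigma_i\Bigr\|_{\mathrm{tr}} \le \sum_{i} p_i \,\|\rho_i-\sigma_i\|_{\mathrm{tr}}.$$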
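Finally, trace distance takes an especially simple form if $\rho$ and $\sigma$ are both pure states.
Proposition 14
For all $|\psi\rangle,|\varphi\rangle$,
$$\bigl\|\,|\psi\rangle\langle\psi| - |\varphi\rangle\langle\varphi|\,\bigr\|_{\mathrm{tr}} = \sqrt{1-|\langle\psi|\varphi\rangle|^2}.$$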
2.6 Additivity of Damage
In this section, we prove the extremely useful fact that, if we apply quantum operations $S_1,\ldots,S_k$ to a quantum state $\rho$ in succession, then we can bound the total damage caused to $\rho$ in trace distance by the sum of the damages that each operation would cause were it applied to $\rho$ individually. This fact is related to the so-called "Quantum Union Bound" (see [3, 45]), but it's both simpler to state and easier to prove.
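Lemma 15
Let $\rho$ be a mixed state, and let $S$ be any quantum operation. Suppose $\|S(\rho)-\rho\|_{\mathrm{tr}} \le \varepsilon$, and let $\sigma$ be a state with $\|\sigma-\rho\|_{\mathrm{tr}} \le \delta$. Then $\|S(\sigma)-\rho\|_{\mathrm{tr}} \le \varepsilon+\delta$.
Proof. We have
$$\begin{aligned}
\|S(\sigma)-\rho\|_{\mathrm{tr}} &= \|S(\sigma)-S(\rho)+S(\rho)-\rho\|_{\mathrm{tr}} \\
&= \|S(\sigma-\rho)+S(\rho)-\rho\|_{\mathrm{tr}} \\
&\le \|\sigma-\rho\|_{\mathrm{tr}} + \|S(\rho)-\rho\|_{\mathrm{tr}} \;\le\; \delta+\varepsilon .
\end{aligned}$$
Here the second line used the linearity of quantum operations, and the third used the triangle inequality for trace distance as well as Proposition 12 (i.e., the fact that applying a quantum operation can never increase the trace norm).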
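Lemma 15 has the following immediate corollary.
Corollary 16
Let $\rho$ be a mixed state and let $S_1,\ldots,S_k$ be quantum operations. Suppose that for all $i$, we have
$$\|S_i(\rho)-\rho\|_{\mathrm{tr}} \le \varepsilon_i .$$
Then
$$\|S_k(\cdots S_1(\rho)\cdots)-\rho\|_{\mathrm{tr}} \le \varepsilon_1+\cdots+\varepsilon_k .$$
Proof. Suppose by induction on $i$ that
$$\|S_i(\cdots S_1(\rho)\cdots)-\rho\|_{\mathrm{tr}} \le \varepsilon_1+\cdots+\varepsilon_i .$$
Then
$$\|S_{i+1}(S_i(\cdots S_1(\rho)\cdots))-\rho\|_{\mathrm{tr}} \le \varepsilon_1+\cdots+\varepsilon_{i+1}$$
by Lemma 15.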
Corollary 16 is the reason why "gentleness composes": that is, applying an $\alpha_1$-gentle measurement to a state $\rho$, followed by an $\alpha_2$-gentle measurement, yields an overall $(\alpha_1+\alpha_2)$-gentle measurement. By contrast, it's not clear to what extent DP composes in the quantum setting, because of the interaction between the DP requirement and damage to the state. For more about this issue see Appendix 13.
Note that, by simply specializing Corollary 16 to diagonal $\rho$ and classical operations $S_i$, we obtain an analogous statement for classical variation distance.
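The additivity of damage is easy to check numerically for trace-preserving operations. The sketch below is only a sanity check on one example, not a proof: the particular channels (small rotations and a dephasing channel), their parameters, and the helper names are all assumptions chosen for illustration.

```python
import numpy as np

def trace_distance(a, b):
    """Half the sum of absolute eigenvalues of the Hermitian matrix a - b."""
    return 0.5 * np.abs(np.linalg.eigvalsh(a - b)).sum()

def rotation(theta):
    """Unitary channel: rotate a qubit by angle theta in the real plane."""
    c, s = np.cos(theta), np.sin(theta)
    U = np.array([[c, -s], [s, c]], dtype=complex)
    return lambda rho: U @ rho @ U.conj().T

def dephasing(p):
    """With probability p apply Z; otherwise leave the state alone."""
    Z = np.diag([1.0, -1.0]).astype(complex)
    return lambda rho: (1 - p) * rho + p * (Z @ rho @ Z)

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(plus, plus.conj())
channels = [rotation(0.1), dephasing(0.05), rotation(-0.2)]

# Damage each operation would cause if applied to rho on its own.
individual = [trace_distance(S(rho), rho) for S in channels]

# Damage caused by applying the operations in succession.
state = rho
for S in channels:
    state = S(state)
total = trace_distance(state, rho)

print("individual damages:", np.round(individual, 4))
print("total damage:", round(total, 4), "<= sum:", round(sum(individual), 4))
```

Here the total damage is well below the sum of the individual damages because the two rotations partly cancel; the bound only says the total can never exceed the sum.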
In Section 6, when we analyze our shadow tomography protocol, we'll also need a lemma that upper-bounds the damage caused by a sequence of measurements conditional on the measurements all accepting, or equivalently, by a sequence of quantum operations where we normalize the final result. Fortunately, the formalism of quantum operations and trace norm can accommodate this case as well.
Lemma 17 (Damage Lemma)
Let be a mixed state.  For all , let  be a quantum operation, which âacceptsâ a state  with probability , and yields the post-measurement state when it does.  Suppose that for all , we have
[TABLE]
Let  be the probability that  accepts the âidealâ state , and let
[TABLE]
be the probability that accepts the state that it actually receives, if  are first applied to  and if we condition on their accepting.  Given any subset , let
[TABLE]
Then for all ,
[TABLE]
Also,
[TABLE]
Proof. For all , let
[TABLE]
Then by hypothesis, . Â Also,
[TABLE]
We can now write:
[TABLE]
and so on until
[TABLE]
More generally, suppose we define
[TABLE]
so that
[TABLE]
is just the probability that  accepts for all , if are applied in sequence.  Then repeating the manipulations above gives us the following modified equation, in which all the products of âs are restricted to range only over :
[TABLE]
Hence
[TABLE]
where the second line used the fact that all the products of âs are upper-bounded by . Â This means that
[TABLE]
thereby proving the first part of the lemma.
For the second part, let us take the special case . Â Then , and the inequalities above reduce to
[TABLE]
So the triangle inequality gives
[TABLE]
Hence
[TABLE]
As we'll show in Appendix 13, Lemma 17 implies a limited sort of composition for quantum DP algorithms. Namely, we can sequentially compose quantum DP algorithms and have the result remain accurate and DP, so long as the total damage incurred to the quantum state (in trace distance) is always small compared to the joint probability of the observed outcomes. We can sometimes ensure the latter property, in turn, by using our main result, the connection between DP and gentleness.
Note that we can combine Lemmas 17 and 15, to say that, even if we apply a final superoperator $S_{k+1}$ after applying the quantum operations $S_1,\ldots,S_k$ and then conditioning on their results, the total damage to our initial state $\rho$ is at most $\|S_{k+1}(\rho)-\rho\|_{\mathrm{tr}}$ plus the damage bound from Lemma 17. (This wouldn't be true if we'd composed in the opposite order, since conditioning could amplify earlier damage to $\rho$ by a multiplicative factor.) This fact will also be used in Section 6.
2.7 Pure vs. Mixed States
We now prove two propositions to show that, when considering differential privacy and gentle measurements, we can restrict attention to pure states without loss of generality; our conclusions will then automatically carry over to mixed states.
Proposition 18
If $M$ is $\varepsilon$-DP on pure product states, then $M$ is $\varepsilon$-DP on mixed product states as well. Likewise, if $M$ is $\varepsilon$-DP on all pure states, then $M$ is $\varepsilon$-DP on all mixed states.
Proof. Suppose we seek to maximize the ratio
$$\frac{\Pr[M(\rho)=y]}{\Pr[M(\sigma)=y]}$$
over product states $\rho$ and $\sigma$ that differ only on the $i$th register. Then holding the other $n-1$ registers fixed, we're maximizing $\Pr[M(\rho)=y]$ over the $i$th register of $\rho$ and minimizing $\Pr[M(\sigma)=y]$ over the $i$th register of $\sigma$. By convexity, the maximum and minimum will both always be achieved by pure states. A second appeal to convexity then shows that the maximum ratio is also achieved when the other $n-1$ registers are set to pure states as well.
For the second part, the argument is the same, except that we simply maximize $\Pr[M(\rho)=y]$ over all $\rho$, and minimize $\Pr[M(\sigma)=y]$ over all $\sigma$.
Proposition 19
If the measurement $M$ is $\alpha$-gentle on pure product states, then $M$ is $\alpha$-gentle on mixed product states as well. Likewise, if $M$ is $\alpha$-gentle on all pure states, then $M$ is $\alpha$-gentle on all mixed states.
Proof. Fix an implementation of $M$; the same implementation that achieves gentleness on pure states will also achieve gentleness on mixed states.
Suppose we apply $M$ to the product state $\rho=\rho_1\otimes\cdots\otimes\rho_n$. As a first step, we can purify $\rho_1,\ldots,\rho_n$ to $|\psi_1\rangle,\ldots,|\psi_n\rangle$ respectively by adding registers to them. Then $M$ can be seen as acting on the pure state $|\psi\rangle=|\psi_1\rangle\otimes\cdots\otimes|\psi_n\rangle$, and simply ignoring these added registers. By assumption, after we apply $M$ and condition on some outcome $y$, we're left with a post-measurement state $\Psi_y$ such that
$$\bigl\|\Psi_y - |\psi\rangle\langle\psi|\bigr\|_{\mathrm{tr}} \le \alpha .$$
Likewise, let $\rho_y$ be the post-measurement state if we apply $M$ to $\rho$ and then condition on outcome $y$. Observe that $\rho_y$ is also the mixed state obtained by starting from $\Psi_y$ and then tracing out the added registers. So by Proposition 12, we have $\|\rho_y-\rho\|_{\mathrm{tr}} \le \alpha$ as well.
For the second part, the argument is the same, except that we purify $\rho$ as a whole rather than $\rho_1,\ldots,\rho_n$ separately.
3 Basic Relations Among DP, Gentleness, and Triviality
In this section, we prove our first connection between the differential privacy and the gentleness of quantum measurements:
Theorem 20
If a measurement $M$ is $\varepsilon$-DP on all states, then $M$ is $O(\varepsilon n)$-gentle on all states.  Conversely, if $M$ is $\alpha$-gentle on all states for $\alpha\le 1/4.01$, then $M$ is $O(\alpha)$-DP on all states.
Unfortunately, Theorem 20 is weaker than it might look, since as we'll see, it relates DP to gentleness only in a regime where $M$ is "nearly trivial."  Later, we'll restrict our attention to product states, which will lead to a much more interesting connection between DP and gentleness.  Nevertheless, Theorem 20 serves as an instructive warmup to our main results, and the tools used to prove it will later be reused.
Note that all the results in this section also have classical analogues (we simply need to replace "all (mixed) states" by "all probability distributions" in each definition and statement), and those classical analogues might be of interest as well.
Let's first define formally what we mean by a measurement being "nearly trivial."
Definition 21 (Triviality)
Given a set $S$ of mixed states, a measurement $M$, and a parameter $\varepsilon\ge 0$, we say $M$ is $\varepsilon$-trivial on $S$ if for all states $\rho,\sigma\in S$, and all possible outcomes $y$ of $M$, we have
\[ \Pr[M(\rho)=y] \le e^{\varepsilon}\,\Pr[M(\sigma)=y]. \]
For $M$ to be $\varepsilon$-trivial, full stop, means that $M$ is $\varepsilon$-trivial on the set of all states.
In particular, $M$ is $0$-trivial if and only if $M$'s output probabilities are completely independent of $\rho$.  Note also that $\varepsilon$-trivial immediately implies $\varepsilon$-DP.  Like $\varepsilon$-DP (but unlike $\alpha$-gentleness), the definition of $\varepsilon$-triviality depends only on the outcome probabilities, and not on the post-measurement states.
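As a small illustration of the definition, the sketch below computes the triviality parameter of a measurement whose POVM elements are diagonal. It assumes the definition reads $\Pr[M(\rho)=y]\le e^{\varepsilon}\Pr[M(\sigma)=y]$, so that the worst-case ratio for outcome $y$ is the ratio of the largest to the smallest eigenvalue of $E_y$. The function name and the example POVMs are hypothetical.

```python
import math

def triviality_parameter(povm_eigenvalues):
    """Smallest eps such that every outcome probability varies by at most e^eps.

    povm_eigenvalues[y] holds the eigenvalues of the POVM element E_y.
    Over all states rho, Tr(E_y rho) ranges over [min eig, max eig],
    so the worst-case probability ratio for outcome y is max/min.
    """
    eps = 0.0
    for eigs in povm_eigenvalues:
        lo, hi = min(eigs), max(eigs)
        if hi == 0.0:
            continue           # outcome never occurs; no constraint
        if lo == 0.0:
            return math.inf    # probability 0 on one state, positive on another
        eps = max(eps, math.log(hi / lo))
    return eps

# Noisy two-outcome qubit POVM: E_0 = diag(0.55, 0.45), E_1 = diag(0.45, 0.55).
noisy = [[0.55, 0.45], [0.45, 0.55]]
print(triviality_parameter(noisy))

# A projective measurement is not eps-trivial for any finite eps.
projective = [[1.0, 0.0], [0.0, 1.0]]
print(triviality_parameter(projective))
```

The noisy measurement has parameter $\ln(0.55/0.45)$, while the projective one is infinitely far from trivial, matching the intuition that triviality forces every POVM element to be close to a multiple of the identity.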
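The following proposition gives a slightly weaker condition that already suffices for a measurement to be $\varepsilon$-trivial.
Proposition 22
Given a measurement $M$ and parameter $\varepsilon$, suppose that for every two orthogonal pure states $|\psi\rangle$ and $|\varphi\rangle$, and every possible outcome $y$ of $M$, we have
\[ \Pr[M(|\psi\rangle)=y] \le e^{\varepsilon}\,\Pr[M(|\varphi\rangle)=y]. \]
Then $M$ is $\varepsilon$-trivial.
Proof. Let $E_1,\ldots,E_k$ be the POVM elements of $M$.  Assume without loss of generality that the outcome $y$ corresponds to the element $E=E_1$.  Then by assumption,
\[ \langle\psi|E|\psi\rangle \le e^{\varepsilon}\,\langle\varphi|E|\varphi\rangle \tag{5} \]
for all orthogonal $|\psi\rangle,|\varphi\rangle$.  But this means that all of $E$'s eigenvalues must be within an $e^{\varepsilon}$ multiplicative factor of each other.  So (5) holds for all $|\psi\rangle,|\varphi\rangle$, not just all orthogonal $|\psi\rangle,|\varphi\rangle$.  By convexity, we then have
\[ \mathrm{Tr}(E\rho) \le e^{\varepsilon}\,\mathrm{Tr}(E\sigma) \]
for all $\rho,\sigma$ as well.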
Using Proposition 22, we now show that gentleness on all states implies near-triviality.
Lemma 23
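Suppose $M$ is $\alpha$-gentle on all states.  Then $M$ is $\ln\left(\frac{1+4\alpha}{1-4\alpha}\right)$-trivial, so in particular, $O(\alpha)$-trivial, provided $\alpha\le 1/4.01$.
Proof. Given mixed states $\rho$ and $\sigma$, let's first consider the special case where $\rho$ and $\sigma$ are perfectly distinguishable (that is, $\|\rho-\sigma\|_{\mathrm{tr}}=1$).  For any outcome $y$, let $p=\Pr[M(\rho)=y]$ and $q=\Pr[M(\sigma)=y]$, and assume without loss of generality that $p\ge q$ and $p>0$.  Also, fix an $\alpha$-gentle implementation of $M$.  Let $\rho_y$ and $\sigma_y$ be the post-measurement states for $\rho$ and $\sigma$ respectively, if the outcome of applying $M$ is $y$.  Now consider the mixed state $\xi=\frac{\rho+\sigma}{2}$.  Its post-measurement state is
\[ \xi_y = \frac{p\,\rho_y + q\,\sigma_y}{p+q}. \]
So let $\xi' = \frac{p\,\rho+q\,\sigma}{p+q}$.  Then
\[ \|\xi_y-\xi'\|_{\mathrm{tr}} \le \frac{p\,\|\rho_y-\rho\|_{\mathrm{tr}} + q\,\|\sigma_y-\sigma\|_{\mathrm{tr}}}{p+q}. \]
So by the triangle inequality,
\[ \frac{p-q}{2(p+q)}\,\|\rho-\sigma\|_{\mathrm{tr}} = \|\xi'-\xi\|_{\mathrm{tr}} \le \|\xi'-\xi_y\|_{\mathrm{tr}} + \|\xi_y-\xi\|_{\mathrm{tr}} \le 2\alpha, \]
since $\|\rho_y-\rho\|_{\mathrm{tr}}$ and $\|\sigma_y-\sigma\|_{\mathrm{tr}}$ and $\|\xi_y-\xi\|_{\mathrm{tr}}$ are all at most $\alpha$ by our gentleness assumption.  Furthermore, by assumption, $\|\rho-\sigma\|_{\mathrm{tr}}=1$.  Thus we simply get $\frac{p-q}{p+q}\le 4\alpha$.  Or
\[ p \le \frac{1+4\alpha}{1-4\alpha}\,q. \]
And by Proposition 22, if the above holds for perfectly distinguishable states $\rho,\sigma$ (so in particular, for orthogonal pure states), then it holds for all $\rho,\sigma$ as well.  Hence $M$ is $\ln\left(\frac{1+4\alpha}{1-4\alpha}\right)$-trivial.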
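To get a feel for how the gentleness parameter translates into a triviality (and hence DP) parameter, here is a minimal sketch that tabulates the bound, assuming it takes the form $\ln\frac{1+4\alpha}{1-4\alpha}$. That form is what the mixture argument yields when the triangle inequality gives $\frac{p-q}{p+q}\le 4\alpha$. The function name is hypothetical.

```python
import math

def dp_bound_from_gentleness(alpha):
    """Return ln((1 + 4*alpha) / (1 - 4*alpha)); finite only for alpha < 1/4."""
    if not 0.0 <= alpha < 0.25:
        raise ValueError("bound is only finite for 0 <= alpha < 1/4")
    return math.log((1 + 4 * alpha) / (1 - 4 * alpha))

for alpha in (0.001, 0.01, 0.1, 0.2, 0.249):
    eps = dp_bound_from_gentleness(alpha)
    print(f"alpha={alpha:<6} eps={eps:.4f}  eps/alpha={eps / alpha:.2f}")
```

For small $\alpha$ the ratio stays close to $8$, which is the sense in which the bound is $O(\alpha)$. As $\alpha$ approaches $1/4$ the ratio grows without bound.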
An immediate corollary of Lemma 23 is this:
Corollary 24
If $M$ is $\alpha$-gentle on all states, then $M$ is $\ln\left(\frac{1+4\alpha}{1-4\alpha}\right)$-DP on all states.
Indeed, since the reasoning applied independently to each measurement outcome $y$, we get the following stronger conclusion, which will be useful when we analyze shadow tomography:
Corollary 25
If is -gentle on all states, then  is -DP on all states.
Notice that the central gambit in the proof of Lemma 23, namely defining the equal mixture $\frac{\rho+\sigma}{2}$, generally maps product states to non-product states.  It turns out that this is inherent: Lemma 23 does not have an analogue that assumes only gentleness on product states.  Or rather: if we assume only gentleness on product states, then we can deduce DP (and will do so, in Lemma 28), but will not be able to deduce triviality.  And this is to be expected, since there are nontrivial DP algorithms, and indeed our main result (Theorem 5) shows that these algorithms lead to measurements that are gentle on product states.
We next prove a converse to Lemma 23: that near-triviality implies gentleness.
Lemma 26 (Trivial $\Rightarrow$ Gentle)
Suppose $M$ is $\varepsilon$-trivial.  Then $M$ is -gentle on all states, so in particular, -gentle, provided (say) .
Proof. Again, let $E_1,\ldots,E_k$ be the POVM elements of $M$, and recall that we can use any solutions to the equations $B_i^{\dagger}B_i=E_i$ to define the possible post-measurement states after $M$ is applied.  Without loss of generality, focus on $E=E_1$ and $B=B_1$.
Since $M$ is $\varepsilon$-trivial, all of $E$'s eigenvalues must be within an $e^{\varepsilon}$ multiplicative factor of each other.  Also, since $E$ is Hermitian, we can diagonalize it as $E=U\Lambda U^{\dagger}$, where $U$ is unitary and $\Lambda$ is a diagonal matrix of $E$'s eigenvalues.  Let's make the choice $B=U\sqrt{\Lambda}\,U^{\dagger}$.  Then for some constant , we can write $B$ as , where , and is a diagonal matrix whose entries are all at most in absolute value.
Let $|\psi\rangle$ be a pure state to which $M$ is applied, and assume .  Then conditioning on outcome leads to the post-measurement state
[TABLE]
Therefore the post-measurement state is
[TABLE]
By Proposition 14, the trace distance between this state and $|\psi\rangle$ is at most the Euclidean distance, which in turn is at most
[TABLE]
Thus, we've given an implementation of $M$ that is -gentle on pure states.  By Proposition 19, this implies that $M$ is -gentle on mixed states as well.
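The following numerical sketch illustrates the idea of the proof in the simplest setting. It assumes a POVM element $E$ that is diagonal in the basis in which the state is written, with eigenvalues within an $e^{\varepsilon}$ factor of one another, and uses the implementation $B=\sqrt{E}$. The helper name, the dimension, and the constant $0.3$ are arbitrary illustrative choices. The check against $\tanh(\varepsilon/4)$ is not taken from the text: it is the standard Kantorovich-type bound on the angle between $|\psi\rangle$ and $B|\psi\rangle$.

```python
import math
import random

def post_measurement_distance(eigs, amps):
    """Trace distance between |psi> and B|psi>/||B|psi>||, with B = sqrt(E).

    eigs: eigenvalues of E (E assumed diagonal in the basis of amps).
    amps: real amplitudes of a unit vector |psi>.
    """
    new = [math.sqrt(l) * a for l, a in zip(eigs, amps)]
    norm = math.sqrt(sum(x * x for x in new))
    overlap = sum(a * x for a, x in zip(amps, new)) / norm
    # For pure states, trace distance = sqrt(1 - |<psi|phi>|^2).
    return math.sqrt(max(0.0, 1.0 - overlap * overlap))

random.seed(0)
eps, d = 0.1, 8
worst = 0.0
for _ in range(2000):
    # Eigenvalues within a multiplicative factor e^eps of each other.
    eigs = [0.3 * math.exp(eps * random.random()) for _ in range(d)]
    raw = [random.gauss(0, 1) for _ in range(d)]
    length = math.sqrt(sum(x * x for x in raw))
    amps = [x / length for x in raw]
    worst = max(worst, post_measurement_distance(eigs, amps))

print(f"worst damage over trials: {worst:.5f}  (eps = {eps})")
```

In every trial the damage stays well below $\varepsilon$, consistent with near-triviality implying gentleness up to a constant factor.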
Finally, we prove that if a measurement is $\varepsilon$-DP for sufficiently small $\varepsilon$, then it's nearly trivial.
Proposition 27 (Sufficiently DP Is Trivial)
If $M$ is $\varepsilon$-DP on all states, then $M$ is $2\varepsilon n$-trivial on all states.
Proof. Let $\rho,\sigma$ be any mixed states on $n$ registers.  Also, let $S_i$ be a superoperator that simply swaps out the $i$-th register for some fixed state, say the maximally mixed state $I/d$, if the registers are $d$-dimensional.  (Footnote 16: Or if we preferred unitary transformations, we could also achieve the same effect by, for example, applying a Haar-random unitary to the $i$-th register, and then appealing to convexity.)  Then by applying all $n$ of the $S_i$'s to $\rho$ or $\sigma$, one at a time, we can map the entire input state to $(I/d)^{\otimes n}$.  Thus, for any possible output $y$ of $M$, if we repeatedly invoke the fact that $M$ is $\varepsilon$-DP, once for each $i$, we find that
\[ \Pr[M(\rho)=y] \le e^{\varepsilon n}\,\Pr\bigl[M\bigl((I/d)^{\otimes n}\bigr)=y\bigr]. \]
Likewise,
\[ \Pr\bigl[M\bigl((I/d)^{\otimes n}\bigr)=y\bigr] \le e^{\varepsilon n}\,\Pr[M(\sigma)=y]. \]
Hence
\[ \Pr[M(\rho)=y] \le e^{2\varepsilon n}\,\Pr[M(\sigma)=y]. \]
One can show, by a similar argument, that if $M$ is $\varepsilon$-DP on product states, then $M$ is $2\varepsilon n$-trivial on product states.  Again, though, this is only interesting in the regime $\varepsilon\ll 1/n$, whereas our results in Section 4 will be able to handle measurements that are $\varepsilon$-DP on product states for $\varepsilon$ up to about $1/\sqrt{n}$.
Combining Lemma 23, Proposition 27, and Lemma 26 now completes the proof of Theorem 20.
Again, the problem with Theorem 20 is that, while it relates the privacy of a measurement $M$ to its gentleness, it does so only as an "accidental byproduct" of showing that sufficiently private and sufficiently gentle measurements are both nearly trivial.  To get a more interesting connection between privacy and gentleness, we'll need to restrict our attention to product states, as our main result (Theorem 5) does.
4 Proof of Main Result
In this section we prove Theorem 5, the two-way connection between gentleness and differential privacy on product states.  Unlike Theorem 20, this connection will work even for measurements that are very far from trivial.
4.1 Gentleness Implies DP on Product States
We'll start by proving the "easy" direction: that gentleness on product states implies differential privacy on product states.  For this, we can reapply Lemma 23 from the previous section.
Lemma 28 (Gentleness $\Rightarrow$ DP on Product States)
If $M$ is $\alpha$-gentle on product states, then $M$ is $\ln\left(\frac{1+4\alpha}{1-4\alpha}\right)$-DP on product states as well, so in particular, $O(\alpha)$-DP on product states, provided $\alpha\le 1/4.01$.  Likewise, if $M$ is -gentle on product states then $M$ is -DP on product states.
Proof. Let $\rho=\rho_1\otimes\cdots\otimes\rho_n$ and $\sigma=\sigma_1\otimes\cdots\otimes\sigma_n$ be two product states that differ only on the $i$-th register.  Also, fix an implementation of $M$ that is $\alpha$-gentle on product states.  Then for any outcome $y$, let $\rho_y$ and $\sigma_y$ be the post-measurement states for $\rho$ and $\sigma$ respectively assuming that $M$ returns outcome $y$, and let $\rho_{y,i}$ and $\sigma_{y,i}$ be the restrictions (i.e., partial traces) of $\rho_y$ and $\sigma_y$ respectively to the $i$-th register.  Then by Proposition 12, together with the assumption of $\alpha$-gentleness, we have
\[ \|\rho_{y,i}-\rho_i\|_{\mathrm{tr}} \le \|\rho_y-\rho\|_{\mathrm{tr}} \le \alpha, \]
and likewise
\[ \|\sigma_{y,i}-\sigma_i\|_{\mathrm{tr}} \le \|\sigma_y-\sigma\|_{\mathrm{tr}} \le \alpha. \]
But now we can apply Lemma 23, which implies that, if we think of $M$ as acting on the $i$-th register only, with the other $n-1$ registers held fixed, then $M$ must be $\ln\left(\frac{1+4\alpha}{1-4\alpha}\right)$-trivial.  Moreover, the preceding statement holds for all $i$, and all settings of the other $n-1$ registers.  But that's simply another way of saying that $M$ is $\ln\left(\frac{1+4\alpha}{1-4\alpha}\right)$-DP on product states.
The last part follows simply because the argument applies for each possible output $y$ independently.
This proves part (1) of Theorem 5.
Note that as $\alpha$ approaches $1/4$, the bound on the DP parameter diverges.  Certainly the DP parameter needs to diverge as $\alpha$ approaches $1/2$, since (for example) measuring a single qubit in the $\{|0\rangle,|1\rangle\}$ basis, outputting the result, and then replacing the qubit by the maximally mixed state is $1/2$-gentle but preserves no privacy whatsoever.  We leave it as an open problem to close the gap between $1/4$ and $1/2$.
4.2 DP Implies Gentleness On Product States
For the other direction, we'll proceed in stages.  We'll start by proving that $\varepsilon$-DP implies $O(\varepsilon\sqrt{n})$-gentleness for classical product distributions.  Later we'll extend this result to the quantum setting.
We'll need a claim, originally proved in [22], that's found many uses in classical DP.
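Claim 29 ([22])
Suppose two probability distributions, $P=(p_x)$ and $Q=(q_x)$, satisfy
\[ \left|\ln\frac{p_x}{q_x}\right| \le \varepsilon \]
for all $x$.  Then the KL-divergence,
\[ \mathrm{KL}(P\,\|\,Q) = \sum_x p_x \ln\frac{p_x}{q_x}, \]
satisfies $\mathrm{KL}(P\,\|\,Q)\le 2\varepsilon^2$.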
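The claim is easy to probe numerically. The sketch below draws random pairs of distributions whose pointwise log-ratios are bounded and compares their KL-divergence with $2\varepsilon^2$, assuming that is the form of the bound (the inequality $\mathrm{KL}\le\varepsilon(e^{\varepsilon}-1)$ from the boosting literature implies it for $\varepsilon\le 1$). All function names are hypothetical.

```python
import math
import random

def kl_divergence(p, q):
    """KL(P || Q) = sum_x p_x ln(p_x / q_x), in nats."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

def max_log_ratio(p, q):
    """Largest |ln(p_x / q_x)| over the support."""
    return max(abs(math.log(px / qx)) for px, qx in zip(p, q))

def normalise(v):
    s = sum(v)
    return [x / s for x in v]

random.seed(1)
for _ in range(5):
    eps = random.uniform(0.01, 1.0)
    p = normalise([random.random() + 0.1 for _ in range(6)])
    # Tilt P by factors in [e^{-eps/2}, e^{eps/2}] and renormalise; the
    # log-ratios between P and Q are then at most eps in magnitude.
    q = normalise([px * math.exp(random.uniform(-eps / 2, eps / 2)) for px in p])
    e = max_log_ratio(p, q)
    print(f"max|ln(p/q)|={e:.3f}  KL={kl_divergence(p, q):.5f}  2e^2={2 * e * e:.5f}")
```

The point of the claim is visible in the output: the divergence scales quadratically in the log-ratio bound, not linearly, which is what later produces a $\sqrt{n}$ rather than an $n$ in the gentleness bound.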
We can now prove a classical "DP implies gentleness" result.  As noted previously, similar results follow from the work of Dwork et al. [18] and Rogers et al. [38] (phrased in terms of the so-called "max information"), but we provide a self-contained proof.
Lemma 30 (Classical DP $\Rightarrow$ Gentleness)
Let $A$ be a classical $\varepsilon$-DP algorithm, and let $D$ be a product distribution over databases $x=(x_1,\ldots,x_n)$.  Then for all possible outputs $y$ of $A$, the posterior distribution $D_y$ satisfies $\|D_y-D\|=O(\varepsilon\sqrt{n})$, and indeed the stronger bound $\mathrm{KL}(D_y\,\|\,D)\le 2\varepsilon^2 n$.
Proof. Fix any output $y$.  We want to compare the prior distribution $D$ over databases to the posterior distribution $D_y$, which is obtained by conditioning on the event $A(x)=y$.  To do this, consider a process wherein we draw a database $x$ from $D_y$, by first drawing $x_1$ from the marginal distribution over the first entry conditioned on $A(x)=y$, then drawing $x_2$ from the marginal distribution over the second entry conditioned on $A(x)=y$ and on $x_1$, and so on up to $x_n$.
Let's call the  distribution above ; note that  depends both on $y$ and on .  These are distributions over , our "data universe."  We claim that, for every possible value  for , the log-ratio between 's probabilities under  and under  must be upper-bounded in magnitude by $\varepsilon$.  To show this, let  and .  Then
[TABLE]
Here the second and last lines used the assumption that $D$ is a product distribution.  Also, by differential privacy,
[TABLE]
for all .  Therefore convex combinations of the above probabilities are also within an $e^{\varepsilon}$ multiplicative factor of one another, so
[TABLE]
By Claim 29, this means that the expected log-ratio between  and , with respect to  drawn from , is upper-bounded by :
[TABLE]
Furthermore, the expected sum of the log-ratios (i.e., the KL-divergence between $D_y$ and $D$ themselves) is just the sum of the expected log-ratios:
[TABLE]
where the expectations here are over the choices for the $x_i$'s (which, however, are irrelevant to the upper bound).  So by Pinsker's inequality (Proposition 10),
[TABLE]
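The statement can be checked exactly on a small example. The sketch below is an illustration, not the construction used in the proof: it takes the classical algorithm that outputs the sum of $n$ bits plus two-sided geometric noise (a standard $\varepsilon$-DP mechanism for sensitivity-one queries), enumerates all $2^n$ databases, and computes $\mathrm{KL}(D_y\,\|\,D)$ by brute force. The comparison value $2\varepsilon^2 n$ assumes Claim 29 is applied with the constant $2$. All names are hypothetical.

```python
import math
from itertools import product

def posterior_kl(bit_probs, eps, y):
    """KL(D_y || D) for A(x) = sum(x) + two-sided geometric noise.

    bit_probs[i] = Pr[x_i = 1] under the product prior D.  The noise takes
    value k with probability proportional to exp(-eps * |k|), so changing
    one bit changes any output probability by a factor of at most e^eps.
    """
    prior, likelihood = [], []
    for x in product((0, 1), repeat=len(bit_probs)):
        px = 1.0
        for bit, r in zip(x, bit_probs):
            px *= r if bit else 1.0 - r
        prior.append(px)
        # Unnormalised Pr[A(x) = y]; the constant cancels in the posterior.
        likelihood.append(math.exp(-eps * abs(y - sum(x))))
    z = sum(p * l for p, l in zip(prior, likelihood))
    posterior = [p * l / z for p, l in zip(prior, likelihood)]
    return sum(q * math.log(q / p) for q, p in zip(posterior, prior) if q > 0)

n, eps = 10, 0.1
probs = [0.5] * n
for y in (-3, 0, 5, 10, 14):
    kl = posterior_kl(probs, eps, y)
    print(f"y={y:>3}  KL={kl:.5f}  2*eps^2*n={2 * eps * eps * n:.5f}")
```

Even for outputs far in the tail, conditioning on $y$ moves the distribution over databases by a KL-divergence of order $\varepsilon^2 n$, in line with the lemma.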
Having shown that $\varepsilon$-DP implies $O(\varepsilon\sqrt{n})$-gentleness for classical product distributions, we now begin the task of extending the result to quantum product states.
Lemma 31
Suppose the measurement $M$ is $\varepsilon$-DP on product states, and is a product-of-projectives (i.e., consists of a classical algorithm applied to the outcomes of nonadaptive projective measurements on the $n$ registers).  Then $M$ is $O(\varepsilon\sqrt{n})$-gentle on product states.
Proof. By Proposition 19, it suffices to give an implementation of $M$ that is $O(\varepsilon\sqrt{n})$-gentle on pure product states.  Thus, let
\[ |\psi\rangle = |\psi_1\rangle\otimes\cdots\otimes|\psi_n\rangle = \sum_x \sqrt{p_x}\,e^{i\theta_x}\,|x\rangle \]
be a pure product state on $n$ registers.  By applying suitable local unitaries, we can assume without loss of generality that $M$ simply measures each $|\psi_i\rangle$ in the computational basis, obtaining the string $x=x_1\cdots x_n$ with probability $p_x$.  It then outputs a sample from some probability distribution $\Pr[\,\cdot\mid x]$, depending on $x$, over the possible outputs $y$.  We need to show how to sample from this distribution in an $O(\varepsilon\sqrt{n})$-gentle manner.
Our implementation is as follows: first map the state $|\psi\rangle$ to
\[ \sum_x \sqrt{p_x}\,e^{i\theta_x}\,|x\rangle \otimes \sum_y \sqrt{\Pr[y\mid x]}\,|y\rangle. \]
Note that, as long as we do not care about computational complexity, the above mapping can always be implemented somehow, although implementing it efficiently requires an efficient algorithm for "QSampling" the probability distributions $\Pr[\,\cdot\mid x]$.  Next, measure the $|y\rangle$ register in the computational basis, and condition on getting some particular result $y$.
Then by the rules of quantum mechanics and Bayes' rule, the state of the first register is just
\[ |\psi_y\rangle = \frac{1}{\sqrt{\Pr[y]}}\sum_x \sqrt{p_x\,\Pr[y\mid x]}\;e^{i\theta_x}\,|x\rangle. \]
Let $D$ be the distribution over $x$ defined by $\Pr_D[x]=p_x$; note that $D$ is a product distribution, to which Lemma 30 applies.  Also, let $D_y$ be $D$ conditioned on the event that $M$ outputs $y$.  Then we see above that $|\psi_y\rangle$ is a pure state that precisely corresponds to $D_y$, in the sense that, if we measure $|\psi_y\rangle$ in the computational basis, we'll see a sample from $D_y$.  The one complication is that $|\psi_y\rangle$ has an additional set of degrees of freedom, namely the unit-norm phases $e^{i\theta_x}$.  However, even these phases go away when we calculate the inner products (which involve complex conjugates).  In more detail:
\[ \langle\psi|\psi_y\rangle = \sum_x \sqrt{\Pr_D[x]\,\Pr_{D_y}[x]} = 1 - H^2(D,D_y). \]
Here $H^2(D,D_y)$ is the squared Hellinger distance between the probability distributions $D$ and $D_y$ (see Section 2.1).  So in particular, $H^2$ strictly relates the distributions $D$ and $D_y$, and has nothing further to do with quantum mechanics.
We can now upper-bound the trace distance between $|\psi\rangle$ and $|\psi_y\rangle$ (and hence, the gentleness of $M$ on $|\psi\rangle$) by
[TABLE]
Here the first line used Proposition 14, the second-to-last line used Proposition 11, and the last line used Lemma 30.
Note that, if we'd upper-bounded the Hellinger distance by the square root of the variation distance, we'd only get an upper bound of $O(\sqrt{\varepsilon}\,n^{1/4})$, rather than the $O(\varepsilon\sqrt{n})$ that we wanted.  To avoid that loss, here we exploited the fact that Lemma 30 upper-bounded the KL-divergence rather than only the variation distance, and we also used Proposition 11, which upper-bounds Hellinger distance directly in terms of KL-divergence, bypassing variation distance.
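The reduction from a quantum to a classical quantity in this proof can be made concrete. The sketch below assumes the squared Hellinger distance is normalised as $H^2(P,Q)=1-\sum_x\sqrt{p_xq_x}$, so that the overlap of the two pure states is exactly $1-H^2$ once the shared phases cancel. It then evaluates the pure-state trace distance $\sqrt{1-|\langle\psi|\psi_y\rangle|^2}$ alongside the classical variation distance. The two example distributions and all function names are hypothetical.

```python
import math

def bhattacharyya(p, q):
    """sum_x sqrt(p_x q_x); equals 1 - H^2(P, Q) under the assumed normalisation."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def pure_state_trace_distance(p, q):
    """Trace distance between sum_x sqrt(p_x) e^{i t_x}|x> and
    sum_x sqrt(q_x) e^{i t_x}|x>; the common phases cancel in the overlap."""
    overlap = bhattacharyya(p, q)
    return math.sqrt(max(0.0, 1.0 - overlap * overlap))

def variation_distance(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

prior = [0.25, 0.25, 0.25, 0.25]
posterior = [0.28, 0.26, 0.24, 0.22]

print("squared Hellinger :", 1.0 - bhattacharyya(prior, posterior))
print("trace distance    :", pure_state_trace_distance(prior, posterior))
print("variation distance:", variation_distance(prior, posterior))
```

The quantum damage is controlled by the Hellinger distance, which is never smaller than what the variation distance alone would suggest. This is why the proof bounds $H^2$ through the KL-divergence directly.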
We now prove the final lemma needed to complete the proof of Theorem 5, by generalizing Lemma 31 from projective measurements to POVMs.
Lemma 32
If $M$ is any product measurement that is $\varepsilon$-DP on product states, then $M$ is $O(\varepsilon\sqrt{n})$-gentle on product states.
Proof. Again, by Proposition 19, it suffices to restrict attention to pure states.  We will give a reduction to the situation already handled in Lemma 31.  Suppose we start with the product state
[TABLE]
Next we apply a POVM to each .  This can be modeled as follows: for each $i$, we apply a unitary transformation  to  together with some ancilla qubits that are initially in the state .  This yields a new state that we can write as
[TABLE]
Here  represents a classical computational basis state that the POVM will measure, while  represents "garbage": some normalized state that depends only on  and , need not be in the computational basis, and will not be measured.
So now we have
[TABLE]
Next, we apply our classical algorithm to the basis states , and then we condition on the algorithm outputting $y$.  This yields a new state .  What can we say about the relation between  and ?
Let's reorganize by collecting  into a single register that we'll call , and also collecting all the garbage states into a single register that we'll call .  We then have:
[TABLE]
where , and  is just the probability of  from the perspective of the classical algorithm.  By exactly the same reasoning as in the proof of Lemma 31, it follows that
[TABLE]
Therefore,
[TABLE]
So now we have exactly the same expression for the inner product that we had in the proof of Lemma 31.  So we can use the same argument to lower-bound the inner product by , and to upper-bound both the Hellinger distance and the trace distance between  and  by .
Finally, recall that  was obtained by applying a unitary transformation  to  together with some ancilla qubits.  Since inner products are unitarily invariant, this means that  also has trace distance at most  from .  Hence we've implemented $M$ in an $O(\varepsilon\sqrt{n})$-gentle manner.
Intuitively, what's going on is that the garbage register is completely inert: it's there, but it has no effect on the inner product.
Combining Lemma 28 with Lemma 32 now completes the proof of Theorem 5.
5 Separating Examples
In this section, we prove that the relationships between DP and gentleness notions proved in the preceding two sections are essentially tight, by giving examples of measurements that exhibit their optimality.
5.1 Gentleness to DP
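For all , let  be the "randomized response" algorithm, which for each $i$ separately, applies the POVM defined by the matrices
[TABLE]
to the $i$-th qubit and returns the result.  In other words, the output of  is an $n$-bit string, whose $i$-th bit has a bias of $\alpha$ toward the value of the $i$-th qubit in the $\{|0\rangle,|1\rangle\}$ basis.  The following is immediate:
Proposition 33
The randomized response measurement above is $\varepsilon$-DP for $\varepsilon=\ln\frac{1+\alpha}{1-\alpha}$, which is $2\alpha+O(\alpha^3)$ for small $\alpha$, and is not $\varepsilon'$-DP for any $\varepsilon'<\varepsilon$.
Proof. Flipping the $i$-th input bit can at worst change the probability that the $i$-th output bit assumes some value from $\frac{1+\alpha}{2}$ to $\frac{1-\alpha}{2}$ (or vice versa), while leaving the other $n-1$ output bits unchanged.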
We also have:
Proposition 34
Suppose $n=1$ (i.e., there is just one qubit).  Then the randomized response measurement is $\alpha$-gentle.
Proof. Given a qubit in state
[TABLE]
here is one way to implement the measurement: with probability $1-\alpha$, return $0$ or $1$ with equal probabilities.  With probability $\alpha$, measure in the $\{|0\rangle,|1\rangle\}$ basis and return the result.  Suppose without loss of generality that  and we condition on the output being .  Then the post-measurement state is
[TABLE]
The trace distance, , can thus be calculated explicitly as
[TABLE]
Combining Propositions 33 and 34, we get the following corollary:
Corollary 35
For all $\alpha$, there exists a measurement that is $\alpha$-gentle on arbitrary states, but not $\varepsilon$-DP for any $\varepsilon<\ln\frac{1+\alpha}{1-\alpha}$, even on product states.
Proof. Consider the randomized response measurement applied to the first qubit only.
This shows that Corollary 24 and Lemma 28 are both tight, up to the factor of $4$ in front of the $\alpha$.
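The single-qubit trade-off can be explored with a short sketch. It parametrises randomized response by a hypothetical bias `beta` (the output agrees with the measured bit with probability $(1+\beta)/2$), and assumes the implementation "with probability $\beta$ measure and report, otherwise report a fair coin". Under that assumption the post-measurement state given output $0$ is a mixture of the untouched state and $|0\rangle\langle 0|$, and its distance from the original state has a closed form.

```python
import math

def rr_dp_parameter(beta):
    """ln of the worst-case probability ratio when the output bit agrees
    with the input bit with probability (1 + beta) / 2."""
    return math.log((1 + beta) / (1 - beta))

def rr_damage(beta, s):
    """Damage to a|0> + b|1> with |a|^2 = s, conditioned on output 0."""
    w_measured = beta * s           # qubit was measured and gave 0
    w_untouched = (1 - beta) / 2    # fair coin was reported and came up 0
    w = w_measured / (w_measured + w_untouched)
    # Post-state is (1-w)|psi><psi| + w|0><0|; its trace distance from
    # |psi><psi| is w times the pure-state distance sqrt(1 - s).
    return w * math.sqrt(1 - s)

beta = 0.05
worst = max(rr_damage(beta, k / 1000) for k in range(1001))
print(f"DP parameter {rr_dp_parameter(beta):.4f}, worst damage {worst:.4f}")
```

For this choice of `beta` the worst-case damage over input states stays below `beta`, while the DP parameter is about `2 * beta`. This is the sense in which gentleness cannot buy more than a constant-factor improvement in privacy.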
5.2 DP to Gentleness
We now prove that, when we showed that $\varepsilon$-DP on arbitrary states implies $O(\varepsilon n)$-gentleness on arbitrary states (Proposition 27), and that $\varepsilon$-DP on product states implies $O(\varepsilon\sqrt{n})$-gentleness on product states for product measurements (Theorem 5), the $n$ and $\sqrt{n}$ factors were both asymptotically tight.
Recall the measurement  from Sections 1.1 and 1.3, which takes as input an $n$-qubit state and returns the total Hamming weight, plus a Laplace noise term  of average magnitude .  We showed, in Proposition 4, that  is -DP, and moreover, on all $n$-qubit states, not merely on product states.  By contrast, we now observe that  is far from gentle on arbitrary states:
Proposition 36 (Optimality of $n$ Factor)
 is not -gentle on $n$-qubit states.
Proof. We consider  applied to the mixture
[TABLE]
Note that the entire situation is classical, so the question of how  is implemented is irrelevant.  Let the measurement outcome be ; then
[TABLE]
So by Bayes' rule, the post-measurement state is
[TABLE]
Suppose .  Then we can calculate:
[TABLE]
If we now make the choice (say) , we find that this exceeds .
It follows that, in going from DP on arbitrary states to gentleness on arbitrary states, we need at least a factor of $n$ blowup in $\varepsilon$; indeed this is true even for product-of-projectives measurements.  Hence Proposition 27 is essentially tight.
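A back-of-the-envelope version of this phenomenon is sketched below. It assumes, purely for illustration, that the input is the equal mixture of the all-zeros and all-ones strings and that the noise is Laplace with scale $1/\varepsilon$; neither the specific mixture nor the parameter choices should be read as the ones used in the proof above. Since everything is classical, the damage is just the shift in the posterior weight of the all-zeros string.

```python
import math

def posterior_shift(n, eps, y):
    """Variation distance between the prior (1/2, 1/2) over {0^n, 1^n} and
    the posterior after observing Hamming weight + Laplace(1/eps) noise = y."""
    # Laplace densities up to the common factor eps / 2.
    like_zeros = math.exp(-eps * abs(y - 0))
    like_ones = math.exp(-eps * abs(y - n))
    post_zeros = like_zeros / (like_zeros + like_ones)
    return abs(post_zeros - 0.5)

n = 1000
for eps_times_n in (0.1, 1.0, 10.0):
    eps = eps_times_n / n
    print(f"eps*n={eps_times_n:>5.1f}  damage at y=0: {posterior_shift(n, eps, 0.0):.4f}")
```

The damage depends on $\varepsilon$ only through the product $\varepsilon n$. Keeping it small on such highly correlated inputs therefore forces $\varepsilon\ll 1/n$, which is the factor-$n$ loss discussed above.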
Likewise, in going from DP on product states to gentleness on product states, we need at least a factor of $\sqrt{n}$ blowup in $\varepsilon$, and this is true even for product-of-projectives measurements.  Hence Lemma 31 is essentially tight.  The example that shows this is again , albeit this time with :
Proposition 37 (Optimality of $\sqrt{n}$ Factor)
 is not -gentle on $n$-qubit product states, for any .
Proof. Let , and consider  applied to the uniform distribution .  Again, since the entire situation is classical, the question of how  is implemented is irrelevant.  Let the measurement outcome be ; then by Bayes' rule, the post-measurement state is
[TABLE]
So suppose , and assume without loss of generality that $n$ is odd.  Then we can calculate:
[TABLE]
6 Shadow Tomography
Having developed the connection between DP and gentleness, we're now ready to apply the connection to shadow tomography.  First, in Section 6.1, we review a recent algorithm of Aaronson et al. [7] for online learning of quantum states, which we'll need as a central ingredient.  Then, in Section 6.2, we present and analyze our new Quantum Private Multiplicative Weights (QPMW) algorithm, which builds on the Private Multiplicative Weights (PMW) algorithm of Hardt and Rothblum [27].  QPMW proves Theorem 9: that is, it shows that it's possible to do shadow tomography using only  copies of an unknown mixed state $\rho$, where $m$ is the number of known accept/reject measurements, $d$ is the dimension of $\rho$, and $\varepsilon$ is the accuracy with which we want to estimate each measurement's acceptance probability, in a way that, moreover, is online (i.e., processes the measurements one at a time) and $\alpha$-gentle (i.e., damages the copies of $\rho$ by at most $\alpha$ in trace distance).
6.1 Online Learning of Quantum States
Aaronson et al. [7] recently defined and studied the problem of online learning of quantum states.  Here we have an unknown $d$-dimensional mixed state $\rho$, and a learner is presented with a sequence of two-outcome POVM measurements.  For each measurement , the learner tries to anticipate , the probability that  accepts $\rho$, up to accuracy .  Indeed, the learner maintains a "hypothesis state" , and on each measurement , if the hypothesis differs appreciably from the unknown state with respect to this measurement, that is, if
[TABLE]
then we say that the learner was "wrong," and we allow it to update its state by giving it an approximation to the correct answer, where (say) .  The learner's goal is to upper-bound the total number of times that it's ever wrong, even assuming that the sequence of measurements and approximate answers is chosen adaptively, by an adversary who sees the learner's hypotheses.
Perhaps surprisingly, Aaronson et al. [7] showed that the total number of mistakes can be upper-bounded by $O(\log d/\varepsilon^2)$, so for example, only $O(n/\varepsilon^2)$ for a state of $n$ qubits (even though the state space has dimension $2^n$).
We observe that the same bound holds even under a slight relaxation of the update condition: namely, updates can also be triggered when the hypothesis has error between  and .  If an update is triggered, then the learner again receives an -approximation to the correct answer.
Theorem 38 (Variant: Online Learning of Quantum States [7, Theorem 1])
There is an explicit procedure for online learning of quantum states that makes at most  updates, so long as updates never occur when the hypothesis has error smaller than , and updates always occur when the error is  or larger.
We emphasize that when the error is in the range , updates may or may not occur.
Aaronson et al. [7] actually gave two explicit procedures that achieve the above bound: one based on online convex optimization, the other on matrix multiplicative weights.  Both procedures use an amount of computation per measurement that's polynomial in $d$.
In this work, however, we'll be able simply to use Theorem 38 as a black box.  We'll view an online learning procedure as specified by its initialization procedure, which outputs an initial hypothesis state , and an update procedure used to update the hypothesis state .
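To make the black-box interface concrete, here is a minimal classical stand-in, not either of the procedures from [7]. The hypothesis is a probability vector (a diagonal density matrix), measurements are diagonal with entries in $[0,1]$, and the update is a plain multiplicative-weights step. The class name, the step size `eta`, and the example measurement are all hypothetical.

```python
import math

class DiagonalOnlineLearner:
    """Toy learner exposing the two operations used as a black box:
    an initial hypothesis and an update from an approximate answer."""

    def __init__(self, dim, eta=0.2):
        self.weights = [1.0 / dim] * dim   # initial hypothesis: maximally mixed
        self.eta = eta                     # step size (illustrative choice)

    def predict(self, measurement):
        """Acceptance probability sum_i w_i * E_ii of a diagonal measurement."""
        return sum(w * e for w, e in zip(self.weights, measurement))

    def update(self, measurement, approx_answer):
        """Shift weight toward accepting basis states if the prediction is
        below the supplied approximate answer, and away from them otherwise."""
        sign = 1.0 if self.predict(measurement) < approx_answer else -1.0
        new = [w * math.exp(sign * self.eta * e)
               for w, e in zip(self.weights, measurement)]
        total = sum(new)
        self.weights = [w / total for w in new]

learner = DiagonalOnlineLearner(dim=4)
E = [1.0, 1.0, 0.0, 0.0]        # accepts the first two basis states
before = learner.predict(E)
learner.update(E, approx_answer=0.9)
after = learner.predict(E)
print(before, after)
```

Each update moves the prediction on the offending measurement toward the supplied answer. The quantum procedures play the same role with density matrices in place of probability vectors, and the analysis in this section only relies on the bound on the total number of updates.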
6.2 Online Shadow Tomography
Our Quantum Private Multiplicative Weights (QPMW) algorithm for gentle online shadow tomography is presented in Figure 1.
Theorem 39
Let  be gentleness and accuracy parameters.  There exists a setting for the noise magnitude  for which the online shadow tomography algorithm presented in Figure 1 is -gentle.  Moreover, given sufficiently many copies , where
[TABLE]
the algorithm's error is bounded by  with probability at least  over its coins and its measurements.
Proof. We first prove gentleness and then turn our attention to bounding the error (the accuracy proof builds on the algorithm's gentleness).
Gentleness.  Note that we argue gentleness for any product state provided as input (i.e., for gentleness, we don't assume that the input is  copies of a single state).  By Proposition 19, it suffices to consider the case where the input is a pure product state .  It is straightforward to see that the update rounds are gentle: we run two DP measurements in each update round, and their outcomes are gentle by Theorem 5.  This is stated below in Claim 40.  The non-update rounds are certainly no less gentle than the update rounds (after all, we only run the first measurement), but we expect to have a very large number of no-update rounds, and so we need a much better bound.  We obtain such a bound by restricting our attention to the damage that can be caused by the conditioned superoperator , conditioned on the output being [math] (no update).  One important challenge is showing that the damage (conditioned on this particular outcome) is tightly related to the probability of an update.  Thus, it will be highly unlikely for a sequence of rounds (even a very long sequence!) to cause significant damage before it triggers an update.  The second challenge is bounding the damage that can be caused by a sequence of conditioned superoperators.  This is done via a delicate accounting argument, which relies on Lemma 17.
We begin by fixing some notation.  First, given a superoperator  and a fixed output , we use the term conditioned superoperator to refer to running the superoperator conditioned on the output being .  (Footnote 17: In the terminology of Section 2.3, a conditioned superoperator is a quantum operation but where we normalize the output state.)  The QPMW algorithm's output in any run can be specified by , the number of rounds before an "abort" (if any) occurs, and by a vector of outcomes, where for each , the outcome in round  is .  In no-update rounds the outcome is , while in update rounds the outcome is the noisy answer returned by the algorithm.  Note that  and the vector of outcomes indeed specify all outputs of the algorithm.  For an intermediate round , we can also consider the vector of outcomes in the first  rounds.  Taking  to be the initial state of the algorithm, we take  to be the state after round , conditioned on the outcomes (and given the measurements ).  The initial state is thus , and the final state is .
Consider an execution of the algorithm at the beginning of the  round.  The outcomes in previous rounds are given by , which determines the learned state .  Let  be the  measurement.  We define  to be the probability that the measurement returns , i.e. the probability of an update on the original state .  Similarly, we take  to be the probability that  returns , i.e. the probability of an update on the real state in the registers at the beginning of the  round.
The following claims bound the damage that can occur if we run the  round with a fresh copy of the original state in the registers.
Claim 40
Every round of the algorithm is an -gentle superoperator.
Claim 41
Take  and  to be set as in Equations 11 and 10.  Let  be the state after we run , and condition on the output [math] ("no update," which occurs with probability ).  We have:
[TABLE]
Claim 40 follows immediately from the differential privacy of the Laplace measurement and from Theorem 5.  We defer the proof of Claim 41, which is technically involved and lengthy (see below).  We first show that, given this claim, the algorithm (taken as a whole) is gentle.
Epoch superoperators.  For the analysis, we divide an execution of the algorithm into epochs, where each epoch is comprised of one or more rounds.  The  epoch begins in round  (where ).  The epoch ends on the first round where one of the following occurs:
(1) An update happens (or the epoch reaches the last round ).
(2) The probability of an update, if each round was run on the original state, becomes too large:
[TABLE]
Naturally, the last epoch always ends on the last round .  The crux of the gentleness analysis is bounding the damage done to the state within any single epoch.  A separate argument shows that the number of epochs cannot be too large.
Viewing each epoch as a superoperator, it is specified by a list of measurements that would be chosen so long as no updates occurred.  Note that this list is indeed fixed: while the strategy that chooses the actual measurements can be adaptive, it specifies a fixed sequence of measurements (known in advance) that will be chosen so long as the outputs are "no update."  Let  be the first round that meets Condition (6).  The epoch processes the list of measurements until an update occurs (or the last measurement in this list is processed).  Note that  depends on the initial state , but it is fixed in advance.  Given the list of measurements , the output of the epoch superoperator is a list of "no update" decisions of length , followed (if an update occurs in the final round) by the output of the Laplace measurement used to approximate the value of .
We bound the damage that can be caused to the original (product) state by running the epoch superoperator.  We also show that running the epoch superoperator on  triggers an update with constant probability, but with constant probability no update occurs before round .
Claim 42
There exists a noise magnitude  such that the following holds.  Fixing any round , prior measurements , and a history of outputs in the previous rounds, define the epoch superoperator as above.  Then:
(1) When we run the epoch superoperator on the state , the probability that an update occurs is at least .
(2) When we run the epoch superoperator on the state , the probability that no update occurs before the  round is at least .
(3) Let  be the state in the registers after running this superoperator on the original state (including observing the epoch's outputs).  The damage is bounded by:
[TABLE]
Proof. For any possible last round , and any possible output of the epoch superoperator (comprised of a sequence of no-updates, which may or may not end with an update), we bound the damage as follows. Â We take to be the round on which the epoch always ends (unless there is an earlier update). Â Since Condition (6) did not hold at the beginning of round , we have:
[TABLE]
Using the fact that for any , we have that :
[TABLE]
By taking logarithms on both sides of the above inequality we get:
[TABLE]
Claim 41 gives a bound on the damage when running the conditioned superoperator (on the original state), conditioned on output [math]. Â Recall that this bound is linear in the update probability . Â Claim 40 gives a bound on the damage caused by the conditioned superoperator run in the last round, conditioned on any possible outcome in that round. Â Combining these bounds with inequality (7), we get:
[TABLE]
Define to be the probability of no update in rounds in a ârealâ execution of the epoch superoperator on the state (and note that , because we are considering an output that can actually occur).  Applying Lemma 17 to the conditioned superoperatorâs run in the first rounds, and using also the bound on in the claimâs statement, we get:
[TABLE]
which in particular implies that , proving item (2) above. By Lemma 17 (see also the remark following that lemma about composing with a final superoperator; in our case, the round), we conclude that:
[TABLE]
Bounding the update probability. To lower-bound the probability of an update, observe first that if the probability of an update in the last round, when we run it on a fresh copy of the state , satisfies , then by gentleness of the epoch superoperator as a whole (see above), when we run it on , the probability of an update in the last round (run on the state ) is greater than .
Thus, we restrict our attention to the case that .  Since is the first round where Condition (6) is violated, we know that .  I.e., we have a lower bound on the probability of an update if each round was run on a fresh copy of .  Since we assume , we in fact have an upper bound on the probability of no update in the first rounds of such an execution:
[TABLE]
By equation (8) (restricted to the case ), we deduce a similar bound on the probability of no update in the first rounds of the actual execution (an execution that does not get fresh copies of ). In particular, the probability of no update in this "actual" execution is at most .
Accumulated damage. By Claim 42, running each epoch superoperator on the initial state only results in bounded damage, and triggers an update with constant probability. By Lemma 15 (additivity of damage), when we run a sequence of epochs, the total damage is at worst multiplied by . Moreover, so long as this accumulated damage is smaller than , each epoch still triggers an update with probability at least (because the trace distance between the original state and the state in the registers when we run the epoch is bounded). Under these conditions, by Azuma's inequality, with all but probability, the number of epochs that occur before updates are triggered (and the QPMW algorithm aborts) is at most:
[TABLE]
By Theorem 38 we have that . Note that the choice of noise parameter guarantees that the accumulated damage over such rounds is indeed less than (in fact it is less than ; see equation (9)). We conclude that in this random process, the probability that each epoch triggers an update stays above for the first epochs.
By Claim 42 and Lemma 15 (additivity of damage), we can bound the total damage by the number of epochs times the damage per epoch, and we get that with all but probability over the coins and measurements made by QPMW:
[TABLE]
Accuracy. For given gentleness and accuracy parameters , we fix the noise parameter and then analyze the number of copies needed to guarantee accuracy with high probability. We assume without loss of generality that (if a larger is specified, we simply run the algorithm with ). We set the parameters so that in an "ideal" run of the algorithm, where each round is run on a fresh copy of the state , the algorithm is -accurate with all but a small constant probability. We then use the algorithm's gentleness to show that this implies accuracy in "real" runs of the algorithm: namely, we show that in a real run, the algorithm is -accurate with all but a small constant probability. The error probability can be reduced by independent repetitions.
We begin by setting the parameters so that with high probability, the total damage to the state is bounded by , and recall also that we assume . This imposes a constraint on :
[TABLE]
or equivalently:
[TABLE]
Note that this setting also satisfies the conditions of Claim 42.
We also want to guarantee that with high probability, an ideal run of the algorithm would give accurate answers. This imposes an upper bound on the noise magnitude . We analyze the accuracy by dividing the execution into epochs, as was done in the gentleness analysis above.
Claim 43 (Ideal run accuracy)
Consider an ideal run of the algorithm (where each round is run on a fresh copy of ) where we set:
[TABLE]
Consider an epoch that can run for at most rounds. The following all hold:
- (1) With all but probability, there will not be an update in any round of the epoch where .
- (2) If in any round of the epoch it is the case that , then an update occurs in that round with all but probability (note this condition can only hold on the round that always ends the epoch).
- (3) If the epoch ends in an update round, then the noisy answer is -accurate with all but probability.
Proof. The claim follows immediately from the exponential tails of the Laplace distribution: in each round, for each draw of Laplace noise, with all but probability, the noise magnitude is at most .
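The tail fact used in this proof can be checked numerically. The following is a minimal sketch, assuming the standard Laplace density with scale b, for which P(|X| > t) = exp(-t/b); the function names (`laplace_sample`, `laplace_tail`, `noise_bound`, `empirical_tail`) are illustrative and do not come from the paper, and the union-bound threshold is a generic calculation rather than the paper's exact parameter setting.

```python
import math
import random


def laplace_sample(scale, rng):
    """Draw one Laplace(0, scale) sample as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_tail(t, scale):
    """Exact P(|X| > t) for X ~ Laplace(0, scale) and t >= 0."""
    return math.exp(-t / scale)


def noise_bound(scale, draws, beta):
    """Magnitude t such that, by a union bound over `draws` samples,
    every sample satisfies |X| <= t except with probability at most beta."""
    return scale * math.log(draws / beta)


def empirical_tail(t, scale, trials, seed=0):
    """Monte Carlo estimate of P(|X| > t), for comparison with laplace_tail."""
    rng = random.Random(seed)
    hits = sum(abs(laplace_sample(scale, rng)) > t for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print("exact tail    :", laplace_tail(3.0, 1.0))
    print("empirical tail:", empirical_tail(3.0, 1.0, 100_000))
    print("bound for 10 draws, beta=0.01:", noise_bound(2.0, 10, 0.01))
```

Because the tail decays exponentially, the threshold needed to cover many draws simultaneously grows only logarithmically in the number of draws and in the inverse failure probability.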
Recall that an epoch can end before reaching its last round. However, the probability of each epoch reaching its final round is at least (by the definition of the epoch superoperator). Thus, if an epoch can run for at most rounds, then the expected number of rounds is at least . We conclude that with probability at least , the sum, over all epochs, of the number of rounds for which each epoch can run, is at most (by Markov's inequality). By Claim 43, taking a union bound over all epochs, and taking as set in Equation (10), we deduce that with all but a small constant probability over the noise choices, the conditions of the online learning theorem for quantum states (Theorem 38) all hold in all rounds simultaneously. By that theorem, we conclude that with all but small constant probability over its coins, the QPMW algorithm does not abort, and its answers are all -accurate.
How many copies do we need? Before proceeding to prove that a real run of the algorithm is also accurate, we specify the number of copies needed to simultaneously satisfy the constraints in equations (9) and (10) by taking to be large enough. We can do so while still guaranteeing the upper bound:
[TABLE]
Note that this setting of , which we use in the proof of Claim 41, also guarantees that is a sufficiently large constant. Further, this number of copies guarantees accuracy with all but small constant probability. The error probability can be reduced to by running independent copies of the algorithm, and outputting the median answer in each round.
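The median trick mentioned above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions: `noisy_estimator` is a hypothetical stand-in for one run of an algorithm that is accurate except with some constant failure probability, and the repetition count is chosen for illustration only, not taken from the paper.

```python
import random
import statistics


def noisy_estimator(true_value, eps, failure_prob):
    """Return a function simulating one run: within eps of true_value,
    except with probability failure_prob, when it returns an arbitrary answer."""
    def run_once(rng):
        if rng.random() < failure_prob:
            return rng.choice([0.0, 1.0])  # arbitrary (possibly far-off) answer
        return true_value + rng.uniform(-eps, eps)
    return run_once


def median_of_runs(run_once, repetitions, rng):
    """Run the estimator independently `repetitions` times and return the median."""
    return statistics.median(run_once(rng) for _ in range(repetitions))


if __name__ == "__main__":
    rng = random.Random(1)
    run = noisy_estimator(true_value=0.3, eps=0.05, failure_prob=0.2)
    print(median_of_runs(run, 51, rng))
```

The median is within eps of the true value whenever fewer than half of the runs fail, and by a Chernoff bound that event fails with probability exponentially small in the number of repetitions.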
For simplicity, in the statement of Theorem 39 we claim a slightly more relaxed bound of:
[TABLE]
A hybrid execution. Consider a hybrid execution, where each epoch superoperator (see above) is run on the "real" state (with no substitutions), but after each superoperator completes its operation, we replace the resulting state with a fresh copy of before proceeding to the next epoch superoperator. Since each epoch is -gentle (Claim 42), we can apply the Damage Lemma (Lemma 17) to conclude accuracy properties for the epoch:
Claim 44 (Hybrid run accuracy)
Consider a hybrid run of the algorithm (where each epoch is run on a fresh copy of ), with the parameters set as in Equations (9), (10), and (11). Let be the bound on the gentleness of each epoch. Consider an epoch that can run for at most rounds. The following all hold:
- (1) With all but probability, there will not be an update in any round of the epoch where .
- (2) If in the final round of the epoch it is the case that , then an update occurs in that round with all but probability.
- (3) If the epoch ends in an update round, then the noisy answer is -accurate with all but probability.
Proof. Consider the set of rounds where . By Claim 43, the probability that in an ideal execution an update occurs in one of the rounds in is at most . Applying Lemma 17 to the epoch superoperator, we conclude that the probability of an update occurring in one of the rounds in is at most . We note that in this application of Lemma 17, we restrict to the subset of quantum operations corresponding to rounds in (and condition on the "no update" outcome in those rounds). Claim 43 further bounds the ideal-execution probability of no update if in the last round , and the probability that the epoch ends in an update round but the noisy answer is not -accurate. By -gentleness of the epoch superoperator, we conclude that the probabilities of these two events occurring in the hybrid execution are both bounded by .
By Claim 42, the probability that there is no update until the last () round of an epoch is at least . Thus, in the hybrid execution, the expected number of rounds for which an -round epoch will run is at least . Similarly to the analysis of the ideal execution, taking a union bound over epoch superoperators and taking to be a small constant, we conclude that with all but probability, the conditions of the online learning theorem all hold and the answers returned are all -accurate. Further, by the choice of parameters in Equation (9), we know that with high probability, when we run QPMW and take to be the number of epochs needed to process all measurements, we have . We conclude that with all but a small constant probability, a hybrid execution of QPMW does not terminate prematurely, and is -accurate on every measurement.
The real execution. Lastly, we consider the real execution, where the epoch superoperators are run in sequence, without any refreshing of the state in the registers. We use the gentleness of the epoch superoperator to conclude that the algorithm remains accurate in its real execution.
Claim 45 (Real run accuracy)
Consider a real run of the algorithm, with the parameters set as in Equations (9), (10), and (11). With all but small constant probability over the algorithm's coins, the following hold in every round of the algorithm (simultaneously):
- (1) If , then there is no update.
- (2) If , then there is an update.
- (3) If is an update round, then the noisy answer is -accurate.
Proof. Let be the ("bad") event that in some round of QPMW it is either the case that:
- (i) an update occurs even though , or
- (ii) no update occurs even though , or
- (iii) is an update round, and the noisy answer is not -accurate.
By the foregoing analysis, the probability of the event in the hybrid execution is bounded by a small constant, say . We would like to now make a similar argument for a real execution, where the state is not "refreshed" between epoch superoperators.
Towards this, let be a bound on the number of epoch superoperators in a run of QPMW, and let be the bound on the gentleness of each epoch superoperator. We consider further hybrids, where in the hybrid , the first epochs are each run on fresh copies of , but there is no further refreshing after the epoch. Thus the first hybrid equals the real execution, and the hybrid equals the hybrid execution. By -gentleness of the epoch superoperator, we have that for every :
[TABLE]
This is simply because the and hybrid differ only in running the epoch: in that epoch is run on the state in the registers after the epoch, whereas in that epoch is run on a fresh copy of . By the -gentleness of the epoch, the trace distance between these two states is at most . So the two hybrids only differ in the probability that the event occurs in the epoch, and this difference in probabilities is upper-bounded by .
By a hybrid argument, we conclude that the probability of the event occurring in the real execution is at most . Further, by the choice of parameters in Equation (9), we know that we can take to be a bound on the number of epochs such that with high probability, epochs suffice to process all measurements, and . We conclude that with all but a small constant probability, the real execution of QPMW does not terminate prematurely, and is -accurate on every measurement.
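The hybrid argument above is a telescoping sum. As a sketch, with placeholder notation that is assumed here rather than taken from the paper ($H_0$ for the real execution, $H_T$ for the fully refreshed hybrid execution, $B$ for the bad event, and $\delta$ for the per-step bound on the difference in probabilities), it reads:

```latex
\Pr_{H_0}[B]
  \;\le\; \Pr_{H_T}[B] + \sum_{i=0}^{T-1} \Bigl|\Pr_{H_i}[B] - \Pr_{H_{i+1}}[B]\Bigr|
  \;\le\; \Pr_{H_T}[B] + T\,\delta .
```

So the bad event's probability in the real execution exceeds its probability in the hybrid execution by at most the number of epochs times the per-epoch bound.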
Finally, we reduce the error probability to by running independent executions and outputting the median answer in each round. This completes the accuracy proof for QPMW.
Proof of Claim 41. We begin by assuming that the probability of an update is smaller than some sufficiently small constant. If this is not the case, then the claim follows immediately from Lemma 32, because runs a -DP classical algorithm. Further, we assume throughout that is larger than a sufficiently large constant (see the remark following equation (11)).
We follow similar reasoning to the proof of Lemma 32. We begin with a pure product state in the registers
[TABLE]
Let be the state after applying the conditioned superoperator , conditioned on ("no update"). The superoperator applies the POVM to each , and then runs a classical DP algorithm on the bits observed. To implement it, we first apply a unitary transformation (to the state and ancilla qubits). This gives a new state:
[TABLE]
Let be the values observed when measuring the registers . We draw a noise value from the Laplace distribution with magnitude , and output (no update) whenever:
[TABLE]
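The classical step just described, adding Laplace noise to the observed bits and reporting "no update" when a noisy statistic stays close to the current estimate, can be sketched as follows. This is only an illustration: the exact test and constants used here (an absolute-difference comparison of the noisy mean against `estimate` with a fixed `threshold`) are assumptions, and all names are hypothetical.

```python
import random


def laplace_sample(scale, rng):
    """Draw one Laplace(0, scale) sample as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_update_check(bits, estimate, threshold, noise_scale, rng):
    """Return (update, noisy_mean).

    bits        : observed 0/1 measurement outcomes, one per register
    estimate    : current prediction for the mean of the bits (assumed given)
    threshold   : hypothetical cutoff on |noisy_mean - estimate|
    noise_scale : scale of the Laplace noise added to the count
    """
    noisy_count = sum(bits) + laplace_sample(noise_scale, rng)
    noisy_mean = noisy_count / len(bits)
    update = abs(noisy_mean - estimate) > threshold
    return update, noisy_mean


if __name__ == "__main__":
    rng = random.Random(0)
    bits = [1, 0] * 500  # empirical mean exactly 0.5
    print(noisy_update_check(bits, 0.5, 0.1, 1.0, rng))  # estimate close: no update expected
    print(noisy_update_check(bits, 0.9, 0.1, 1.0, rng))  # estimate far: update expected
```

Changing one bit moves the count by at most 1, so Laplace noise of scale b on the count gives a (1/b)-DP test in the usual classical sense.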
Let be the distribution over defined by , where , and note that is indeed a product distribution. Let be the distribution conditioned on the event when we run the above (classical) procedure on we get (no update). Following the proofs of Lemmas 32 and 31, we can implement the measurement so that:
[TABLE]
At this point we diverge from the proof of Lemma 32. There, we considered the distribution obtained by conditioning the product distribution on an outcome of a -DP algorithm. We bounded the KL-divergence between these distributions, and used that to bound the trace distance by . Here, while we know that the measurement is -DP for , when the probability of an update is much smaller than , we want to argue that observing a "no update" answer causes much less damage to the state.
Improving the DP guarantee. The intuition is that when is small, for a "typical" input drawn from , the probability of no update is quite large: . For an adjacent input , this probability of no update is at least . For small , the log-ratio between these two probabilities is roughly . A natural strategy is to try to bound the KL-divergence using this improved bound, by following a similar argument to the proof of Lemma 32. Indeed, observe that that proof applies even when we focus on any particular output (in this case, "no update"), using the log-ratio guaranteed for that particular output.
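The intuition in this paragraph can be checked on a toy model. The sketch below assumes a one-sided test in which an update fires when a count plus Laplace noise of scale b exceeds a threshold (this simplified model and all names are assumptions for illustration). It compares the log-ratio of "no update" probabilities on adjacent counts with the worst-case DP bound 1/b.

```python
import math


def update_prob(count, threshold, scale):
    """P(count + Laplace(0, scale) > threshold) in the toy one-sided model."""
    gap = threshold - count
    if gap >= 0:
        return 0.5 * math.exp(-gap / scale)
    return 1.0 - 0.5 * math.exp(gap / scale)


def no_update_log_ratio(count, threshold, scale):
    """Log-ratio of the 'no update' probabilities on adjacent counts (count vs count + 1)."""
    p = update_prob(count, threshold, scale)
    q = update_prob(count + 1, threshold, scale)
    return math.log((1.0 - p) / (1.0 - q))


if __name__ == "__main__":
    scale, threshold, count = 10.0, 100.0, 40.0
    p = update_prob(count, threshold, scale)
    print("update probability p     :", p)
    print("worst-case DP bound 1/b  :", 1.0 / scale)
    print("no-update log-ratio      :", no_update_log_ratio(count, threshold, scale))
    print("approximation p/b        :", p / scale)
```

In this toy model the log-ratio for the "no update" outcome is about p/b when the update probability p is small, far below the worst-case 1/b, which is the improvement the argument aims to exploit.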
The catch, which significantly complicates the proof, is that not all inputs drawn from are "typical." Some of these inputs have much higher update probabilities than , whereas the proof of Lemma 32 required a worst-case bound that applies to every input in the support of . On the other hand, by concentration bounds on the Hamming weights of inputs drawn from , the probability of drawing an for which the update probability is significantly higher than is very small.
To obtain an improved bound, we extend the proof of Lemma 32 to this case, using concentration of the (generalized) binomial distribution (a subgaussian distribution), to show that while the contribution of "far" inputs to the KL-divergence grows, their probability shrinks more quickly than this growth. To do this, we partition the inputs into disjoint sets , according to the difference between their Hamming weight and the expected Hamming weight. We account for the contributions of each set in this partition to the KL-divergence to show the claimed bound. The details (which can get long and technical) follow.
The event .  For each integer , we define the event to consist of all inputs whose Hamming weights are at least  and less than away from the expectation:
[TABLE]
By Azuma's inequality, a random input drawn from will with high probability be in for small :
[TABLE]
In particular, for a random input , the expected value of the such that is small:
[TABLE]
Similarly, we can also bound higher moments of this function. Since the distribution over the Hamming weight of is subgaussian with standard deviation , we also have:
[TABLE]
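The concentration of the Hamming weight used here can be illustrated numerically. The sketch below uses Hoeffding's inequality for independent bits, P(|W - E[W]| >= t) <= 2 exp(-2 t^2 / n), as a stand-in for the bound invoked in the text; the particular biases and all function names are assumptions for illustration.

```python
import math
import random


def hamming_weight_deviation(biases, rng):
    """Sample independent bits with the given biases; return weight minus expected weight."""
    weight = sum(rng.random() < p for p in biases)
    return weight - sum(biases)


def hoeffding_bound(n, t):
    """Hoeffding upper bound on P(|W - E[W]| >= t) for n independent bits."""
    return min(1.0, 2.0 * math.exp(-2.0 * t * t / n))


def empirical_deviation_prob(biases, t, trials, seed=0):
    """Monte Carlo estimate of P(|W - E[W]| >= t) under the product distribution."""
    rng = random.Random(seed)
    hits = sum(abs(hamming_weight_deviation(biases, rng)) >= t for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    biases = [0.2 + 0.6 * (i % 5) / 4 for i in range(400)]  # non-identical coordinates
    print("empirical:", empirical_deviation_prob(biases, 30, 2000))
    print("bound    :", hoeffding_bound(400, 30))
```

Deviations on the order of sqrt(n) already have constant-bounded probability, and the probability of larger deviations falls off as a Gaussian tail, which is what allows the contributions of far-from-typical inputs to be summed.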
We use to denote the distribution conditioned on the event (and similarly for ). We proceed with a sequence of technical propositions, which will be used to bound the KL-divergence between and .
Proposition 46
Let be as defined above. For every , every , and every , we have:
[TABLE]
Proof. The intuition is that the probability that (by ) is dominated by the probability that this event occurs for inputs whose Hamming weights are close to the expectation. By the differential privacy of the Laplace noise mechanism, the log-ratio of probabilities for inputs close to the expectation and inputs in is upper-bounded by in magnitude. We show one direction (an upper bound); the lower bound follows similarly:
[TABLE]
Here the second line follows from the differential privacy of the Laplace noise mechanism, as well as the fact that the Hamming distance between inputs and is at most . The second-to-last line uses equation (14), while the final line uses equation (13), and can be seen as follows:
[TABLE]
where the last two inequalities hold so long as is a sufficiently large constant.
Proposition 47
Let be as defined above. For every , every , and every input that differs from in a single coordinate, we have
[TABLE]
Proof. First, since we add Laplace noise of magnitude before checking for an update, for every pair of adjacent inputs , the log-ratio between the probabilities of is at most . When the probability of an update is smaller, we can improve this bound as follows. Define to be the probability of an update (i.e., ) given the input . By Proposition 46, we have .
Take the count on to be . An update is triggered when the difference between the noisy count and is too large, or equivalently, when the noisy count passes a threshold . [Footnote 18: This is without loss of generality: the case can be handled similarly.] The case where cannot occur because then would be much larger than say , whereas we assumed was sufficiently small. Thus, . Similarly, the probability of an update on is . (The case where the count on is smaller than on is handled similarly.) By the definition of the Laplace distribution, these probabilities are given by:
[TABLE]
Now by standard manipulations we get:
[TABLE]
Here the last line uses the fact that is a sufficiently large constant. (Note also that, in the case we're analyzing, the ratio of probabilities is larger than , so we only need to prove an upper bound.) Proposition 47 follows, recalling that by its conditions .
Proposition 48
Let , and be as defined above. Then for every :
[TABLE]
Proof. We employ a variant of the proof of Lemma 30. We spell out the bound in the first direction; the second direction follows similarly. Recall that is the product distribution , conditioned on the event (the difference between the Hamming weight of and its expectation is in the interval ). We can sample an input from this distribution by sampling from the marginal distribution over the first entry of conditioned on , then drawing from the marginal distribution over the second entry, conditioned on and , and so on up to . Call the distribution ; note that depends on (and on ). Similarly, we can also consider a conditional distribution , where we condition both on (no update) and on the event occurring. We can sample from this second distribution by first drawing from the marginal distribution over the first entry conditioned on and on , then drawing from the marginal distribution over the second entry conditioned on no update, on , and on , and so on up to . Call the distribution in this second process ; note that depends on (as well as on the set and the event ). The marginal distributions and are over .
We note that for any setting of the first variables, the supports of the random variables and are identical: a given prefix might make the event impossible for a certain fixing of the variable, but in this case the forbidden fixing has weight 0 both in and in . By Proposition 47 and by Bayes' rule, for every , for every setting for the first input coordinates, and for every value such that has nonzero probability by , the magnitude of the log-ratio between 's probabilities under and under is bounded as follows:
[TABLE]
By Claim 29, this means that the expected log-ratio between  and , with respect to  drawn from , is upper-bounded by
[TABLE]
As in the proof of Lemma 30, we conclude:
[TABLE]
Proposition 49
Let , and be as defined above. Partition the line into the following three segments:
[TABLE]
Then the following hold:
- For every integer :
[TABLE]
- For every integer :
[TABLE]
- For every integer :
[TABLE]
Proof. First, by Bayes' rule, for every we have:
[TABLE]
Further, by Proposition 46 we have that for every integer :
[TABLE]
(The Proposition asserts this for every ; the claim when conditioning on follows by a standard argument.)
Case analysis. We proceed to analyze each of the cases separately, beginning with the case . Recall that . By equation (16), the probability of under can differ from this by at most an multiplicative factor. We conclude that:
[TABLE]
Here the second-to-last line holds because for this range of we have , and thus . The last line holds because for the same range of we have
[TABLE]
To conclude the analysis of the first case, observe that for similar reasons also in the other direction we have:
[TABLE]
For the second case, , we have and thus:
[TABLE]
Here the last line holds because for all we have . In the other direction:
[TABLE]
The third case, , follows immediately from equation (16) (or Proposition 46), which holds for every possible value of .
Bounding the KL-divergence. We now proceed to bound the KL-divergence between and :
[TABLE]
And similarly:
[TABLE]
Using the nonnegativity of KL-divergence, together with the bound in Proposition 48, we conclude that:
[TABLE]
Below, we show that each of these two sums is bounded by . We conclude that
[TABLE]
which completes the proof of Claim 41.
Bounding the first sum. We divide the sum over into the three segments defined in Proposition 49, and use the bound on the log-ratio to bound the sum over each of the segments. Starting with the first segment :
[TABLE]
Here the first line uses Proposition 49; the second line uses the fact that  for all ; and the last line uses a moment bound on the distribution of , namely inequality (15).
For the segment we have:
[TABLE]
Here the first line uses Proposition 49, the second uses the fact that  for all ; the third uses inequality (13) (concentration of ); and the fourth and fifth use the facts that for all  and that is a sufficiently large constant.
For the segment we have:
[TABLE]
Here the first line uses Proposition 49; the second uses inequality (13) (concentration of ); and the third and fourth use the facts that for all  and that  is a sufficiently large constant.
Bounding the second sum. Similarly to the first sum, we divide the sum over into the three segments defined in Proposition 49, and use the bound on the log-ratio to bound the sum over each of the segments. Starting with the first segment :
[TABLE]
Here the first inequality uses Proposition 49, while the second uses the fact that  for all .
For the segment we have:
[TABLE]
Here the first line uses Proposition 49; the second uses the fact that  for all ; the third uses equation (13); and the last uses the facts that for all  and that is a sufficiently large constant.
Finally, for the segment we have:
[TABLE]
Here the first line uses Proposition 49; the second uses equation (13); and the third uses the facts that  for all  and that is a sufficiently large constant.
6.3 Lower Bounds for Shadow Tomography
To recap, the QPMW algorithm lets us do shadow tomography on a -dimensional state , with respect to two-outcome measurements and with accuracy , in a way that moreover is online and gentle, by measuring copies of . How close to optimal is this upper bound?
The only known general lower bound for shadow tomography, due to Aaronson [6], says that copies of are needed, for information-theoretic reasons. Aaronson [6] also shows that, in the special case where the states and measurements are entirely classical, copies are necessary and sufficient. [Footnote 19: The original conference version of [6] proved only a weaker lower bound: namely, when can be arbitrarily large (including for the classical special case). However, the most recent arXiv version includes the stated bounds, the ones that explicitly incorporate dependence on the dimension .] In the general, quantum setting, it remains open whether there could exist a shadow tomography procedure that used only copies, independent of the dimension .
In this section, we won't resolve that problem. However, as yet another application of our connection between DP and gentleness, we'll observe a lower bound on the sample complexity of gentle shadow tomography, which applies even to offline algorithms, i.e., ones that see all the measurements in advance. And conversely, by using the connection to adaptive data analysis, we'll use known results in that setting to give a lower bound for online shadow tomography, which applies even to non-gentle algorithms.
We stress that, while these lower bounds use nontrivial recent results, they have nothing to do with quantum mechanics: all of them apply even to the "classical special case" of shadow tomography, wherein the input consists of i.i.d. samples from a single distribution and the "measurements" are all in the computational basis.
Gentle shadow tomography. The first result we state is a lower bound for gentle shadow tomography, even in the offline setting:
Theorem 50 (Lower Bound for Gentle Shadow Tomography)
Any shadow tomography procedure that is -gentle for a constant on all product states, and is also -accurate on states of the form , requires
[TABLE]
samples.
In other words, as long as we insist that our shadow tomography procedure be -gentle for small (with gentleness applying for all product states, as usual in this paper), the sample complexity of the QPMW algorithm is optimal up to a polynomial factor.
We'll deduce Theorem 50 as a corollary of the following result of Bun, Ullman, and Vadhan [16]:
Theorem 51 (Bun et al. [16])
For all , there exist Boolean functions , such that no -DP algorithm can, for all databases , estimate  to within additive error , for every and with success probability at least , unless
[TABLE]
The proof of Theorem 51 uses so-called fingerprinting codes to construct the functions . We omit the details; see for example Vadhan [44, Section 5.3] for further discussion of this technique.
Recall Lemma 28, which said that any measurement that is -gentle on product states is also -DP on product states. In the classical special case, the latter simply means -DP in the usual sense. By just combining this implication with Theorem 51, we immediately obtain a lower bound on the sample complexity of some form of gentle shadow tomography, even in the classical special case. However, there is still a difficulty. Namely, the lower bound that we get will apply only to shadow tomography algorithms that remain accurate in what we call the diverse-state setting. This is the setting where the algorithm is given a sample from a product distribution (or in the quantum case, a product state ) and its goal is to estimate the acceptance probability of each of the two-outcome measurements on the average state
[TABLE]
By contrast, we defined shadow tomography for what we call the identical-state setting: that is, the setting where we're additionally promised that , so that the input state has the special form . All of the shadow tomography procedures that we know, including QPMW, are accurate even in the more general diverse-state setting. But it's not obvious that lower bounds in the diverse-state setting carry over to the identical-state setting, so there is still a gap to close. We close the gap using the simple claim below, which translates accuracy in the identical-state setting to accuracy in the diverse-state setting, with only a small loss in the differential privacy parameters. [Footnote 20: We note that it might be possible to obtain a lower bound similar to that of Theorem 51 that directly applies to the identical-state setting (see, e.g., Steinke and Ullman [41, Corollary 15]). Still, the transformation we outline incurs only a small loss in the parameters, and works more generally.]
Claim 52
Fix a data universe , functions , and a database size . Let be a classical algorithm that's -DP in the usual classical sense, and satisfies the following accuracy guarantee: for any distribution over , with all but probability over drawn from (and the algorithm's coins), 's answers are all within of the correct answers .
Then there exists another algorithm , which runs in time using a single oracle call to , such that is -DP, and for any fixed database , with all but probability over 's coins, 's answers are all within of the correct answers .
Proof. Given an input database , the algorithm operates by taking i.i.d. samples, with replacement, from the distribution that is uniform over the entries of (a distribution whose support has size at most ).  It then runs on the resulting database and outputs the results.
Accuracy follows because we are running on a sample from , so with all but probability over the samples and 's coins, the answers will all be within of the correct expectations over , which are the correct answers on the database .
For privacy, fix adjacent databases and that differ only in the entry.  For fixed coins used to choose i.i.d. samples, let and be the databases produced by sampling from or from respectively.  All entries in and will be identical, except those that are copies of the entry.  By a balls and bins argument, with all but probability, the number of copies of the entry is at most .  Whenever this is the case, the group privacy guarantees that follow from the differential privacy of imply that the probability of any event differs by at most a multiplicative factor and a additive error.
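The resampling construction in this proof can be sketched in a few lines. This is an illustration only: `resample_wrapper`, `mean_query`, and the `num_samples` parameter are hypothetical names, and `mean_query` is a plain (non-private) stand-in for the inner algorithm, used just to show the data flow and the balls-and-bins count.

```python
import random
from collections import Counter


def resample_wrapper(algorithm, database, num_samples, rng):
    """Draw num_samples entries uniformly with replacement from `database`,
    run `algorithm` once on the resampled database, and also report the
    largest number of copies of any single original entry."""
    n = len(database)
    indices = [rng.randrange(n) for _ in range(num_samples)]
    resampled = [database[i] for i in indices]
    max_copies = max(Counter(indices).values())
    return algorithm(resampled), max_copies


def mean_query(db):
    """Stand-in for the inner algorithm: the fraction of 1s (not differentially private)."""
    return sum(db) / len(db)


if __name__ == "__main__":
    rng = random.Random(0)
    database = [i % 2 for i in range(10_000)]  # true mean is 0.5
    answer, max_copies = resample_wrapper(mean_query, database, len(database), rng)
    print("answer on resampled database:", answer)
    print("max copies of any one entry :", max_copies)
```

When the number of samples is comparable to the database size, no entry is copied many times except with small probability, which is why group privacy costs only a small factor in the privacy parameters.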
We can now complete the proof of Theorem 50.
Proof of Theorem 50. Let be a shadow tomography procedure that is -gentle on product states , for small and fixed .  By Lemma 28, this is also -DP on product states, for .
Henceforth, we restrict attention to 's behavior on classical inputs . Here, being DP on product states simply reduces to the usual notion of DP.
Now suppose further that is -accurate in the identical-state setting. Then by Claim 52, we can obtain a new classical procedure that is -DP, and that moreover is -accurate for any given database . But this means that must satisfy the bound of Theorem 51. We use here the fact that for we get . We note that the deterioration in the privacy guarantee of (compared to ) is accounted for by the tilde in the .
As noted above, Theorem 50 applies even to the "classical special case" of shadow tomography. In that special case, the Chernoff bound immediately implies a procedure with sample complexity. Thus, one implication of Theorem 50 is that such a procedure necessarily violates gentleness, where "gentleness," here, means a bound on the damage in variation distance caused by classical Bayesian updating.
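The Chernoff-style classical procedure mentioned above simply answers every query with its empirical mean on the samples. A minimal sketch follows, assuming the standard Hoeffding sample count n >= ln(2m/delta) / (2 eps^2) for m queries with values in [0, 1]; the function names and the example queries are illustrative, not from the paper.

```python
import math
import random


def hoeffding_samples(num_queries, eps, delta):
    """Samples sufficient, by Hoeffding plus a union bound, for all num_queries
    empirical means to be within eps of their expectations with probability >= 1 - delta."""
    return math.ceil(math.log(2.0 * num_queries / delta) / (2.0 * eps * eps))


def empirical_answers(samples, queries):
    """Answer each [0, 1]-valued query with its empirical mean over the samples."""
    return [sum(f(x) for x in samples) / len(samples) for f in queries]


if __name__ == "__main__":
    rng = random.Random(0)
    # Example queries: the j-th bit of an 8-bit sample; each has true mean 0.5
    queries = [lambda x, j=j: (x >> j) & 1 for j in range(8)]
    n = hoeffding_samples(len(queries), eps=0.05, delta=1e-6)
    samples = [rng.randrange(256) for _ in range(n)]
    print("samples used:", n)
    print("answers     :", empirical_answers(samples, queries))
```

The sample count grows only logarithmically in the number of queries, but the procedure releases exact empirical means, which is the kind of output that Theorem 50 shows cannot be gentle at this sample complexity.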
Online shadow tomography. The second result we state is a lower bound for online shadow tomography, even without gentleness:
Theorem 53 (Lower Bound for Online Shadow Tomography [35])
Any online shadow tomography procedure that is -accurate requires sample complexity
[TABLE]
Combining Theorem 53 with the lower bound of Aaronson [6], we can conclude that online shadow tomography requires
[TABLE]
copies of  unless  or .  Hence QPMW achieves the optimal sample complexity for online shadow tomography up to polynomial factors.
Theorem 53 again has nothing to do with quantum mechanics, and follows immediately from known lower bounds for classical adaptive data analysis. There, an algorithm processes a collection of states that are drawn i.i.d. from an underlying distribution, and the goal is to provide accurate answers with respect to the underlying distribution, and in particular, to avoid overfitting to the specific sample. Adaptive data analysis is thus a special case of online shadow tomography in the identical-state setting. Hardt and Ullman [29] and Steinke and Ullman [40] showed sample complexity lower bounds and computational hardness results for this setting. Theorem 53 is a restatement, in our setting, of a recent result of Nissim et al. [35].
7 Computational Efficiency
So far, our results have been purely information-theoretic. When we talked, for example, about a gentle "implementation" of a measurement, we were concerned only about whether such an implementation existed, not about its time complexity. Likewise, the QPMW procedure for shadow tomography was efficient in sample complexity, but we weren't concerned to bound its computation time.
Now, at last, we consider to what extent our constructions are (or can be made) computationally efficient. In Section 7.1, we'll explain why gentle measurements can be implemented in polynomial time, provided we have an efficient way to uncompute garbage, and we'll give several classes of examples where this can be done. Then, in Section 7.2, we'll use our results from Section 7.1 to examine the computational complexity of the QPMW procedure. Finally, in Section 7.3, we'll turn things around, and observe how gentle measurements like the ones in this paper, whether derived from DP algorithms or not, can be applied to the safe implementation of subroutines in quantum algorithms.
7.1 Efficiency of DP and Gentle Measurements
Let's start with Theorem 5, the connection between gentleness and DP. For part (1) of the theorem, namely that α-gentleness implies O(α)-DP for small α, there's no issue of computational efficiency. This is because the very same measurement procedure that achieves α-gentleness also achieves O(α)-DP, the latter being solely a property of the output probabilities, which has nothing to do with the post-measurement states.
On the other hand, for part (2) of the theorem, namely that ε-DP on product states implies O(ε√n)-gentleness on product states for small ε (and product measurements), there is a computational issue. Namely: even if our original ε-DP measurement could be implemented by a polynomial-size circuit, the proof of Theorem 5 might return an implementation of it that is O(ε√n)-gentle but that does not correspond to any polynomial-size circuit. Yet, while this is a problem in principle, fortunately it turns out not to be a problem for any of the measurements that have concerned us in this paper, including the ones used in our shadow tomography procedure.
The potential computational issue occurs in the proof of Lemma 31. There, given a classical DP algorithm, we needed to map the state
[TABLE]
to
[TABLE]
where the 's are the possible outcomes of running the DP algorithm on the input database . Assuming that the algorithm itself is computationally efficient, it's easy to prepare a state of the form
[TABLE]
where is "garbage" entangled with the and registers (for example, the outcomes of coin flips made by the algorithm). The entire difficulty lies in uncomputing the garbage register. If we fail to uncompute, then the effect on might no longer be gentle.
As we mentioned in Section 1.3, an equivalent way to say this is that our reduction from DP to gentleness preserves efficiency if, and only if, we have an efficient algorithm to "QSample" the output distribution of the DP algorithm, meaning to prepare the superposition
[TABLE]
for a given input . In practice, many fast sampling algorithms do give rise to fast QSampling algorithms, but this need not always be the case. Indeed, as pointed out by Aharonov and Ta-Shma [9] in 2003, if fast sampling always implied fast QSampling, then we'd immediately get polynomial-time quantum algorithms for graph isomorphism, breaking lattice-based cryptosystems, and all other problems in the class SZK (Statistical Zero Knowledge). Closely related to that, the collision lower bound of Aaronson [1] implies that, in the black-box setting, fast sampling does not imply fast QSampling.
But what about the specific measurements considered in this paper? Let's start with the following observation:
Proposition 54 (Efficient Implementation of the Laplace Noise Measurement)
There is an -size quantum circuit to implement , the Laplace noise measurement on qubits, to accuracy, so long as .
Proof. We simply use the procedure for implementing described in Section 1.1: the one where, given a superposition over 's, we first prepare a Laplace noise register
[TABLE]
for some cutoff and normalization , then calculate , and finally use  and  together to uncompute the noise .  What makes this work is that, in , the noise is entirely additive, and addition of integers is an easy operation to invert.
Also, as long as , a cutoff of the form  suffices for exponential accuracy.  Moreover, one can check that DP, and hence gentleness, still hold even after we impose the cutoff.
It remains only to verify that there are -size quantum circuits to add and subtract -bit integers, and to prepare . The one interesting part is preparing . Omitting normalization and restricting to for simplicity, we observe that
[TABLE]
from which a linear-size circuit to prepare follows.
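The factorization behind this step can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the one-sided (unnormalized) noise amplitudes to be geometric, lam**k with lam = exp(-1/(2*sigma)), and verifies that they coincide with the amplitudes of a tensor product of single-qubit states |0> + lam**(2**j)|1>, one per bit j of k. The names `geometric_amplitudes`, `product_state_amplitudes`, and the value of `sigma` are hypothetical and not taken from the paper.

```python
import math

def geometric_amplitudes(num_qubits, lam):
    """Unnormalized amplitudes lam**k for k = 0 .. 2**num_qubits - 1."""
    return [lam ** k for k in range(2 ** num_qubits)]

def product_state_amplitudes(num_qubits, lam):
    """Amplitudes of the tensor product of the single-qubit states
    |0> + lam**(2**j) |1>, where qubit j holds bit j of k (j = 0 is the low bit)."""
    amps = [1.0]
    for j in range(num_qubits):
        weight = lam ** (2 ** j)
        # Qubit j becomes the new most significant bit: indices with bit j = 0
        # keep their amplitude, indices with bit j = 1 are scaled by `weight`.
        amps = amps + [a * weight for a in amps]
    return amps

def max_abs_difference(xs, ys):
    """Largest entrywise gap between two amplitude lists."""
    return max(abs(x - y) for x, y in zip(xs, ys))

sigma = 3.0                            # illustrative noise scale (assumption)
lam = math.exp(-1.0 / (2 * sigma))     # assumed amplitude decay per unit of noise
direct = geometric_amplitudes(5, lam)
factored = product_state_amplitudes(5, lam)
print(max_abs_difference(direct, factored))
```

Because the state factorizes qubit by qubit, each qubit can be prepared by one single-qubit rotation, which is why a circuit of size linear in the number of noise qubits suffices for this part.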
Note that the algorithm from Proposition 54 is "maximally gentle," in the sense that for every possible state of the input registers (including non-product states), the only damage that running the algorithm causes to that state is the damage that necessarily results from learning the desired output.
We now prove a much more general result, though one that's formally incomparable to Proposition 54. We start with a trivial-seeming proposition.
Proposition 55
Suppose we have two polynomial-time quantum algorithms: an algorithm that, given a classical string , prepares a state , and an algorithm that, for some , maps to , to accuracy. Then there's also a polynomial-time quantum algorithm that maps to , to accuracy.
Proof. We first run sequentially times, to map  to .  We next run , to map  to (to  accuracy).  Finally we run  sequentially times, to map  to .
Despite its simplicity, Proposition 55 lets us efficiently implement a large class of gentle measurements: namely, any gentle measurement that admits an efficient "two-part algorithm," wherein the first part prepares states (which might include unwanted garbage), and the second part maps the states to a desired output state that, crucially, is nearly unentangled with the 's, depending only on the original input .
Let's give an example.
Theorem 56 (Fast QSampling of Sparse Distributions)
For each input , suppose the state  has the form
[TABLE]
where the support sets all satisfy , for some (i.e., the 's are sparse). Suppose also that there's an efficient quantum algorithm that, for each , samples (but does not necessarily QSample) the distribution over conditional on . Then there's also an efficient quantum algorithm that QSamples : that is, maps to for each (up to error in trace distance).
Proof. As in Proposition 55, the algorithm  first runs sequentially times, for some sufficiently large .  It thereby produces the state , where
[TABLE]
is a superposition over samples from , possibly entangled with garbage. Next, simulates a standard-basis measurement on the registers of the states, in order to estimate an empirical frequency for each possible output string . (Of course, all but strings will have an empirical frequency of 0 in the sample; for the sake of efficiency, the 0-frequency strings are not explicitly recorded.) Then, using these empirical frequencies, prepares the state to accuracy. The efficiency of the preparation procedure follows from the fact that has support of size . [Footnote 21: Since we only care about accuracy, in this case we do not even need the Solovay-Kitaev Theorem (see [34]).] Meanwhile, accuracy follows by a Chernoff bound and union bound, together with the assumption that was a sufficiently large polynomial compared to . As the final step, uses to uncompute the 's.
As a small special case of Theorem 56, take and . Then each has the form , so the algorithm could be seen as a decision procedure, which accepts an input with probability (not necessarily bounded away from ). We have shown that a probabilistic oracle for this decision procedure can be safely implemented up to accuracy in polynomial time, for any polynomial . A reasonable interpretation of this [Footnote 22: That is, for some reasonable definition of what it means to query a oracle on a superposition of inputs.] is that , generalizing the result of Bennett et al. [11] that .
Note that, for some DP algorithms, given an input we can just explicitly calculate a classical description of the desired output state , to precision, deterministically and in time polynomial in . If that description also gives rise to a small quantum circuit to prepare , then we can short-circuit the estimation procedure above, and can improve its accuracy from to . As an example, suppose again that each desired output state is a superposition over a sparse set of basis states, with . But now suppose that, given , we can calculate both (as a list of elements), and for each to precision, in polynomial time. Then by using the Solovay-Kitaev Theorem (see [34]), we can clearly prepare the states (i.e., QSample) in polynomial time as well.
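The estimation step in this proof has a simple classical core, which the following sketch illustrates: draw many samples from a sparse distribution, record only the strings that occur, and use the square roots of the empirical frequencies as the amplitudes of the state to prepare. This is a classical simulation under assumptions, not the quantum procedure itself; the example distribution `true_probs`, the sample count, and the function names are hypothetical.

```python
import math
import random
from collections import Counter

def empirical_amplitudes(sampler, num_samples, rng):
    """Estimate sqrt-probability amplitudes from classical samples.
    Only strings that actually occur are recorded, so the result stays sparse."""
    counts = Counter(sampler(rng) for _ in range(num_samples))
    return {y: math.sqrt(c / num_samples) for y, c in counts.items()}

def fidelity(true_probs, est_amps):
    """Squared overlap between sum_y sqrt(p_y)|y> and the estimated state."""
    overlap = sum(math.sqrt(p) * est_amps.get(y, 0.0)
                  for y, p in true_probs.items())
    return overlap ** 2

# A hypothetical sparse output distribution over 4-bit strings.
true_probs = {"0010": 0.5, "0111": 0.3, "1000": 0.15, "1101": 0.05}

def sampler(rng):
    """Stand-in for the sampling algorithm: returns one string per call."""
    outcomes = list(true_probs)
    weights = [true_probs[y] for y in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

rng = random.Random(0)
amps = empirical_amplitudes(sampler, 20000, rng)
print(fidelity(true_probs, amps))
```

The point of the design is that a small error in the estimated frequencies perturbs the prepared state only slightly, so the output is nearly independent of which particular samples were drawn, which is what permits uncomputing them afterwards.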
It is not clear how to generalize the above techniques to superpositions  over exponentially many basis states (or rather, to do so in any useful generality), even in cases where the individual amplitudes  and probabilities  are computable in polynomial time.
7.2 Efficiency of Shadow Tomography
What does all of this mean for the computational complexity of shadow tomography? In the QPMW algorithm of Section 6, recall that we needed two types of measurements: threshold measurements on all rounds, and (Hamming weight plus Laplace noise) type measurements on update rounds. Proposition 54 has shown that the measurements can be implemented in quantum polynomial time, provided the underlying POVMs can be implemented in quantum polynomial time. Since a threshold measurement just consists of an measurement, followed by a binary threshold decision, followed by uncomputing of garbage, it follows that the threshold measurements can be implemented in quantum polynomial time as well, again assuming efficient procedures for the 's.
Unfortunately, this doesn't mean that QPMW runs in polynomial time overall. The first issue is just the sheer number of measurements . Since QPMW needs one round per measurement, if that number is exponentially large then QPMW will of course need exponential time.
The second issue is the need to maintain, and to do computations on, a classical description of the current hypothesis state , in the online learning procedure [7] that QPMW uses as a subroutine.  If  is stored explicitly, as a  Hermitian matrix, then this takes  space, which is prohibitive if is exponentially large.  However, even if  is stored only implicitly, say by a list of constraints that it satisfies, estimating expectation values will still take  time in general.
In summary, if we ignore various low-order contributions, then the running time of QPMW is roughly , where is an upper bound on the time needed to implement a single measurement . By comparison, Aaronson's previous shadow tomography procedure [6] used roughly time. Thus, QPMW improves the dependence on from quasipolynomial to polynomial.
There is also later work by Brandão et al. [14], which connects shadow tomography to semidefinite programming and Gibbs states. Brandão et al. gave a shadow tomography procedure with the same sample complexity as Aaronson's, and running time . Here the improvement from to came from, in essence, repeatedly doing Grover search over to find an informative . Thus, if we compare to Brandão et al., QPMW matches the improvement from to , but not the improvement from to . However, this is to be expected: unlike Aaronson's or Brandão et al.'s, our new shadow tomography procedure is online, which necessitates taking time linear in the number of measurements.
It's natural to wonder: is there some inherent barrier ruling out a shadow tomography procedure that runs in time, avoiding the polynomial dependence on Hilbert space dimension ? We now show that there is such a barrier, at least if we insist that the shadow tomography procedure be online, or alternatively, that it be gentle. Our proof will use recent cryptographic lower bounds for differential privacy and for answering adaptively chosen queries, as well as our result that gentleness implies DP.
Hardness for gentle (even offline) shadow tomography. We use a result of Ullman [43], which shows that under plausible cryptographic assumptions, computing differentially private answers to more than queries (where is the database size) requires time . This hardness result extends to quantum algorithms, under plausible cryptographic assumptions about their power. Moreover, the result constructs a single distribution over , such that it's hard for DP algorithms to compute accurate answers on databases that are drawn i.i.d. from . Using our result that gentleness implies DP, we derive a similar hardness result for gentle shadow tomography.
Theorem 57 (Ullman [43], quantum variant)
Suppose there exists a symmetric-key encryption scheme that, for keys of length , is semantically secure against -time quantum adversaries. Then there is no quantum algorithm , running in time , that receives as input a database comprised of items from , and a set of queries , such that:
- (1) is -DP.
- (2) For any distribution over , if 's entries are drawn i.i.d. from , then with all but a small constant probability over 's coins and the choice of , for every , the answer computed by satisfies:
[TABLE]
Moreover, the queries are each computable in time.
Using the fact that gentleness implies differential privacy (Theorem 5), we conclude that gentle shadow tomography is hard.
Corollary 58
Suppose there exists a symmetric-key encryption scheme that, for keys of length , is semantically secure against -time quantum adversaries.  Then there is no quantum shadow tomography procedure that is gentle on product states and runs in  time.  Moreover, this holds even for the classical special case of shadow tomography.
Corollary 58 applies even to the offline setting, and to algorithms that are accurate only in the identical-state setting where the algorithm's input is a state of the form . Moreover, it applies even for classical data and classical queries. We note that Theorem 57 and Corollary 58 extend to milder cryptographic assumptions, with a milder conclusion on the possible running time for gentle shadow tomography. Essentially, symmetric key encryption that is hard for time- quantum algorithms translates into hardness of differentially private data analysis for quantum algorithms that run in time , for a fixed constant . Similarly to Corollary 58, the existence of such encryption schemes rules out gentle shadow tomography in time .
Finally, we remark that Theorem 57 (and Corollary 58) do not rule out efficient gentle algorithms that are tailored to fixed classes of queries, even for exponentially large fixed classes. [Footnote 23: Theorem 57 does not apply because, for the specific queries used to instantiate the lower bound, the time needed to compute the queries grows with the database size.] In particular, Theorem 57 does not rule out an efficient DP algorithm for answering all queries that can be computed by -size circuits. More generally, for any fixed query family, it does not rule out the possibility of obtaining an efficient algorithm that is accurate so long as the database is large enough, and in particular larger than the representation of queries in the family. Until recently, known DP hardness results for fixed query families, such as [23, 13, 33], relied on assumptions for which we have no quantum-secure candidate instantiation, such as bilinear maps or indistinguishability obfuscation. A recent result of Kowalczyk et al. [32] presents a candidate query family based on the existence of one-way functions. These results may also extend to gentle shadow tomography.
Hardness for online (even non-gentle) shadow tomography.  We use a result of Steinke and Ullman [40] (building on earlier work by Hardt and Ullman [28]), showing that under plausible cryptographic assumptions, given i.i.d. samples from a distribution over , it is computationally hard to answer more than adaptively-chosen queries accurately.  Under appropriate assumptions, this result extends to quantum algorithms, and shows hardness for time :
Theorem 59 (Steinke and Ullman [40], quantum variant)
Suppose there exists a symmetric-key encryption scheme that, for keys of length , is semantically secure against -time quantum adversaries.  Then there is no quantum algorithm, running in  time, that takes as input  independent samples from a distribution  over , as well as  efficiently computable counting queries that are chosen adversarially and adaptively, and correctly estimates  to within a fixed constant error for each in an online manner.
Theorem 59 has the following as an immediate corollary.
Corollary 60
Suppose there exists a symmetric-key encryption scheme that, for keys of length , is semantically secure against -time quantum adversaries.  Then there is no shadow tomography procedure that is online and runs in  time.
Note that Corollary 60 applies even to online algorithms that are not gentle, and that work only in the "identical-state setting" (i.e., when the algorithm's input has the form ). Moreover, it applies even for the classical special case of shadow tomography. Finally, we note that just like Corollary 58, Corollary 60 extends to milder cryptographic assumptions, albeit with milder conclusions for the complexity of gentle shadow tomography.
7.3 Quantum Complexity Implication
We now observe that gentle measurements, whether or not derived from DP algorithms, have potentially useful applications in quantum algorithms and complexity. In particular, whenever we have an efficient implementation of a gentle measurement, we can turn it into a safe and efficient way to run an associated class of estimation subroutines on superpositions of inputs, without generating unwanted garbage.
As an example, let's now prove Theorem 7 from Section 1.4. In other words, let's show that without loss of generality, a machine can coherently query an oracle that takes as input a description of a quantum circuit , and that outputs an estimate of to within , or a superposition over such estimates, for any desired additive error . (In the sense that, for every machine that queries such an oracle, there is another machine that simulates the oracle on its own.) While this might seem obvious, we would not know how to prove it without a gentle measurement procedure of some kind.
Proof of Theorem 7. Let
[TABLE]
be a state of the machine, where is garbage that we don't care about and is a description of a quantum circuit whose acceptance probability (say, on the state) we'd like to estimate. Then as a first step, we map the above state to
[TABLE]
for some suitable . Next we use the efficient implementation of the Laplace noise measurement (with ), from Proposition 54, to map the above to some state
[TABLE]
Here is an estimate of to within additive error, or more precisely, a Laplace superposition over estimates, one with the property that
[TABLE]
for all .  The equality (17) is only approximate because in reality, the  register is slightly entangled with the  registers.  However, recall from Corollary 6 that  is -gentle on product states for some .  Thus, the damage to the  registers in trace distance can be upper-bounded by , and the equality (17) also holds up to error .  So as a final step, we can simply uncompute the  registers, to produce a state that is -close in trace distance to
[TABLE]
If we want to ensure that the above, in turn, is -close to a superposition such that
[TABLE]
with certainty, where  is our original accuracy bound, then it suffices to choose  such that for some .  Working backwards, a calculation shows that it suffices to set
[TABLE]
In turn, if our machine was going to make such queries in sequence, it would suffice to set for each of them, to ensure that the final output has trace distance at most (say) from what we'd obtain using an ideal oracle for approximating .
Though Theorem 7 is not particularly shocking, it serves as a model for a large number of results that could now be proven, using gentle measurement procedures derived from DP algorithms. I.e., for every DP algorithm that can be implemented coherently and in polynomial time, along the lines of Proposition 54, we get another way that quantum algorithms can be safely invoked as subroutines by other quantum algorithms.
One might wonder about the difference between Theorem 7 and our results from Section 7.1. In particular, why was the Laplace noise measurement needed for Theorem 7, but not needed for Theorem 56? The key point is that, in Theorem 7, we wanted outputs that were explicit estimates of . And even if two estimates are extremely close, the states and will still be orthogonal. This is what necessitated using a gentle measurement, to break the entanglement between the output and computation registers, and thereby allow safe uncomputing. In Section 7.1, by contrast, we were content with outputs that were superpositions , with our estimates of probabilities implicitly encoded in 's amplitude vector. As a result, a slight error in estimating those probabilities would yield a state such that , and gentle measurement techniques were not needed (even if the results were useful for efficient implementation of gentle measurements).
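As a purely classical illustration of the estimate produced in this proof, the sketch below runs a circuit with a given acceptance probability many times, counts the acceptances (the Hamming weight), adds integer Laplace noise, and divides by the number of runs. It says nothing about the coherent implementation or the uncomputing step. The discrete Laplace sampler (a difference of two geometric variables), the parameter values, and all function names are assumptions made for illustration.

```python
import math
import random

def discrete_laplace(sigma, rng):
    """Integer noise with P(k) proportional to exp(-|k| / sigma),
    sampled as the difference of two i.i.d. geometric random variables."""
    q = math.exp(-1.0 / sigma)

    def geometric():
        u = 1.0 - rng.random()              # uniform in (0, 1]
        return int(math.floor(math.log(u) / math.log(q)))

    return geometric() - geometric()

def noisy_acceptance_estimate(accept_prob, num_copies, sigma, rng):
    """Count acceptances over num_copies independent runs, add Laplace
    noise of scale sigma to the count, and normalize to a probability estimate."""
    hamming_weight = sum(rng.random() < accept_prob for _ in range(num_copies))
    return (hamming_weight + discrete_laplace(sigma, rng)) / num_copies

rng = random.Random(1)
estimate = noisy_acceptance_estimate(accept_prob=0.3, num_copies=10000,
                                     sigma=20.0, rng=rng)
print(estimate)
```

The trade-off visible here mirrors the one in the proof: the noise scale must be large enough for gentleness, while the number of copies must be large enough that the noise, once divided by the number of copies, stays below the target additive error.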
Here is an interesting question that we leave open.  Suppose a quantum algorithm has a polynomial-time quantum subroutine , which on each input , generates a sample from a probability distribution  supported on a sparse set  with .  Suppose also that the output we want, on each input , is a polynomial-size approximate description of : that is, a string that lists approximations to those  values that are far from zero, or some other representation from which  could be efficiently sampled.  Is there then, necessarily, an efficient way to implement a mapping of the form
[TABLE]
with no garbage?
In the special case where is a classical randomized algorithm, we can do this by first picking a single polynomial-size random string , and then using as 's randomness for every input in the superposition, relying on amplification and the union bound to ensure that succeeds on every with overwhelming probability over the choice of . This is an instance of the well-known "Adleman's trick" [8] from complexity theory, as used for example to prove the containment . The use of a single avoids any unwanted entanglement between and the and registers.
But what about the general case, where is a quantum algorithm? Here Adleman's trick clearly won't work, so a different idea is needed: perhaps the use of a more sophisticated DP algorithm than the Laplace algorithm used to prove Theorem 7.
8 Open Problems
This paper established a new bridge between the fields of differential privacy and quantum measurement. But we've barely begun to explore what this bridge can carry. Here are a few of our favorite open problems.
Basic Questions
- (1) Can we generalize our main result, to show that -DP on product states implies -gentleness on product states for any quantum measurement, rather than only for product measurements? One natural first step would be to prove this for LOCC measurements. Another would be to show that -triviality on product states implies -gentleness (or even just -gentleness) on product states. Note that there are two questions here: first, given a measurement that's -DP on product states, can we implement (meaning, produce the correct output probabilities on all states, not just product states), in a way that happens to be -gentle when restricted to product states? And second, can we implement some other measurement that has essentially the same output probabilities as on product states, [Footnote 24: If is -trivial, then to get a nontrivial question here, we demand relative error on product states that's less than .] and that's also -gentle on product states, but that could be arbitrarily different from on entangled states?
- (2) In this paper, we used our DP/gentleness connection, together with known results from DP, to design and analyze a new quantum measurement procedure of independent interest (namely, QPMW). Can we also go in the opposite direction, and use known results from quantum measurement theory to say anything new about classical differential privacy?
- (3) Does -gentleness imply -DP not merely for all , but for all ?
- (4) In quantum differential privacy, how much can we do in the "local model," wherein users are each individually responsible for ensuring the privacy of their respective states , by submitting an obscured state to the database? Also, how does the local model relate to the model wherein we can only perform measurements on the states separately, for example because of experimental limitations?
Shadow Tomography
- (5) What is the true sample complexity of shadow tomography? Recall that this paper's upper bound had the form , where is the number of measurements and is the Hilbert space dimension. By contrast, the best known lower bound is [6]. Is any dependence on needed? Theorem 50 showed that, if a shadow tomography procedure is also gentle on product states, then it needs samples. Meanwhile, Theorem 53 showed that if the procedure is online, then it needs samples. But what if we drop these additional requirements, or relax to gentleness on states of the form ? We stress that any lower bound will need to be "inherently quantum," since classically, in the offline and non-gentle setting, an upper bound holds independent of [6].
- (6) Is it possible to do shadow tomography using incoherent measurements (i.e., measuring each copy of separately)? If so, this would bring shadow tomography much closer to experimental feasibility.
Composition
- (7) What can we say about the composition of quantum DP algorithms (see Appendix 13 for further discussion)? In the regime where DP implies gentleness, but where the probabilities of outcomes are too small for Lemma 17 to apply, can we compose DP algorithms in a way that preserves not only accuracy, but also a multiplicative privacy guarantee? Also, outside the regime where DP implies gentleness, is there any way to get around the counterexample of Appendix 13, and compose quantum DP algorithms in a way that preserves accuracy (to say nothing about privacy)? For example, what about "non-black-box" composition methods?
- (8) Does an "advanced composition theorem" (see [22]) hold for gentleness, or at least for the particular gentle measurements that arise from our connection between gentleness and DP? In other words, if we perform -gentle measurements times in sequence, then can we say that with high probability over the measurement outcomes, our states have been damaged by only in trace distance, rather than ? If so, we could likely improve the sample complexity of our QPMW shadow tomography procedure, say from to .
Computational Complexity
- (9) Is there any example of a polynomial-time classical randomized algorithm that is -DP for some , but does not give rise to a gentle measurement on product states that can be implemented in polynomial time, because of the issue with the computational complexity of QSampling discussed in Section 7? If so, are there any "natural" examples of such DP algorithms? It would be of interest to give such examples either conditionally (say, based on a cryptographic assumption), or unconditionally in the black-box model.
- (10) Can we show, under some plausible cryptographic assumption, that computation time is needed for shadow tomography, without the additional constraints that the procedure be online or gentle?
- (11) Can we generalize Theorem 7, to give more examples of how quantum algorithms can be safely invoked as subroutines by other quantum algorithms using gentle measurement procedures? What about the problem mentioned at the end of Section 7.3?
9 Acknowledgments
We thank Lijie Chen for insightful comments, including catching an error in a previous analysis of QPMW; Thomas Steinke, Uri Stemmer, and Jon Ullman for helpful conversations about lower bounds and hardness results for differential privacy and adaptive data analysis; Andris Ambainis, Mark Bun, Dana Moshkovitz, and Fabio Sciarrino for helpful conversations; and David Mestel and the anonymous reviewers for their comments.
10 Appendix: DP, Gentleness, and Triviality on Separable versus Entangled States
What is the relationship between a measurement's being differentially private (or trivial, or gentle) on product states, and its having those same properties on arbitrary states?
In this appendix, we'll give examples of measurements on qubits that are
- (1) -trivial, -DP, and -gentle on all product states (and indeed, on all separable mixed states), and yet
- (2) extremely far from being trivial, private, or gentle on certain entangled states.
In some sense, this will answer our question "for complexity-theoretic purposes": doing nothing whatsoever on separable states, to some fixed exponential precision, is compatible with enormous departures from DP, gentleness, and triviality on entangled states.
Nevertheless, we'll then show that there's some level of triviality, DP, and gentleness on product states that implies the same properties on arbitrary states. Strikingly, though, this would be false in quantum mechanics over rather than over .
10.1 Separations
Our first example separates DP on product states from DP on arbitrary states.
Proposition 61
There exists an -qubit measurement that's -trivial (and hence, -DP) on product states, but not -DP for any on arbitrary states.
Proof. For simplicity, let be odd, and group the first qubits into pairs.  Then the measurement will first project each of these pairs onto the Bell pair .  If all projections succeed, then measures the qubit in the basis and returns the result.  Otherwise returns a uniformly random bit.
Clearly, on states of the form
[TABLE]
this measurement is not -DP for any , since (for example) it completely leaks whether  or .
On the other hand, we claim that is -DP on product states. To see this, observe that every 2-qubit product state has at most 1/2 projection onto the Bell pair . So when we apply to an -qubit product state, the projections all succeed with probability at most , and if at least one projection fails, then 's output is random. Thus, if and are any two product states, then for all ,
[TABLE]
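The key fact used in this proof, that a two-qubit product state overlaps a Bell pair with squared amplitude at most 1/2, can be spot-checked numerically. The sketch below assumes the Bell pair is (|00> + |11>)/sqrt(2) (the bound is the same for any maximally entangled two-qubit state) and assumes that the first n - 1 qubits of an odd number n are grouped into (n - 1)/2 pairs; the function names are hypothetical.

```python
import cmath
import math
import random

def random_qubit(rng):
    """A random normalized single-qubit pure state, as a pair of amplitudes."""
    theta = rng.uniform(0.0, math.pi)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2))

def bell_overlap(q1, q2):
    """Squared overlap of the product state q1 (x) q2 with the assumed
    Bell pair (|00> + |11>) / sqrt(2)."""
    amp = (q1[0] * q2[0] + q1[1] * q2[1]) / math.sqrt(2)
    return abs(amp) ** 2

def all_projections_succeed_bound(num_qubits):
    """Upper bound on the probability that every pair of an odd-size
    product state passes its Bell projection: (1/2) ** ((n - 1) / 2)."""
    return 0.5 ** ((num_qubits - 1) // 2)

rng = random.Random(0)
worst = max(bell_overlap(random_qubit(rng), random_qubit(rng))
            for _ in range(10000))
print(worst, all_projections_succeed_bound(41))
```

The 1/2 bound follows from the Cauchy-Schwarz inequality applied to the two amplitude vectors, and it is attained, for instance, by the product state |00>. Multiplying the per-pair bound over all pairs gives the exponentially small probability that the measurement ever looks at the last qubit.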
As a bonus, we can adapt Proposition 61 to separate DP on product states from DP on arbitrary states, even in the special case where the measurement  is mixture-of-products.
Proposition 62
There exists an -qubit mixture-of-products measurement that's -trivial (or equivalently, -DP) on product states, but is not -DP for any on arbitrary states.
Proof. We simply modify the measurement from the proof of Proposition 61, so that now tries to use each of the qubit pairs to violate a Bell inequality: say, by playing the so-called CHSH game [17], which can be won with probability cos²(π/8) ≈ 0.85 using the entangled state , but with at most 3/4 probability using any unentangled state.
If wins at the CHSH game, on (say) at least an fraction of the  qubit pairs, then returns the result of measuring the qubit in the  basis.  Otherwise, returns a uniformly random bit.
Again, on states of the form
[TABLE]
this measurement is not -DP for any , since it leaks whether  or with all but exponentially small probability.
But again, on product states, we claim that is -trivial.  For by a Chernoff bound, whenever is applied to a product state, the  qubit is measured with at most  probability.
The measurements from Propositions 61 and 62 don't have product form, so we can't apply Theorem 5 to them to conclude automatically that they're -gentle on product states. Nevertheless, it's not hard to verify directly that they are -gentle on product states, and even on separable mixed states.
By contrast, Corollary 24 says that, if $M$ is $\alpha$-gentle on all states, then $M$ is $O(\alpha)$-DP on all states. But $M$ is not $\varepsilon$-DP on all states, for any $\varepsilon<\infty$ (in the case of Proposition 61) or for any $\varepsilon=o(n)$ (in the case of Proposition 62). So summarizing, we obtain the following corollary of Propositions 61 and 62, which dramatically separates gentleness on product states from gentleness on all states:
Corollary 63
There exists an $n$-qubit measurement that's $2^{-\Omega(n)}$-gentle (and indeed, $2^{-\Omega(n)}$-trivial) on product states and indeed on separable mixed states, but not $\alpha$-gentle for any  on arbitrary states.  We can even take this measurement to be mixture-of-products.
From Proposition 36, together with Lemma 31, we already get that the measurement  is -gentle on product states despite not being -gentle on arbitrary states.  However, Corollary 63 gives an exponentially more dramatic separation between gentleness on product states and gentleness on arbitrary states.
It will follow from Corollary 68, proved in Section 10.2, that these exponential separations, between triviality, DP, and gentleness on product states and the same parameters on arbitrary states, are the largest separations possible, up to the exact value of the exponential scaling factor.
Note also that the following is an immediate consequence of convexity and of Proposition 13:
Proposition 64
If $M$ is $\varepsilon$-trivial or $\varepsilon$-DP on all product states, then $M$ is also $\varepsilon$-trivial or $\varepsilon$-DP respectively on all separable mixed states.
Beware that $\alpha$-gentleness on product states does not automatically imply $\alpha$-gentleness on separable mixed states (even though in the examples above the two happened to go together); the measurement  is a counterexample.
As a final remark, one might wonder whether the counterexamples of Propositions 61 and 62 and Corollary 63 have classical probabilistic analogues. In other words, is there a separation between DP on product distributions, and DP on arbitrary distributions? Or the analogous question for triviality? We now observe that the answer is no. Indeed, this is just a special case of Proposition 64 above. Every probability distribution can be written as a convex combination of product distributions (indeed, point distributions), and DP and triviality are both closed under convex combinations.
Footnote 25: Again, gentleness is the outlier, failing to be closed under convex combinations. It's not hard to show, by a classical analogue of Lemma 23, that the only classical algorithms that are gentle on arbitrary distributions are close to trivial. But every algorithm is, or can be made, gentle on classical computational basis states.
Why is the quantum case different? Because, while DP is closed under convex combinations, it's not closed under superpositions. The CHSH game provides one example of this: a certain measurement has a behavior on the Bell pair $(|00\rangle+|11\rangle)/\sqrt{2}$ that's not a convex combination of its behaviors on the components $|00\rangle$ and $|11\rangle$, so that the measurement can fail to be DP on the superposition, despite being DP on the components. Thus, the separation between DP on product states and DP on arbitrary states is a quantum phenomenon.
10.2 Relationships
We'll now show that, despite the separating examples in the last section, a measurement's being $\varepsilon$-trivial on product states for extremely small values of $\varepsilon$ (say, ), really does imply its being nearly trivial on arbitrary states (and hence DP and gentle as well).  Intriguingly, we'll also show that this depends on the fact that amplitudes in quantum mechanics can be complex rather than only real.
Our first claim is that any measurement $M$ that accepts every product state with the same probability $p$, in fact accepts every state with probability $p$. We do not know whether this was known before; in any case, we cannot resist including a strikingly simple proof for completeness. Our proof uses the following result of Braunstein et al. [15]:
Theorem 65 (Braunstein et al. [15])
In any finite-dimensional tensor product Hilbert space (on any number of registers), the separable mixed states have positive density within the set of all mixed states.
We observe the following consequence.
Theorem 66
Suppose a measurement $M$ is $0$-trivial (or equivalently, $0$-DP or $0$-gentle) on all product states. Then $M$ is $0$-trivial on all states.
Proof. If $M$ is $0$-trivial on product states, then for each possible outcome $y$, there is some constant $p_y$ such that, for all product states $\rho$,
$$\Pr[M(\rho)\text{ outputs }y]=p_y.$$
So by convexity, the above holds as well for all convex combinations of product states: i.e., separable mixed states. Now
$$\Pr[M(\rho)\text{ outputs }y]=\operatorname{Tr}(E_y\rho)$$
for some Hermitian operator $E_y$. By Theorem 65, this means that the linear function $\rho\mapsto\operatorname{Tr}(E_y\rho)$ equals $p_y$ on a subset of positive density. But any linear function that's constant on a subset of positive density is constant everywhere, so $\operatorname{Tr}(E_y\rho)=p_y$ for all $\rho$.
Why did this depend on amplitudes being complex numbers? In quantum mechanics over $\mathbb{R}$, the result of Braunstein et al. [15] is known to be false. Let us now show that Theorem 66 is false as well. Consider the two-outcome measurement on two "rebits" (i.e., real-amplitude qubits) that accepts $\rho$ with probability $\operatorname{Tr}(E\rho)$, where
[TABLE]
One can check that, for every two-rebit pure product state $|\psi\rangle$, we have
[TABLE]
and hence the same is true for every two-rebit separable mixed state.  Nevertheless, this measurement accepts the entangled rebit state  with certainty, and rejects  with certainty.  This is a rare example of a quantum information phenomenon that's fundamentally different for qubits and rebits.
Footnote 26: In the same spirit: in complex quantum mechanics, one can recover the POVM if one knows $\operatorname{Tr}(E\rho)$ for all product states $\rho$; but in real quantum mechanics, one can't, by the same counterexample $E$, which the product states $\rho$ of rebits fail to distinguish from the POVM that accepts every state with probability .  This fact is a "dual" to the well-known fact that a mixed state $\rho$ is uniquely determined by the values of $\operatorname{Tr}(E\rho)$ on all product measurements (i.e., Hardy's "local tomography axiom" [30] holds), in complex quantum mechanics but not in real quantum mechanics.  The "duality" between the two facts can be seen by interchanging the roles of the Hermitian matrices $E$ and $\rho$ in the expression $\operatorname{Tr}(E\rho)$.
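To make the rebit phenomenon concrete, here is a small numerical sketch. The operator displayed in the paper did not survive extraction, so the code uses $E=(I+Y\otimes Y)/2$ as an assumed stand-in: $Y\otimes Y$ has real entries, and it is one operator with the stated behavior, which may or may not coincide with the paper's choice. The particular accepted and rejected states, and all names, are likewise illustrative.

```python
import math
import random

# Y (x) Y written out as a real 4x4 matrix in the basis |00>, |01>, |10>, |11>.
# ASSUMPTION: E = (I + Y(x)Y)/2 is an illustrative choice, not taken from the paper.
YY = [
    [0.0, 0.0, 0.0, -1.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
]


def accept_prob(state):
    """<psi| (I + Y(x)Y)/2 |psi> for a real, normalized 4-amplitude state."""
    expectation = sum(
        state[i] * YY[i][j] * state[j] for i in range(4) for j in range(4)
    )
    return 0.5 * (1.0 + expectation)


def real_product_state(angle_a, angle_b):
    """Tensor product of two rebits (cos a, sin a) and (cos b, sin b)."""
    a = (math.cos(angle_a), math.sin(angle_a))
    b = (math.cos(angle_b), math.sin(angle_b))
    return [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]


S = 1.0 / math.sqrt(2.0)
ACCEPTED = [0.0, S, S, 0.0]  # (|01> + |10>)/sqrt(2): accepted with certainty
REJECTED = [S, 0.0, 0.0, S]  # (|00> + |11>)/sqrt(2): rejected with certainty
```

Every real product state is accepted with probability exactly $1/2$ under this choice, because $\langle\psi|Y|\psi\rangle=0$ for any real single-qubit $|\psi\rangle$; a complex product state such as $|{+i}\rangle|{+i}\rangle$ would break the tie.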
In ordinary (complex) quantum mechanics, we can even obtain a weak quantitative connection between DP, gentleness, and triviality on product states and the same notions on arbitrary states, by using the following result due to Gurvits and Barnum [25].
Theorem 67 ([25])
Let  be any mixed state on registers, each -dimensional.  Then the state is separable, for all .
Theorem 67 has the following corollary.
Corollary 68
Suppose the measurement , on registers of dimensions each, is -trivial on product states, for some . Then is -trivial on all states.
Proof. Fix some measurement outcome corresponding to the POVM element . Then let be the probability that outputs on the maximally mixed state. Set , so that . Let be an arbitrary state, and let
[TABLE]
Then  is separable by Theorem 67.  So since is -trivial on product states,
[TABLE]
Now,
[TABLE]
Solving for , we find that
[TABLE]
This implies that is -trivial on all states, for
[TABLE]
Here we've used the fact that .
11 Appendix: General Neighbor Relations
Given two states $\rho,\sigma$ on $n$ registers each, we called $\rho$ and $\sigma$ neighbors if it's possible to reach $\sigma$ from $\rho$, or $\rho$ from $\sigma$, by applying some superoperator to a single register only. In the special case where $\rho$ and $\sigma$ are both product states, this is simply equivalent to saying that we can reach $\sigma$ from $\rho$ by changing a single $\rho_i$. For correlated or entangled states, by contrast, it's not obvious that we should favor this definition over various alternatives.
Thus, call $\rho$ and $\sigma$ superoperator neighbors if they're neighbors in the sense above.  Call them unitary neighbors if it's possible to reach $\sigma$ from $\rho$, or equivalently $\rho$ from $\sigma$, by applying some unitary transformation to a single register only.  And call them conditioned neighbors if it's possible to reach one from the other by applying a conditioned superoperator (i.e., a normalized quantum operation) to a single register.  Clearly, all unitary neighbors are also superoperator neighbors, and all superoperator neighbors are also conditioned neighbors.  But for general states, the three notions are easily seen to form a strict hierarchy.  For example,  and  are superoperator neighbors but not unitary neighbors, while  and  are conditioned neighbors but not superoperator neighbors.
Nevertheless, we now prove that, for the task of defining $\varepsilon$-DP, switching from superoperator neighbors to unitary neighbors would change nothing of substance, while switching to conditioned neighbors would collapse our framework to triviality.
Proposition 69
If is -DP with respect to unitary neighbors, then is also -DP with respect to superoperator neighbors (regardless of whether we mean DP on product states or on all states).
Proof. Let $\rho$ and $\sigma$ be superoperator neighbors, which differ only on the $i$-th register.  Let  be the state obtained by starting from either $\rho$ or $\sigma$, and then applying a Haar-random unitary transformation $U$ to the $i$-th register (which has the effect of putting that register into the maximally mixed state, ).  Then averaging over the possible $U$'s and applying convexity, we have
[TABLE]
and likewise
[TABLE]
Hence
[TABLE]
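The key step in this proof is that randomizing one register with a random unitary leaves it maximally mixed. The sketch below illustrates this for a single qubit, with one substitution stated plainly: instead of a Haar-random unitary it averages over the four Pauli matrices, which give the same average state on a qubit. The helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


# Identity, X, Y, Z. Averaging over these stands in for the Haar average here.
PAULIS = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def pauli_twirl(rho):
    """Return (1/4) * sum_P P rho P^dagger for a single-qubit density matrix rho."""
    out = [[0j, 0j], [0j, 0j]]
    for p in PAULIS:
        term = matmul(matmul(p, rho), dagger(p))
        for i in range(2):
            for j in range(2):
                out[i][j] += term[i][j] / 4.0
    return out
```

Whatever single-qubit density matrix is supplied, the result is $I/2$, so two states that differ only on that register become identical after the randomization.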
Proposition 70
If is -DP on all states with respect to postselected neighbors, then is -trivial.
Proof. Let  be any two mixed states on registers each.  Let  be the result of measuring the first register of in the  basis and getting the outcome , and let  be the result of measuring the first register of in the  basis and getting the outcome .  Then by assumption, for all possible outcomes of ,
[TABLE]
But the state
[TABLE]
is a postselected neighbor of both  and , since measuring the first register of  in the  basis can yield either.  Hence
[TABLE]
Chaining together the inequalities now yields
[TABLE]
12 Appendix: Differential Privacy Beyond Product and LOCC Measurements
In this appendix, we'll give an example of a measurement on $n$ qubits, which is differentially private on all states, but which is provably not a product measurement, or even a mixture-of-products measurement. In other words, there's no way to implement it (even approximately) by measuring each qubit in a separately chosen basis, with none of the bases depending on the outcomes of measuring previous qubits. This rules out the possibility of a "structure theorem" showing that all DP measurements can be put into the restricted form that we mainly studied in the body of this paper.
Going further, we'll also give a second measurement that's differentially private on all $n$-qubit states, but which we conjecture is not even LOCC. That is, we conjecture that there's no way to implement it using local operations and classical communication (even allowing adaptivity), and that entangling measurements on the qubits are needed.
To construct the first measurement, we'll use the following lemma.
Lemma 71
There is no -qubit mixture-of-products measurement that accepts the states and with certainty, and that rejects and  with certainty.
Proof. It suffices to show that there's no product measurement; the lemma then follows by convexity.
A product measurement can be written , for some one-qubit POVMs  and .
Suppose we knew that measuring the first qubit in the  basis yielded the outcome .  Then we'd need to accept with certainty if the second qubit was , and reject with certainty if the second qubit was .  But since the POVM elements must be Hermitian and positive semidefinite, the only POVMs  on the second qubit that achieve that objective are equivalent under trivial changes (i.e., relabelings and adding "dummy" POVM elements) to
[TABLE]
in other words, simply measuring the second qubit in the  basis.  Likewise, if we knew that measuring the first qubit in the  basis yielded the outcome , then we'd need to accept with certainty if the second qubit was , and reject with certainty if the second qubit was .  The only POVMs that achieve that objective are equivalent under trivial changes to
[TABLE]
in other words, measuring the second qubit in the  basis.  But since we don't get to choose  based on the outcome of measuring the first qubit, we can't achieve both objectives simultaneously.
Finally, if the first qubit was not measured in the  basis, but in some other basis, then the situation is "even worse," since some outcome  of measuring the first qubit will be compatible with the first qubit having been  or with its having been .  So even fixing , we'll again need POVM elements equivalent to  and POVM elements equivalent to , which contradicts .
By compactness considerations, a corollary of Lemma 71 is that there must be some constant  (we have not worked out its value) such that no mixture-of-products measurement can distinguish and  from and  even with success probability .
Using Lemma 71, we now prove the main result.
Theorem 72 (Existence of Non-Product Quantum DP Measurements)
There exists a measurement on qubits that's -DP on all states, but that cannot be approximated (say, to  variation distance in the distribution over measurement outcomes) by any mixture-of-products measurement.
Proof. Set  for some constant .  Then the measurement does the following:
(1) Group the qubits into blocks , each of size .
(2) Within each block :
  - Group the qubits into pairs .
  - Measure each pair in the basis .
  - Count the number of these measurements that return either  or , and calculate the parity of this number, .
(3) Return the sum , across all blocks, plus Laplace noise with average magnitude .
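The classical post-processing in steps (2) and (3) can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: each block is represented by a hypothetical list of booleans recording whether a pair's measurement returned one of the two flagged outcomes, and the noise scale is a free parameter because the paper's value was lost in extraction.

```python
import random


def block_parity(flags):
    """Parity of the number of flagged pair outcomes within one block."""
    return sum(1 for flag in flags if flag) % 2


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as a difference of two exponentials; E|noise| = scale."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def exact_parity_sum(blocks):
    """Sum of the block parities before any noise is added."""
    return sum(block_parity(flags) for flags in blocks)


def noisy_parity_sum(blocks, scale, rng):
    """Sum of block parities plus Laplace noise (scale is an assumed parameter)."""
    return exact_parity_sum(blocks) + laplace_noise(scale, rng)
```

Flipping a single pair's flag changes exactly one block parity, so the exact sum moves by exactly 1. That bounded sensitivity is what lets the Laplace noise hide any single change, even though each individual parity is maximally sensitive.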
Our first claim is that is -DP on all states. This is a simple consequence of Proposition 4.
Our second claim is that there exists a probability distribution  over -qubit states (in fact, product states), such that given a state  drawn from , no mixture-of-products measurement can return a nontrivial estimate of .  This  is defined as follows: first set  or , both with equal probability .  Then let  be a tensor product of pairs of qubits of the form , which is chosen uniformly at random among all such tensor products that are consistent with the chosen value of .
To prove the claim, it suffices to show that no mixture-of-products measurement can guess even a single parity , by measuring the  block, with bias more than (say)  over chance.  For this we appeal to Lemma 71, which says that for each pair of qubits, the measurement cannot perfectly distinguish whether the pair is in the state  (which flips the parity ), or the state  (which has no effect on ).  Rather, it can distinguish these two mixed states only with some constant rate of noise.  So the situation is equivalent to the following: we are trying to guess the parity of an arbitrary bit string, but each bit can be read only noisily, and has a constant probability of appearing as $1$ despite being $0$ or vice versa (with the errors independent across the bits).  In such a situation, regardless of the value of that constant, it is well-known that our bias in guessing the parity falls off exponentially with the number of bits.  By simply setting the block size to be a sufficiently large constant multiple of , we can make this bias less than .
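The exponential decay of the parity bias invoked here can be verified exactly. A minimal sketch, assuming each of $k$ bits is flipped independently with probability $\eta$ (both symbols are our notation): the guessed parity is wrong precisely when an odd number of bits flip, and the resulting bias equals $(1-2\eta)^k$.

```python
def prob_odd_flips(eta, k):
    """Exact probability that an odd number of k independent eta-biased flips occur."""
    p_odd = 0.0
    for _ in range(k):
        # The parity of the flip count toggles with probability eta at each bit.
        p_odd = p_odd * (1.0 - eta) + (1.0 - p_odd) * eta
    return p_odd


def parity_bias(eta, k):
    """Pr[parity guess is right] minus Pr[it is wrong] when reading k noisy bits."""
    return 1.0 - 2.0 * prob_odd_flips(eta, k)
```

For instance, with $\eta=0.1$ the bias after 50 bits is $0.8^{50}$, which is already below $10^{-4}$.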
Finally, we just need to choose (say) .  In that case, the measurement is -DP for , and it returns a nontrivial estimate of .  By contrast, no mixture-of-products measurement returns a nontrivial estimate of , or even distinguishes the case  from the case  with bias (say) .
The proof of Theorem 72 exploited the fact that, even though differential privacy is clearly associated with a lack of "sensitivity" on the measurement's part (i.e., changing a single register can't change the output by much), this is still compatible with local subproblems solved by the measurement being exquisitely sensitive to local changes.  That's what happens with the noisy sum of parities example: each parity is maximally sensitive to local changes, even though a noisy sum of them is not.
Now suppose we want to show something stronger: namely, that there's an $n$-qubit measurement that's differentially private, but that isn't even LOCC (that is, cannot be implemented using separate measurements on each qubit, even with adaptivity). We now propose a modification of the measurement from the proof of Theorem 72, which we conjecture has the required property.
Set  for some constant .  Then the measurement does the following.
(1) Group the qubits into blocks , each of size .
(2) Within each block :
  - Group the qubits into sub-blocks , each of size .
  - Within each sub-block :
    - Group the qubits into pairs .
    - Perform the swap test on each pair (note: the swap test accepts the product state $|\psi\rangle\otimes|\varphi\rangle$ with probability equal to $\frac{1}{2}\left(1+|\langle\psi|\varphi\rangle|^2\right)$).
    - Call  "accepting" if every swap test accepts, or "rejecting" if at least one of them rejects.
  - Count the number of accepting sub-blocks, and let  be the parity of this number.
(3) Return the sum , across all blocks, plus Laplace noise with average magnitude .
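The swap-test acceptance probability that drives this construction can be computed in two ways that should agree: from the standard closed form, and from the projector onto the symmetric subspace. The sketch below does both for single-qubit states and multiplies the probabilities within a sub-block, assuming the sub-block holds a product state so that the tests are independent. The function names are hypothetical.

```python
import math


def inner(psi, phi):
    """Inner product <psi|phi> of two amplitude tuples."""
    return sum(a.conjugate() * b for a, b in zip(psi, phi))


def swap_test_accept(psi, phi):
    """Closed form (1 + |<psi|phi>|^2) / 2 for the product state |psi>|phi>."""
    return 0.5 * (1.0 + abs(inner(psi, phi)) ** 2)


def swap_test_accept_via_projector(psi, phi):
    """<psi,phi| (I + SWAP)/2 |psi,phi>, evaluated on the joint state."""
    joint = [a * b for a in psi for b in phi]
    swapped = [a * b for a in phi for b in psi]  # SWAP|psi>|phi> = |phi>|psi>
    overlap = sum(x.conjugate() * y for x, y in zip(joint, swapped))
    return 0.5 * (1.0 + overlap.real)


def sub_block_accept(pairs):
    """Probability that every swap test in a sub-block accepts (product-state input)."""
    prob = 1.0
    for psi, phi in pairs:
        prob *= swap_test_accept(psi, phi)
    return prob
```

A sub-block of identical pairs accepts with certainty, while each orthogonal pair halves the acceptance probability; this is the sharp contrast that the conjecture says LOCC cannot reproduce.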
Just like in the proof of Theorem 72, it's easy to see that  is -DP on all states.  Our conjecture is that  cannot be implemented, even approximately, using LOCC measurements on the qubits.  The intuition is that, if we're restricted to LOCC, then at best we can simulate each swap test imperfectly: for example, using a measurement that accepts the product state  with probability equal to , rather than .  This would imply that we can't reliably distinguish the following two cases:
(1) within a given sub-block, every swap test accepts with probability , versus
(2) within that sub-block,  swap tests accept with probability  (i.e., the two qubits are in orthogonal states), while the remaining swap tests accept with probability .
For the difference between these two cases will get "lost in the Gaussian noise," which is of order  if the constant  was sufficiently large.  By contrast, if we take an AND of  "true" swap tests, then we accept with probability  in case (1), versus with probability  in case (2).
But if we can't reliably distinguish these cases using LOCC, then certainly we can't guess the parity, across all sub-blocks within a given block, with bias more than  over random.  (Whereas by contrast, using "true" swap tests, we can compute the parity across the sub-blocks with success probability , given the promise that every sub-block satisfies either (1) or (2) above.)
If so, then the end result is that, using LOCC measurements, we can't compute the sum of the parities across the blocks even noisily, whereas using true swap tests, we can.
13 Appendix: On Composition of Quantum DP Algorithms
One of the central properties of classical differential privacy is that it nicely composes: that is, if we run an $\varepsilon_1$-DP algorithm followed by an $\varepsilon_2$-DP algorithm on the same database, then the resulting algorithm is $(\varepsilon_1+\varepsilon_2)$-DP. Furthermore, advanced composition [22] shows that, with overwhelming probability, the loss in privacy when we compose $k$ algorithms is even slower than linear, growing only like $\sqrt{k}$.
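The two classical composition rates can be compared numerically. The sketch below uses the commonly stated form of the advanced composition bound, in which $k$ runs of an $\varepsilon$-DP algorithm are $(\varepsilon',\delta')$-DP with $\varepsilon'=\varepsilon\sqrt{2k\ln(1/\delta')}+k\varepsilon(e^{\varepsilon}-1)$. The parameter name `delta_prime` and the function names are ours, and the constants are those of the standard statement rather than anything specific to this paper.

```python
import math


def basic_composition(eps, k):
    """Privacy loss of k eps-DP algorithms under basic (linear) composition."""
    return k * eps


def advanced_composition(eps, k, delta_prime):
    """Privacy loss under advanced composition, holding except with probability delta_prime."""
    sqrt_term = math.sqrt(2.0 * k * math.log(1.0 / delta_prime)) * eps
    linear_term = k * eps * (math.exp(eps) - 1.0)
    return sqrt_term + linear_term
```

With $\varepsilon=0.01$, $k=10^4$ and $\delta'=10^{-6}$, basic composition gives a loss of 100 while the advanced bound is roughly 6.3, reflecting the $\sqrt{k}$ scaling for small $\varepsilon$.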
This immediately raises a question: does quantum differential privacy similarly compose? Here we face a new difficulty, not present in the classical case: namely, when we compose quantum DP algorithms, each algorithm will in general damage our state. And this might cause not only a catastrophic loss in privacy, but even a catastrophic loss in accuracy.
Fortunately, we can use our connection between DP and gentleness to address the concern about accuracy, at least in the regime where that connection applies. For certainly gentleness composes. That is, if we apply an $\alpha_1$-gentle measurement $M_1$ followed by an $\alpha_2$-gentle measurement $M_2$, then the result will be $(\alpha_1+\alpha_2)$-gentle, by the triangle inequality for trace distance. And by Corollary 16, this is true even if $M_2$ is guaranteed to be gentle only on the original state (for example, because it's a product state), and not necessarily on the post-measurement states that result from applying $M_1$. We even conjecture that an "advanced composition" property holds for gentleness (see Section 8).
Thus, suppose we want to compose product measurements  that are each -DP on product states, for some .  Then by Theorem 5, these measurements are each -gentle.  So we can compose them while preserving good accuracy.
Even here, though, there's a potential issue with privacy.  The issue arises because the later measurements are applied not to our original state $\rho$, but to damaged versions of the state.  And particularly if this damage is additive rather than multiplicative, we have no guarantee that the later measurements will preserve DP (with respect to the original state $\rho$) when applied to the damaged versions.  Indeed, ensuring privacy would require saying at least something about the post-measurement states.  If, for example, we implemented some measurement in the sequence in a way that gratuitously "amplified" the information in (say) the first register, copying it into the other $n-1$ registers as a byproduct of the measurement procedure, then privacy need not be preserved when we apply the next measurement.  On the other hand, it seems plausible to us that -DP measurements, for , can always be implemented in such a way that privacy is preserved under composition.
By using Lemma 17, the following proposition confirms that quantum DP composition works at least in the special case where we're composing a small number of quantum DP algorithms that are gentle, and all of whose outputs have appreciably large probabilities on all states.
Proposition 73 (Limited Composition for Quantum DP)
Let  be the sequential composition of measurements , where  is -DP on product states and -gentle on product states.  Suppose that for all product states  and all possible sequences  of measurement outcomes, we have
[TABLE]
where . Then achieves a relative accuracy of , in the sense that
[TABLE]
for all product states  and all , and in addition is -DP on product states for
[TABLE]
Proof. The relative accuracy part follows immediately from the first part of Lemma 17, which tells us that
[TABLE]
For the -DP part, for all neighboring product states  and all  we have
[TABLE]
In Proposition 73, product states could have been replaced by any other set of states that's closed under the neighbor relation. For the special case of product states, though, we can combine Proposition 73 with part (2) of Theorem 5 to get the following corollary, which does not need the gentleness of the underlying measurements as a separate assumption.
Corollary 74
Let  be the sequential composition of product measurements , each on registers.  Suppose that each  is -DP on product states, where is at most .  Suppose also that for all product states , all , and all measurement outcomes , we have
[TABLE]
where is at least .  Then  achieves a relative accuracy of , in the sense that
[TABLE]
for all product states  and outcomes , and in addition is -DP on product states.
In the remainder of this appendix, we will show that, when  is large compared to , so that we're outside the range where DP implies gentleness, the composition of -DP measurements need not even preserve accuracy.
Recall the "randomized response" algorithm  from Section 5.1, which for each  independently, simply measures the  qubit in the  basis, and returns the measurement outcome with probability , or its complement with probability .  (Thus, the output of  is an $n$-bit string.)  We now give our example:
Theorem 75 (Failure of Composition for Quantum DP)
There exist $n$-qubit measurements  and  that are individually -DP on product states for , but such that no implementation of  leaves us with a post-measurement state allowing an accurate result to be returned if we later run  (even supposing that we don't condition on the outcome of ).
Proof. Let  be the randomized response algorithm , which is -DP by Proposition 33.  Also, let  be the variant of the  mechanism from before, but in the  basis.  In other words,  returns the number of 's plus Laplace noise of mean .  We've seen that , and hence , is -DP.
Now suppose that in reality, each qubit is either $|+\rangle$ or $|-\rangle$. Then by a straightforward calculation,  damages each qubit by  in trace distance, even if we average over both possible measurement outcomes, by decreasing the magnitudes of the off-diagonal density matrix entries by . In more detail, the effect is simply: every $|+\rangle$ qubit flips to $|-\rangle$, and every $|-\rangle$ qubit flips to $|+\rangle$, with independent probability .
So now consider what happens when we run . If we have an -bit string , and every bit of gets flipped with independent probability , then from the corrupted string , we can estimate the Hamming weight of the original string to within an additive error of . In our case, , so this additive error is .
But recall that  needs to estimate the number of  qubits to within an additive error of . If , or equivalently , then this is impossible.
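The accuracy loss in this argument comes from estimating a Hamming weight through independent bit flips. The following sketch makes the error scaling explicit, assuming a flip probability `p` below $1/2$; the exact dependence of `p` on the privacy parameter was lost from the text above, so `p` is left as a free, hypothetical parameter. The unbiased estimator inverts $\mathbb{E}[W']=np+w(1-2p)$, and its standard deviation is $\sqrt{np(1-p)}/(1-2p)$.

```python
import math
import random


def expected_observed_weight(w, n, p):
    """Expected Hamming weight after each of n bits (w of them ones) flips w.p. p."""
    return w * (1.0 - p) + (n - w) * p


def debias(observed, n, p):
    """Unbiased estimate of the original Hamming weight from the corrupted weight."""
    return (observed - n * p) / (1.0 - 2.0 * p)


def estimator_std(n, p):
    """Standard deviation of the debiased estimate; it grows as p approaches 1/2."""
    return math.sqrt(n * p * (1.0 - p)) / (1.0 - 2.0 * p)


def simulate_estimate(bits, p, rng):
    """Flip each bit independently with probability p, then debias the observed weight."""
    observed = sum(b ^ (rng.random() < p) for b in bits)
    return debias(observed, len(bits), p)
```

When `p` is within a small margin of $1/2$, the denominator $1-2p$ is of the order of that margin, so the additive error is of order $\sqrt{n}$ divided by the margin, matching the form of the bound quoted in the proof.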
Of course, the above leaves many further questions that one could explore: for example, what happens for  in the range between  and ?  Also, what if we restrict our attention to quantum DP algorithms with only a few possible outcomes (thus ruling out randomized response applied to each qubit separately)?  Finally, what if we allow our "composed" algorithm to do anything it likes to obtain the desired information, including violating the specified order (e.g., applying  before ) and even more radical changes?
References
[1] S. Aaronson. Quantum lower bound for the collision problem. In Proc. ACM STOC, pages 635-642, 2002. quant-ph/0111102.
[2] S. Aaronson. Limitations of quantum advice and one-way communication. Theory of Computing, 1:1-28, 2005. Earlier version in CCC'2004. quant-ph/0402095.
[3] S. Aaronson. QMA/qpoly is contained in PSPACE/poly: de-Merlinizing quantum protocols. In Proc. Conference on Computational Complexity, pages 261-273, 2006. quant-ph/0510230.
[4] S. Aaronson. Quantum copy-protection and quantum money. In Proc. Conference on Computational Complexity, pages 229-242, 2009. arXiv:1110.5353.
[5] S. Aaronson. The complexity of quantum states and transformations: From quantum money to black holes, February 2016. Lecture Notes for the 28th McGill Invitational Workshop on Computational Complexity, Holetown, Barbados. With guest lectures by A. Bouland and L. Schaeffer. www.scottaaronson.com/barbados-2016.pdf.
[6] S. Aaronson. Shadow tomography of quantum states. In Proc. ACM STOC, pages 325-338, 2018. arXiv:1711.01053.
[7] S. Aaronson, X. Chen, E. Hazan, S. Kale, and A. Nayak. Online learning of quantum states. In Proc. of Neural Information Processing Systems (NIPS), 2018. arXiv:1802.09025.
[8] S. Adleman. Two theorems on random polynomial time. In Proc. IEEE FOCS, pages 75-83, 1978.
