Interplay between Quantumness, Randomness, and Selftesting

Xiao Yuan

arXiv:1703.00656·quant-ph·March 3, 2017

TL;DR

This paper explores how quantum properties like superposition and entanglement enable tasks such as randomness generation and selftesting, highlighting their interplay and demonstrating quantum advantages through theory and experiments.

Contribution

It introduces methods for quantifying quantumness via randomness, demonstrates a quantum advantage in the Bernoulli factory problem using coherence alone, and proposes a measurement-device-independent entanglement witness and a random number generation scheme that is independent of the randomness source.

Findings

01. Quantum coherence can generate true randomness.

02. A quantum advantage is demonstrated in Bernoulli factory tasks.

03. A measurement-device-independent entanglement witness and a source-independent random number generation scheme are proposed.

Abstract

Quantum information processing shows advantages over its classical counterpart in many tasks, including quantum communication and computation. The essence of quantum processing lies in the fundamental difference between classical and quantum states. For a physical system, a coherent superposition in a computational basis is different from a statistical mixture of states in the same basis. Such coherent superposition makes it possible to generate true random numbers, realize parallel computing, and perform other classically impossible tasks such as the quantum Bernoulli factory. For a system that consists of multiple parts, coherent superposition that exists nonlocally across different subsystems is called entanglement. By properly manipulating entanglement, it is possible to realize computation and simulation tasks that are intractable by classical means. Investigating…

Tables (17)

Table 1. Table 3.3: A brief summary of trusted-device QRNG demonstrations. Detailed descriptions of these schemes can be found in Section 3.3.1. Note that the quality/security of the random numbers may differ between demonstrations. Raw: reported raw generation rate; Refined: reported refined rate; Acquisition: data acquisition by dedicated hardware or commercial oscilloscope; SPD: single photon detector; BS: beam splitter; MCP-PCID: micro-channel-plate-based photon counting imaging detector; PNRD: photon-number-resolving detector; CMOS: complementary metal-oxide-semiconductor; $-$: no related information found.

Year	Entropy source	Detection	Raw	Refined	Acquisition
2000	Spatial mode [93]	SPD	1 Mbps	$-$	dedicated
2000	Spatial mode [94]	SPD	100 Kbps	$-$	dedicated
2014	Spatial mode [95]	MCP-PCID	8 Mbps	$-$	dedicated
2008	Temporal mode [96]	SPD	4.01 Mbps	$-$	dedicated
2009	Temporal mode [97]	SPD	55 Mbps	40 Mbps	dedicated
2011	Temporal mode [98]	SPD	180 Mbps	152 Mbps	dedicated
2014	Temporal mode [99]	SPD	109 Mbps	96 Mbps	dedicated
2010	Photon number [100]	PNRD	50 Mbps	$-$	dedicated
2011	Photon number [101]	PNRD	2.4 Mbps	$-$	dedicated
2015	Photon number [102]	PNRD	$-$	143 Mbps	oscilloscope
2010	Vacuum noise [103]	Homodyne	10 Mbps	6.5 Mbps	dedicated
2010	Vacuum noise [104]	Homodyne	$-$	12 Mbps	dedicated
2011	Vacuum noise [105]	Homodyne	3 Gbps	2 Gbps	dedicated
2010	ASE-intensity noise [106]	Photo detector	12.5 Gbps	$-$	dedicated
2011	ASE-intensity noise [107]	Photo detector	20 Gbps	$-$	$-$
2010	ASE-phase noise [108]	Self-heterodyne	1 Gbps	500 Mbps	oscilloscope
2011	ASE-phase noise [109]	Self-heterodyne	1.2 Gbps	1.11 Gbps	oscilloscope
2012	ASE-phase noise [110]	Self-heterodyne	8 Gbps	6 Gbps	oscilloscope
2014	ASE-phase noise [111]	Self-heterodyne	80 Gbps	$-$	oscilloscope
2014	ASE-phase noise [112]	Self-heterodyne	82 Gbps	43 Gbps	oscilloscope
2015	ASE-phase noise [113]	Self-heterodyne	80 Gbps	68 Gbps	oscilloscope

Table 2. Table 3.4: A summary of self-testing and semi-self-testing QRNG demonstrations. MDI: measurement device independent, SI: source independent, CV: continuous variable.

Year	Type	Detection	Speed	Acquisition
2010	Self-testing [114]	ion-trap	very slow	dedicated
2013	Self-testing [76]	SPD	0.4 bps	dedicated
2015	SI [115]	SPD	5 Kbps	dedicated
2015	CV-SI [116]	Homodyne	1 Gbps	oscilloscope
2015	Self-testing with fixed dimension [117]	SPD	23 bps	dedicated

Table 3. Table 4.1: Comparing the frameworks of coherence and entanglement. DI: device-independent; MDI: measurement-device-independent; QKD: quantum key distribution; QRNG: quantum random number generation.

Properties	Coherence	Entanglement
Classical operation	Inherent operation [14]	LOCC [15]
Classical state	Incoherent state, Eq. (3.1)	Separable state
Distance measure	$C_{rel, ent} (ρ)$ , Eq. (3.3)	Relative entropy distance [16]
Convex roof measure	$R_{Z}^{C} (ρ)$ , Eq. (4.6)	EOF [15, 176, 170]
Distillation	Coherence distillation (Methods)	Entanglement distillation [177, 17]
Formation (cost)	Coherence formation	Entanglement cost [174, 17]
Foundation tests	Further research direction	Nonlocality tests [19, 1]
Interconvertibility	[178, 179]	Deterministic [180], stochastic [181]
Catalysis effect	Further research direction	Entanglement catalysis [182, 183]
Witness	Further research direction	Entanglement witness (EW)
DI applications	Further research direction	DIQKD [55, 184], DIQRNG [64]
MDI applications	Further research direction	MDIQKD [185, 186], MDIEW [29, 30]
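
The coherence distance measure named in Table 4.1 can be illustrated on the qubit state $\rho_{A}(v)=v\ket{+}\bra{+}+\frac{1-v}{2}\mathbb{I}$ used in the caption of Figure 4.3. The sketch below assumes the standard definition of the relative entropy of coherence, $C(\rho)=S(\rho^{\mathrm{diag}})-S(\rho)$ with dephasing in the computational basis; Eq. (3.3) is not reproduced in this excerpt, so that definition and the function names are assumptions.

```python
import math


def binary_entropy(x: float) -> float:
    """Shannon entropy in bits of a coin with bias x."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def rel_entropy_coherence_qubit(v: float) -> float:
    """Assumed definition: C(rho) = S(diag(rho)) - S(rho), in bits.

    For rho(v) = v|+><+| + (1 - v) I/2 the eigenvalues are (1 +/- v)/2,
    and the computational-basis diagonal is I/2, whose entropy is 1 bit.
    """
    entropy_of_diagonal = 1.0
    entropy_of_state = binary_entropy((1 + v) / 2)
    return entropy_of_diagonal - entropy_of_state


if __name__ == "__main__":
    for v in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"v = {v:.2f}  C = {rel_entropy_coherence_qubit(v):.4f}")
```

Under this assumption the measure vanishes for the maximally mixed state ($v=0$) and reaches one bit for the pure state $\ket{+}$ ($v=1$).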

Table 4. Table 5.1: Experiment results. $\theta$ is the angle of the quoin state; $N_{p}$ is the total number of $p$-quoins prepared. About half of the prepared $p$-coins are measured in the $X$ basis to prepare the $q$-coin. $q_{\mathrm{th}}$ is the theoretically estimated value based on the estimation of $p$; $q_{\mathrm{exp}}$ is the experimentally estimated value from the obtained $q$-coins; $f_{\mathrm{th}}(p)$ is the theoretically estimated value from the estimation of $p$; $f_{\mathrm{exp}}(p)$ is the experimentally estimated value from the obtained $f(p)$-coins; $N_{f(p)}$ is the number of $f(p)$-coins obtained.

$θ$	$p$	$N_{p}$	$q_{\mathrm{th}}$	$q_{\mathrm{exp}}$	$f_{\mathrm{th}}(p)$	$f_{\mathrm{exp}}(p)$	$N_{f(p)}$
$0^{\circ}$	$0.996$	$2.06 \times 10^{7}$	$0.563$	$0.504$	$0.016$	$0.014$	$8.66 \times 10^{5}$
$15^{\circ}$	$0.979$	$1.87 \times 10^{6}$	$0.644$	$0.628$	$0.083$	$0.081$	$8.29 \times 10^{5}$
$30^{\circ}$	$0.929$	$2.07 \times 10^{7}$	$0.756$	$0.746$	$0.262$	$0.257$	$1.05 \times 10^{6}$
$45^{\circ}$	$0.850$	$2.08 \times 10^{7}$	$0.857$	$0.847$	$0.509$	$0.495$	$1.25 \times 10^{6}$
$60^{\circ}$	$0.748$	$2.08 \times 10^{7}$	$0.934$	$0.924$	$0.754$	$0.731$	$1.07 \times 10^{6}$
$75^{\circ}$	$0.630$	$1.99 \times 10^{6}$	$0.983$	$0.974$	$0.933$	$0.901$	$8.90 \times 10^{5}$
$90^{\circ}$	$0.502$	$2.09 \times 10^{7}$	$1.000$	$0.990$	$1.000$	$0.965$	$8.85 \times 10^{5}$
$105^{\circ}$	$0.375$	$2.18 \times 10^{7}$	$0.984$	$0.974$	$0.938$	$0.905$	$9.74 \times 10^{5}$
$120^{\circ}$	$0.258$	$2.08 \times 10^{7}$	$0.938$	$0.926$	$0.766$	$0.737$	$1.06 \times 10^{6}$
$135^{\circ}$	$0.157$	$2.19 \times 10^{7}$	$0.864$	$0.849$	$0.530$	$0.508$	$1.32 \times 10^{6}$
$150^{\circ}$	$0.080$	$2.09 \times 10^{7}$	$0.772$	$0.749$	$0.296$	$0.283$	$1.08 \times 10^{6}$
$165^{\circ}$	$0.033$	$2.08 \times 10^{7}$	$0.678$	$0.633$	$0.126$	$0.120$	$9.45 \times 10^{5}$
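
The theoretical columns of Table 5.1 appear to follow two closed forms: $q_{\mathrm{th}} = \tfrac{1}{2} + \sqrt{p(1-p)}$, which is the probability of the $+$ outcome when a state $\sqrt{p}\ket{0}+\sqrt{1-p}\ket{1}$ is measured in the $X$ basis, and $f_{\mathrm{th}}(p) = 4p(1-p)$. Both formulas are inferred here from the tabulated numbers and are not quoted from this excerpt, so they should be read as assumptions. The following sketch, with hypothetical function names, checks them against a few rows.

```python
import math


def q_from_p(p: float) -> float:
    """Assumed relation: probability of '+' when the quoin is measured in the X basis."""
    return 0.5 + math.sqrt(p * (1 - p))


def f_from_p(p: float) -> float:
    """Assumed target function of the Bernoulli factory, f(p) = 4p(1 - p)."""
    return 4 * p * (1 - p)


# (p, q_th, f_th) copied from selected rows of Table 5.1.
ROWS = [
    (0.996, 0.563, 0.016),
    (0.850, 0.857, 0.509),
    (0.502, 1.000, 1.000),
    (0.157, 0.864, 0.530),
]


def max_deviation(rows):
    """Largest absolute gap between the assumed formulas and the tabulated values."""
    gaps = []
    for p, q_th, f_th in rows:
        gaps.append(abs(q_from_p(p) - q_th))
        gaps.append(abs(f_from_p(p) - f_th))
    return max(gaps)


if __name__ == "__main__":
    for p, q_th, f_th in ROWS:
        print(f"p={p:.3f}  q={q_from_p(p):.3f} (table {q_th:.3f})  "
              f"f={f_from_p(p):.3f} (table {f_th:.3f})")
```

The residual gaps are at the level of the three-digit rounding of $p$ in the table.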

Table 5. Table 6.1: Decomposition of $W$ based on different measurement outcomes.

$M_{AA}$	$M_{BB}$	$W$
$\ket{Φ_{AA}^{+}} = \frac{\ket{0} \otimes \ket{0} + \ket{1} \otimes \ket{1}}{\sqrt{2}}$	$\ket{Φ_{BB}^{+}} = \frac{\ket{0} \otimes \ket{0} + \ket{1} \otimes \ket{1}}{\sqrt{2}}$	$W = \sum_{s,t} β_{s,t}^{++} τ_{s}^{T} \otimes ω_{t}^{T}$
$\ket{Φ_{AA}^{-}} = \frac{\ket{0} \otimes \ket{0} - \ket{1} \otimes \ket{1}}{\sqrt{2}}$	$\ket{Φ_{BB}^{-}} = \frac{\ket{0} \otimes \ket{0} - \ket{1} \otimes \ket{1}}{\sqrt{2}}$	$W = \sum_{s,t} β_{s,t}^{--} {\tilde{τ}}_{s}^{T} \otimes {\tilde{ω}}_{t}^{T}$
$\ket{Φ_{AA}^{+}} = \frac{\ket{0} \otimes \ket{0} + \ket{1} \otimes \ket{1}}{\sqrt{2}}$	$\ket{Φ_{BB}^{-}} = \frac{\ket{0} \otimes \ket{0} - \ket{1} \otimes \ket{1}}{\sqrt{2}}$	$W = \sum_{s,t} β_{s,t}^{+-} τ_{s}^{T} \otimes {\tilde{ω}}_{t}^{T}$
$\ket{Φ_{AA}^{-}} = \frac{\ket{0} \otimes \ket{0} - \ket{1} \otimes \ket{1}}{\sqrt{2}}$	$\ket{Φ_{BB}^{+}} = \frac{\ket{0} \otimes \ket{0} + \ket{1} \otimes \ket{1}}{\sqrt{2}}$	$W = \sum_{s,t} β_{s,t}^{-+} {\tilde{τ}}_{s}^{T} \otimes ω_{t}^{T}$

Table 6. Table 6.2: Coefficients and probabilities for MDIEW with outcomes $++$ and $--$. Note that when $\beta=0$, the corresponding probability $p$ is irrelevant.

	$τ_{0} = I / 2$	$τ_{1} = \frac{I + σ_{x}}{2}$	$τ_{2} = \frac{I + σ_{y}}{2}$	$τ_{3} = \frac{I + σ_{z}}{2}$	$τ_{4} = \frac{I + (σ_{x} + σ_{y} + σ_{z}) / \sqrt{3}}{2}$
$ω_{0} = I / 2$	$β = 2 \sqrt{3} - 2, p = \frac{1}{16}$	$β = 0$	$β = 0$	$β = 0$	$β = - \sqrt{3}, p = \frac{1}{16}$
$ω_{1} = \frac{I + σ_{x}}{2}$	$β = 0$	$β = 1, p = \frac{v}{16}$	$β = 0$	$β = 0$	$β = 0$
$ω_{2} = \frac{I + σ_{y}}{2}$	$β = 0$	$β = 0$	$β = 1, p = \frac{v}{16}$	$β = 0$	$β = 0$
$ω_{3} = \frac{I + σ_{z}}{2}$	$β = 0$	$β = 0$	$β = 0$	$β = 1, p = \frac{v}{8}$	$β = 0$
$ω_{4} = \frac{I + (σ_{x} + σ_{y} + σ_{z}) / \sqrt{3}}{2}$	$β = - \sqrt{3}, p = \frac{1}{16}$	$β = 0$	$β = 0$	$β = 0$	$β = 0$

Table 7. Table 6.3: Coefficients and probabilities for MDIEW with outcomes $+-$ and $-+$. Note that when $\beta=0$, the corresponding probability $p$ is irrelevant.

	$τ_{0} = I / 2$	$τ_{1} = \frac{I + σ_{x}}{2}$	$τ_{2} = \frac{I + σ_{y}}{2}$	$τ_{3} = \frac{I + σ_{z}}{2}$	$τ_{4}^{'} = \frac{I + (- σ_{x} - σ_{y} + σ_{z}) / \sqrt{3}}{2}$
$ω_{0} = I / 2$	$β = 2 \sqrt{3} + 2, p = \frac{1}{16}$	$β = 0$	$β = 0$	$β = 0$	$β = - \sqrt{3}, p = \frac{1}{16}$
$ω_{1} = \frac{I + σ_{x}}{2}$	$β = 0$	$β = - 1, p = \frac{2 - v}{16}$	$β = 0$	$β = 0$	$β = 0$
$ω_{2} = \frac{I + σ_{y}}{2}$	$β = 0$	$β = 0$	$β = - 1, p = \frac{2 - v}{16}$	$β = 0$	$β = 0$
$ω_{3} = \frac{I + σ_{z}}{2}$	$β = 0$	$β = 0$	$β = 0$	$β = 1, p = \frac{v}{8}$	$β = 0$
$ω_{4}^{'} = \frac{I + (- σ_{x} - σ_{y} + σ_{z}) / \sqrt{3}}{2}$	$β = - \sqrt{3}, p = \frac{1}{16}$	$β = 0$	$β = 0$	$β = 0$	$β = 0$

Table 8. Table 6.4: Our MDIEW in the form of Eq. (6.21) for the bipartite states defined in Eq. (6.8).

$τ_{s}$	$ω_{t}$	$β_{st}^{++}$	$p(+,+ \mid τ_{s}, ω_{t})$	$β_{st}^{+-}$	$p(+,- \mid τ_{s}, ω_{t})$
$I / 2$	$I / 2$	$2 \sqrt{3} - 2$	$1 / 16$	$2 \sqrt{3} + 2$	$1 / 16$
$\frac{I + σ_{x}}{2}$	$\frac{I + σ_{x}}{2}$	$1$	$v / 16$	$- 1$	$(2 - v) / 16$
$\frac{I + σ_{y}}{2}$	$\frac{I + σ_{y}}{2}$	$1$	$v / 16$	$- 1$	$(2 - v) / 16$
$\frac{I + σ_{z}}{2}$	$\frac{I + σ_{z}}{2}$	$1$	$v / 8$	$1$	$v / 8$
$I / 2$	$\frac{I + (σ_{x} + σ_{y} + σ_{z}) / \sqrt{3}}{2}$	$- \sqrt{3}$	$1 / 16$	0	-
$\frac{I + (σ_{x} + σ_{y} + σ_{z}) / \sqrt{3}}{2}$	$I / 2$	$- \sqrt{3}$	$1 / 16$	0	-
$I / 2$	$\frac{I + (- σ_{x} - σ_{y} + σ_{z}) / \sqrt{3}}{2}$	0	-	$- \sqrt{3}$	$1 / 16$
$\frac{I + (- σ_{x} - σ_{y} + σ_{z}) / \sqrt{3}}{2}$	$I / 2$	0	-	$- \sqrt{3}$	$1 / 16$
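
The entries of Table 6.4 can be combined into a single witness value. The sketch below assumes the witness takes the usual MDIEW form $J=\sum_{s,t}\beta_{st}\,p(\cdot \mid \tau_{s},\omega_{t})$, with a negative value signalling entanglement; Eq. (6.21) is not reproduced in this excerpt, so this form, the sign convention and the function name are assumptions. Rows with $\beta=0$ are omitted because their probabilities do not contribute.

```python
import math

SQRT3 = math.sqrt(3)


def witness_value(v: float, outcome: str = "++") -> float:
    """Sum beta * p over the nonzero-beta rows of Table 6.4 for state parameter v."""
    if outcome == "++":
        terms = [
            (2 * SQRT3 - 2, 1 / 16),
            (1, v / 16),
            (1, v / 16),
            (1, v / 8),
            (-SQRT3, 1 / 16),
            (-SQRT3, 1 / 16),
        ]
    elif outcome == "+-":
        terms = [
            (2 * SQRT3 + 2, 1 / 16),
            (-1, (2 - v) / 16),
            (-1, (2 - v) / 16),
            (1, v / 8),
            (-SQRT3, 1 / 16),
            (-SQRT3, 1 / 16),
        ]
    else:
        raise ValueError("outcome must be '++' or '+-'")
    return sum(beta * p for beta, p in terms)


if __name__ == "__main__":
    for v in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"v = {v:.2f}  J(++) = {witness_value(v, '++'):+.4f}  "
              f"J(+-) = {witness_value(v, '+-'):+.4f}")
```

For both outcome pairs the sum simplifies to $(2v-1)/8$, which is negative exactly when $v<1/2$. This agrees with Table 6.6, where the tangle is nonzero only for $v=0$ and $v=0.25$.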

Table 9. Table 6.5: Error estimation of density matrix, real non-zero parts

$θ$	$v_{\mathrm{theory}}$	$v_{ρ_{11}}$	$v_{ρ_{22}}$	$v_{ρ_{33}}$	$v_{ρ_{44}}$	$v_{ρ_{23}}$	$\bar{v}_{\mathrm{exp}}$	$δ\bar{v}_{\mathrm{exp}}$	$δv_{\mathrm{exp}}$
		$v_{\mathrm{experiment}}$
45	0	0.0196	0.0228	0.0064	0.0258	0.0290	0.0207	0.0039	0.0087
30	0.25	0.2580	0.2538	0.2426	0.2686	0.2644	0.2575	0.0045	0.0101
22.5	0.5	0.4944	0.4820	0.4824	0.5230	0.5108	0.4985	0.0081	0.0180
15	0.75	0.7298	0.7198	0.7280	0.7718	0.7620	0.7423	0.0103	0.0231
0	1	0.9680	0.9818	0.9222	0.9684	0.9822	0.9645	0.0110	0.0246

Table 10. Table 6.6: The tangle values of the input states by tomography.

$θ_{\mathrm{exp}}$	$v_{\mathrm{theory}}$	$v_{\mathrm{exp}}$	$v_{\mathrm{error}}$	$\mathrm{tangle}(ρ_{34}^{v}(θ))$	$\mathrm{tangle}_{\mathrm{error}}$
$45^{\circ}$	0	0.021	0.009	0.840	0.001
$30^{\circ}$	0.25	0.257	0.010	0.233	0.001
${22.5}^{\circ}$	0.5	0.499	0.018	0.000	0
$15^{\circ}$	0.75	0.742	0.023	0.000	0
$0^{\circ}$	1	0.965	0.025	0.000	0

Table 11. Table 8.1: The lower bound for the randomness parameter $P$, defined in Eq. (9.2), that allows the CHSH value $S$, defined in Eq. (9.5), to reach the quantum bound $S_{Q}$ by LHVMs in the CHSH test under different conditions.

	Correlated inputs	Uncorrelated inputs
Single Run	0.285 [35, 77]	0.354 [77]
Multiple Run	0.258 [78]	$\leq 0.264$ (Our Work)

Table 12. Table 9.1: The value of $J_{\lambda}$ with a deterministic strategy.

		$({\tilde{p}}_{B} (0), {\tilde{p}}_{B} (1))$
		$(0, 0)$	$(0, 1)$	$(1, 0)$	$(1, 1)$
$({\tilde{p}}_{A} (0), {\tilde{p}}_{A} (1))$	$(0, 0)$	$0$	$0$	$- (p (0, 0) + p (1, 0)) / 2$	$- (p (0, 0) + p (1, 0)) / 2$
	$(0, 1)$	$0$	$- p (1, 1)$	$(p (1, 0) - p (0, 0)) / 2$	$(p (1, 0) - p (0, 0)) / 2 - p (1, 1)$
	$(1, 0)$	$- (p (0, 0) + p (0, 1)) / 2$	$(p (0, 1) - p (0, 0)) / 2$	$- (p (0, 1) + p (1, 0)) / 2$	$(p (0, 1) - p (1, 0)) / 2$
	$(1, 1)$	$- (p (0, 0) + p (0, 1)) / 2$	$(p (0, 1) - p (0, 0)) / 2 - p (1, 1)$	$(p (1, 0) - p (0, 1)) / 2$	$(p (1, 0) + p (0, 1)) / 2 - p (1, 1)$

Table 13. Table 9.2: Possible strategies that make $J_{\lambda}$ positive.

$({\tilde{p}}_{A} (0), {\tilde{p}}_{A} (1), {\tilde{p}}_{B} (0), {\tilde{p}}_{B} (1))$	$J_{λ}$
(0,1,1,0)	$(p (1, 0) - p (0, 0)) / 2$
(0,1,1,1)	$(p (1, 0) - p (0, 0)) / 2 - p (1, 1)$
(1,0,0,1)	$(p (0, 1) - p (0, 0)) / 2$
(1,0,1,1)	$(p (0, 1) - p (1, 0)) / 2$
(1,1,0,1)	$(p (0, 1) - p (0, 0)) / 2 - p (1, 1)$
(1,1,1,0)	$(p (1, 0) - p (0, 1)) / 2$
(1,1,1,1)	$(p (1, 0) + p (0, 1)) / 2 - p (1, 1)$

Table 14. Table 9.3: The coefficient $\beta_{ij}$ of $q(\lambda_{j})p_{i}(\lambda_{j})$ in the expression of $J^{\mathrm{LHVM}}_{\mathrm{CH}}$ of the CH inequality.

	$λ_{1}$	$λ_{2}$	$λ_{3}$	$λ_{4}$	$λ_{5}$
$p_{0}$	$- \frac{1}{2}$	$\frac{1}{2}$	$0$	$0$	$0$
$p_{1}$	$0$	$\frac{1}{2}$	$\frac{1}{2}$	$- \frac{1}{2}$	$\frac{1}{2}$
$p_{2}$	$\frac{1}{2}$	$0$	$- \frac{1}{2}$	$\frac{1}{2}$	$\frac{1}{2}$
$p_{3}$	$0$	$0$	$0$	$0$	$- 1$

Table 15. Table 9.4: The coefficient of $p_{i}(\lambda_{j})$ in the expression of $J^{\mathrm{LHVM}}_{\mathrm{CHSH}}$ of the CHSH inequality.

	$λ_{1}$	$λ_{2}$	$λ_{3}$	$λ_{4}$
$p_{0}$	$1$	$1$	$1$	$- 1$
$p_{1}$	$1$	$1$	$- 1$	$1$
$p_{2}$	$1$	$- 1$	$1$	$1$
$p_{3}$	$- 1$	$1$	$1$	$1$

Table 16. Table C.1: The coefficient of $p_{i}(\lambda_{j})$ in the expression of $J$.

	$q (λ_{1})$	$q (λ_{2})$	$q (λ_{3})$	$q (λ_{4})$	$q (λ_{5})$
$p_{0}$	$0$	$0$	$\frac{1}{2}$	$\frac{1}{2}$	$\frac{1}{2}$
$p_{1}$	$\frac{1}{2}$	$1$	$1$	$0$	$1$
$p_{2}$	$1$	$\frac{1}{2}$	$0$	$1$	$1$
$p_{3}$	$1$	$1$	$1$	$1$	$0$

Table 17. Table C.2: The coefficient of $p_{i}(\lambda_{j})$ in the expression of $J$.

	$q (λ_{1})$	$q (λ_{2})$	$q (λ_{3})$	$q (λ_{4})$	$q (λ_{5})$
$p_{0}$	$- 1$	$- 1$	$- \frac{1}{2}$	$- \frac{1}{2}$	$- \frac{1}{2}$
$p_{1}$	$- \frac{1}{2}$	$0$	$0$	$- 1$	$0$
$p_{2}$	$0$	$- \frac{1}{2}$	$- 1$	$0$	$0$
$p_{3}$	$0$	$0$	$0$	$0$	$- 1$

Equations (899)

$p(a,b \mid x,y)_{c} = \sum_{\lambda} p(\lambda)\, p(a \mid x,\lambda)\, p(b \mid y,\lambda),$

$\ket{\psi} = \sum_{i=1}^{d} \alpha_{i} \ket{i},$

$\psi = \left(\begin{array}{c}\alpha_{1}\\ \vdots\\ \alpha_{d}\end{array}\right).$

$\bra{M} = \sum_{j=1}^{d} \beta_{j} \bra{j}.$

$p = |\langle M | \psi \rangle|^{2} = \left|\sum_{j=1}^{d} \beta_{j} \bra{j} \sum_{i=1}^{d} \alpha_{i} \ket{i}\right|^{2} = \left|\sum_{i,j=1}^{d} \alpha_{i} \beta_{j} \langle j | i \rangle\right|^{2}.$

$p = \left|\sum_{i}^{d} \alpha_{i} \beta_{i}\right|^{2}.$

$M = (\beta_{1}, \dots, \beta_{d}),$

$p = \left|(\beta_{1},\dots,\beta_{d})\left(\begin{array}{c}\alpha_{1}\\ \vdots\\ \alpha_{d}\end{array}\right)\right|^{2} = \left|\sum_{i}^{d} \alpha_{i}\beta_{i}\right|^{2}.$

$\bar{O} = \bra{\psi} O \ket{\psi}.$

$O = \sum_{i} \lambda_{i} \ket{o_{i}}\bra{o_{i}},$

$\bar{O} = \sum_{i} \lambda_{i} p_{i}.$

$i\hbar \frac{\partial \ket{\psi(t)}}{\partial t} = H \ket{\psi(t)}.$

$\ket{\psi(t)} = U(t,t_{0}) \ket{\psi(t_{0})},$

$H = \sum_{i,j} \bra{i} H \ket{j}\, \ket{i}\bra{j},$

$U(t,t_{0}) \cdot U(t_{0},t) = I,$

$\ket{\psi}_{AB} = \sum_{i_{A},i_{B}} \alpha_{i_{A} i_{B}} \ket{i_{A}}_{A} \ket{i_{B}}_{B},$

$\ket{\psi}_{AB} = \left(\begin{array}{c}\alpha_{11}\\ \alpha_{12}\\ \vdots\\ \alpha_{1d_{B}}\\ \vdots\\ \alpha_{d_{A}1}\\ \alpha_{d_{A}2}\\ \vdots\\ \alpha_{d_{A}d_{B}}\end{array}\right).$

$p = \mathrm{Tr}[\rho P] = |\langle M | \psi \rangle|^{2},$

$p(M_{A}, i_{B}) = \mathrm{Tr}[(\ket{\psi}_{AB}\bra{\psi}_{AB})(\ket{M}_{A}\bra{M}_{A} \otimes \ket{i_{B}}_{B}\bra{i_{B}}_{B})],$

$\begin{aligned} p(M_{A}) &= \sum_{i_{B}} \mathrm{Tr}[(\ket{\psi}_{AB}\bra{\psi}_{AB})(\ket{M}_{A}\bra{M}_{A} \otimes \ket{i_{B}}_{B}\bra{i_{B}}_{B})] \\ &= \mathrm{Tr}[(\ket{\psi}_{AB}\bra{\psi}_{AB})(\ket{M}_{A}\bra{M}_{A} \otimes \sum_{i_{B}} \ket{i_{B}}_{B}\bra{i_{B}}_{B})] \\ &= \mathrm{Tr}[\rho_{A} P_{A}], \end{aligned}$

$\rho_{A} = \mathrm{Tr}_{B}[(\ket{\psi}_{AB}\bra{\psi}_{AB})],$

$\sigma_{x}=\left(\begin{array}{cc}0&1\\ 1&0\end{array}\right),\quad \sigma_{y}=\left(\begin{array}{cc}0&-i\\ i&0\end{array}\right),\quad \sigma_{z}=\left(\begin{array}{cc}1&0\\ 0&-1\end{array}\right),$

$\langle e_{i}, e_{j} \rangle = \mathrm{Tr}[e_{i}^{\dagger} e_{j}] = 2\delta_{i,j}, \quad \forall i,j \in \{1,2,3,4\}.$

$\rho = \frac{I + n_{x}\sigma_{x} + n_{y}\sigma_{y} + n_{z}\sigma_{z}}{2}.$

$n_{i} = \mathrm{Tr}[\rho \sigma_{i}], \quad \forall i = x, y, z$

$\rho_{AB} = \sum_{i,j} \lambda_{i,j}\, e_{i} \otimes e_{j}.$

$\lambda_{i,j} = \mathrm{Tr}[\rho_{AB}(e_{i} \otimes e_{j})]/4.$

$\ket{\psi}_{AB} = \sum_{i} p_{i} \ket{\psi_{i}} \ket{i},$

$\ket{\psi}_{AB} = \sum_{i} p_{i} \ket{i}_{A} \otimes \ket{i'}_{B}.$
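
Two of the formulas listed above lend themselves to a quick numerical check: the component form of the measurement probability, $p=\left|\sum_{i}\alpha_{i}\beta_{i}\right|^{2}$, and the Bloch-vector components $n_{i}=\mathrm{Tr}[\rho\sigma_{i}]$. The following is a minimal sketch in plain Python; the helper names are hypothetical and the state $\ket{+}$ is chosen only as an example.

```python
import math


def born_probability(alpha, beta):
    """p = |sum_i alpha_i * beta_i|^2, where beta holds the components of the bra <M|."""
    amplitude = sum(a * b for a, b in zip(alpha, beta))
    return abs(amplitude) ** 2


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def trace2(a):
    """Trace of a 2x2 matrix."""
    return a[0][0] + a[1][1]


PAULI = {
    "x": [[0, 1], [1, 0]],
    "y": [[0, -1j], [1j, 0]],
    "z": [[1, 0], [0, -1]],
}


def density_matrix(alpha):
    """rho = |psi><psi| for a qubit state with components alpha."""
    return [[complex(a) * complex(b).conjugate() for b in alpha] for a in alpha]


def bloch_vector(rho):
    """n_i = Tr[rho sigma_i] for i = x, y, z."""
    return {axis: complex(trace2(matmul2(rho, sigma))).real
            for axis, sigma in PAULI.items()}


# Example state |+> = (|0> + |1>)/sqrt(2).
PLUS = [1 / math.sqrt(2), 1 / math.sqrt(2)]

if __name__ == "__main__":
    print("p(outcome 0 on |+>) =", born_probability(PLUS, [1, 0]))
    print("Bloch vector of |+> =", bloch_vector(density_matrix(PLUS)))
```

The bra components $\beta_{j}$ enter without complex conjugation because they are defined directly as the coefficients of $\bra{M}$.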

Taxonomy

Topics: Quantum Information and Cryptography · Quantum Mechanics and Applications · Quantum Computing Algorithms and Architecture

Full text

Department: Institute for Interdisciplinary Information Sciences

Major: Physics

Degree: Doctor of Philosophy

Degree date: December 2016

Thesis date: December 20, 2016

Supervisor: Xiongfeng Ma, Assistant Professor

Abstract

Quantum information processing shows advantages over its classical counterpart in many tasks, including quantum communication and computation. The essence of quantum processing lies in the fundamental difference between classical and quantum states. For a physical system, a coherent superposition in a computational basis is different from a statistical mixture of states in the same basis. Such coherent superposition makes it possible to generate true random numbers, realize parallel computing, and perform other classically impossible tasks such as the quantum Bernoulli factory. For a system that consists of multiple parts, coherent superposition that exists nonlocally across different subsystems is called entanglement. By properly manipulating entanglement, it is possible to realize computation and simulation tasks that are intractable by classical means.

Investigating quantumness, coherent superposition, and entanglement can shed light on the origin of quantum advantages and lead to the design of new quantum protocols. This thesis mainly focuses on the interplay between quantumness and two information tasks: randomness generation and selftesting quantum information processing. We discuss how quantumness can be used to generate randomness and show that randomness can in turn be used to quantify quantumness. In addition, we introduce the Bernoulli factory problem and present a quantum advantage that requires only coherence, in both theory and experiment. Furthermore, we show a method to witness entanglement that is independent of the realization of the measurement. We also investigate randomness requirements in selftesting tasks and propose a random number generation scheme that is independent of the randomness source.

By studying the interplay between quantumness and these two information tasks, we have explored the essence of quantumness and its fundamental role in quantum information processing. Apart from their theoretical significance, the results can be experimentally tested and applied in practice.

Key words: Coherence; entanglement; randomness; selftesting; quantum cryptography

Acknowledgements

The research presented in this Doctor of Philosophy thesis was carried out under the supervision of Professor Xiongfeng Ma at the Institute for Interdisciplinary Information Sciences at Tsinghua University, China. A wonderful mentor in research and a close friend in life, Xiongfeng was the first to make me feel the pleasure of doing research. I thank him for his inspiring instruction, discussion, and encouragement and for sharing his extensive knowledge. I would also like to thank Professor Giulio Chiribella for his guidance during the first year of my Ph.D. study. With his altruistic help, I learned the basics of quantum information and completed my first few research projects. In addition, I thank Professor Mile Gu for sharing his insightful thoughts on quantum correlations and relativistic quantum information. Under his guidance, I broadened my knowledge of quantum science and completed an interesting work on quantum information in the presence of time travel.

During my Ph.D. study, I was lucky to have many chances to visit several wonderful academic institutes. My special thanks go to the hosts for their kind help and valuable discussions. In chronological order of the visits, the hosts include Professor Yu-Ao Chen and Professor Jian-Wei Pan at the University of Science and Technology of China, Professor Hoi Fung Chau at the University of Hong Kong, Professor Peter Zoller at the University of Innsbruck, Austria, Professor Anton Zeilinger at the University of Vienna, Professor Renato Renner at ETH, Professor Nicolas Gisin at the Université de Genève, Professor Romain Alléaume at Telecom ParisTech, Professor Qiang Zhang at the University of Science and Technology of China, Dr. Graeme Smith, Dr. John Smolin and Dr. Charles H. Bennett at IBM’s Thomas J. Watson Research Center, Dr. Qingyu Cai at the Chinese Academy of Sciences Wuhan Physics and Mathematics Institute, and Professor Yeong-Cherng Liang at the National Cheng Kung University, Tainan. In particular, I would like to thank Dr. Charles H. Bennett for enthusiastically showing me his ‘antique’, the instrument of the first quantum key distribution experiment, and for introducing me to the quantum side channel.

My work has been guided by many brilliant minds, including Assad, Syed M; Chen, Luo-Kan; Chen, Tengyun; Cao, Zhu; Fan, Jinyun; Girolami, Davide; Haw, Jing Yan; Huang, Miao; Jiang, Xiao; Lam, Ping Koy; Li, Li; Liu, Ke; Li, Wei; Li, Zheng-Da; Liu, Nai-Le; Liu, Yang; Lu, Chao-Yang; Lu, He; Lutkenhaus, Norbert; Ma, Yuwei; Mei, Quanxin; Pan, Jian-Wei; Peng, Cheng-Zhi; Qi, Bing; Ralph, Timothy C; Thompson, Jayne; Vijay, R; Vedral, Vlatko; Weedbrook, Christian; Wang, Weiting; Yan, Zhaopeng; Yao, Xing-Can; Xu, Yuan; Xu, Ping; Zhang, Fang; Zhang, Yan-Bao; Zhang, Zhen; Zhou, Hongyi; Zhou, Shan. I thank my collaborators for sharing their knowledge and offering their help.

Last but not least, I want to sincerely thank my parents for their selfless love and thoughtful care in every detail of my life. I am also very grateful to the students and professors in our institute. I would like to thank my friends for bringing pleasure to my life.

I Introduction and Preliminaries
1 Introduction
2 Basics of quantum mechanics
2.1 Quantum mechanics formalism—pure states and projective measurements
2.2 Composite systems and subsystems
3 Quantumness, selftesting, and randomness
3.1 Quantumness
3.1.1 Quantum coherence
3.1.2 Quantum entanglement
3.2 Selftesting: Bell nonlocality test
3.2.1 Clauser-Horne-Shimony-Holt inequality
3.2.2 Practical loopholes
3.3 Randomness generation and quantification
3.3.1 Randomness generation
3.3.2 Randomness quantification
II Quantumness and randomness
4 Coherence and randomness
4.1 Quantifying quantum randomness
4.1.1 Quantum randomness against quantum information
4.1.2 Quantum randomness against classical information
4.1.3 Qubit example
4.1.4 Comparison between the two randomness measures
4.2 Coherence or randomness distillation
4.2.1 Comparison with entanglement
4.3 Basis independent randomness and coherence
5 Quantum Bernoulli Factory
5.1 Theoretical protocol
5.2 Experimental realization
5.2.1 Experiment setup
5.2.2 Results
5.3 Simulation of Experiment data
5.3.1 The truncated function
5.3.2 Simulation of the $q$-coin
III Quantumness and selftesting
6 Measurement-device-independent entanglement witness
6.1 Time-shift attack
6.1.1 EW and device imperfections
6.1.2 Time-shift attack
6.2 The MDIEW scheme
6.3 Experimental realization
6.3.1 Experiment setup
6.3.2 Experiment result
7 Reliable and robust entanglement witness
7.1 Reliable and robust problem in EW
7.2 Reliable entanglement witness
7.2.1 Nonlocal game
7.2.2 MDIEW
7.3 Robust MDIEW
7.3.1 Problem formulation
7.3.2 $\epsilon$-level optimal EW
7.3.3 Solution
7.3.4 Example
IV Randomness and selftesting
8 Randomness Requirement on CHSH Bell Test in the Multiple Run Scenario
8.1 Randomness Requirement
8.1.1 Randomness loophole
8.1.2 Randomness requirement in Bell test
8.2 Single run case
8.3 Multiple run case
8.3.1 One party Biased
8.3.2 Both parties biased
8.3.3 Discussion
9 Clauser-Horne Bell test with imperfect random inputs
9.1 General randomness requirement
9.2 CH inequality
9.2.1 CH inequality with LHVMs
9.2.2 General strategy (attack)
9.2.3 Result
10 Source-independent quantum random number generation
10.1 The prepare and measure model
10.2 The Protocol
10.2.1 Theoretical protocol
10.2.2 Analysis
10.3 Experiment demonstration
V Other works
11 Open timelike curves
11.1 Open timelike curves
11.1.1 Causality and CTC
11.1.2 OTC and CTC
11.2 Quantum information with OTC
11.2.1 OTC enhanced measurement
11.2.2 Solving NP-complete problems
11.2.3 Cloning with OTCs
12 Quantum theory from axioms
12.1 Axiomatization of quantum theory
12.1.1 Operational vs Hilbert space framework
12.1.2 Informational principles and their translation in the Hilbert space language
12.2 Measurement sharpness trims nonlocality and contextuality in every physical theory
12.2.1 Framework
12.2.2 Derivation of CE
12.2.3 Derivation of LO
12.2.4 Sharp Bell inequalities
12.2.5 Methods
A Coherence Distillation Procedure
A.1 Coherence distillation: qubit
A.1.1 Incoherent operations
A.1.2 Coherence loss
A.2 General definition
A.3 A unique measure for pure states
B SI-QRNG
B.1 Calculation of the number of effective $X$-basis measurements
B.2 Proof of the random sampling property for a type of QRNG input after loss
B.3 Random seed dilution
C Proof for randomness requirement for the CH inequality
C.1 Proof for finite strategies of choosing input settings
C.2 Optimal strategy of the CH test
C.2.1 General condition
C.2.2 Factorizable condition
C.3 Optimal strategy of the CHSH inequality
C.3.1 CH and CHSH inequalities under NS
C.3.2 General condition
C.3.3 Factorizable condition
References

List of Figures

1.1 An illustration of Schrödinger’s cat gedanken experiment.
1.2 Device independent processing of one (a) and two (b) parties.
3.1 Witnessing entanglement via entanglement witness operators.
3.2 Bipartite Bell inequality. The inputs $x$ and $y$ of Alice and Bob are determined by perfect random number generators (RNGs), which produce uniformly distributed random numbers.
3.3 Electron spin detection in the Stern-Gerlach experiment. Assume that the spin takes two directions along the vertical axis, denoted by $\ket{\uparrow}$ and $\ket{\downarrow}$. If the electron is initially in a superposition of the two spin directions, $\ket{\rightarrow}=(\ket{\uparrow}+\ket{\downarrow})/\sqrt{2}$, detecting the location of the electron breaks the coherence, and the outcome ($\uparrow$ or $\downarrow$) is intrinsically random.
3.4 Practical QRNGs based on single photon measurement. (a) A photon is originally prepared in a superposition of horizontal (H) and vertical (V) polarizations, described by $(\ket{H}+\ket{V})/\sqrt{2}$. A polarising beam splitter (PBS) transmits the horizontal and reflects the vertical polarization. For random bit generation, the photon is measured by two single photon detectors (SPDs). (b) After passing through a symmetric beam splitter (BS), a photon exists in a superposition of transmitted (T) and reflected (R) paths, $(\ket{R}+\ket{T})/\sqrt{2}$. A random bit can be generated by measuring the path information of the photon. (c) QRNG based on measurement of photon arrival time. Random bits can be generated, for example, by measuring the time interval, $\Delta t$, between two detection events. (d) QRNG based on measurements of photon spatial mode. The generated random number depends on the spatial position of the detected photon, which can be read out by an SPD array.
3.5 QRNGs using macroscopic photodetectors. (a) Phase-space representation of the vacuum state. The variance of the $X$-quadrature is 1/4. (b) QRNG based on vacuum noise measurements. The system comprises a strong local oscillator (LO), a symmetric beam splitter (BS), a pair of photodetectors (PDs), and an electrical subtracter (Sub). (c) Phase-space representation of a partially phase-randomised coherent state. The variance of the $X$-quadrature is on the order of $n\times\langle\Delta\theta^{2}\rangle$, where $n$ is the average photon number and $\langle\Delta\theta^{2}\rangle$ is the phase noise variance. (d) QRNGs based on measurements of laser phase noise. The first coupler splits the original laser beam into two beams, which propagate through two optical fibres of different lengths and then interfere at the second coupler. The output signal is recorded by a photodetector. The extra length $\Delta L$ in one fibre introduces a time delay $T_{d}$ between the two paths, which in turn determines the variance of the output signal.
3.6 Illustration of a bipartite Bell test. Alice and Bob are two space-like separated parties that output $a$ and $b$ from random inputs $x$ and $y$, respectively. A Bell inequality is defined as a linear combination of the probabilities $p(a,b|x,y)$. For instance, the Clauser-Horne-Shimony-Holt (CHSH) inequality [1] is defined by $S=\sum_{a,b,x,y}(-1)^{a+b+xy}p(a,b|x,y)\leq S_{C}=2$, where all of the inputs and outputs are bit values, and $S_{C}$ is the classical bound for all local hidden-variable models. With quantum settings, that is, performing measurements $M_{x}^{a}\otimes M_{y}^{b}$ on the quantum state $\rho_{AB}$, so that $p(a,b|x,y)=\mathrm{Tr}[\rho_{AB}M_{x}^{a}\otimes M_{y}^{b}]$, the CHSH inequality can be violated up to $S_{Q}=2\sqrt{2}$. Quantum features (such as intrinsic randomness) manifest as violations of the CHSH inequality.
3.7 A semi-self-testing QRNG. Conditional on the input setting $x$ , the source emits a quantum state $\rho_{x}$ . Conditional on the input $y$ , the detection device measures $\rho_{x}$ and outputs $b$ .
4.1 Quantum randomness. In a bipartite Alice-Eve system described by a pure state $\psi_{AE}$ , the quantum randomness of a measurement performed by Alice on the system in the mixed state $\rho_{A}$ is given by the amount of uncertainty Eve has on the measurement outcome. Such quantum uncertainty is quantified by the relative entropy of coherence $R_{I}^{Q}(\rho_{A})$ .
4.2 Alternative definition of Quantum randomness. In a bipartite Alice-Eve system described by a pure state $\psi_{AE}$ , the quantum randomness of a measurement performed by Alice on the system in the mixed state $\rho_{A}$ is given by the minimum amount of uncertainty Eve has on the measurement outcome after performing a measurement on her own systems. Such quantum uncertainty is quantified by the convex roof measure $R^{C}_{I}(\rho_{A})$ .
4.3 Comparison of the measures of quantum randomness $R_{z}^{Q}$ (red dotted line) and $R_{z}^{C}$ (blue dot-dashed line) in the qubit state $\rho_{A}(v)=v\ket{+}\bra{+}+\frac{1-v}{2}\mathbb{I}$ versus the mixing parameter $v$ .
4.4 Random number extraction and coherence distillation. The randomness extraction process can be replicated by first distilling the coherence of the quantum state. Measurement outcomes will directly produce uniformly random bits.
5.1 Classical and quantum coin. For a given $p$ value, (a) classical and (b) quantum $p$ -coins correspond to two different ways of encoding $p$ , see Eqs. (5.1) and (5.2), respectively. The key difference lies in whether there is coherence in the computational basis.
5.2 Experimental setup. (a) Optical image of a transmon qubit located in a trench, which dispersively couples to two 3D Al cavities. (b) Optical image of the single-junction transmon qubit. (c) Scanning electron microscope image of the Josephson junction. (d) Schematic of the device with the main parameters. In our experiment, the higher frequency cavity is not used and always remains in vacuum, which can be used as another $p$ -quoin in future experiments [2]. Note that the highlighted boxes in (a) and (b) are not to scale and are intended for illustrative purposes only.
5.3 Readout properties of the qubit. The phase between the JPA readout signal and the pump has been adjusted such that $\ket{0}$ and $\ket{1}$ states can be distinguished with optimal contrast. a) Bimodal and well-separated histogram of the qubit readout. A threshold $V_{th}=0$ has been chosen to digitize the readout signal. Solid line is for an initial measurement showing about 8.5% $\ket{1}$ state, while dashed line is for a second measurement after initially selecting $\ket{0}$ state. The disappearance of $\ket{1}$ state demonstrates both a high purification and high quantum non-demolition measurement of the qubit. b) Basic qubit readout matrix. The loss of fidelity predominantly comes from the $T_{1}$ process during both the waiting time after the initialization measurement and the qubit readout time.
5.4 Experimental pulse sequences for the preparation of quoins and the measurements in the $Z$ (a) and $X$ (b) bases. An initial measurement M1 is firstly performed to purify the qubit to the ground state $\ket{0}$ . The rotation of the qubit is realized by applying an on-resonance microwave pulse with various amplitudes. The measurement is always performed in the $Z$ -basis. The measurement in the $X$ -basis is realized by performing an extra $R_{\pi/2}^{-Y}$ pre-rotation. The phase of this extra pre-rotation is chosen to minimize the effect from qubit decoherence during the measurement for the case of $p=0.5$ , which is most sensitive to the final qubit readout accuracy.
5.5 Theoretical and experimental results for the (a) $q$ -coin and (b) $f(p)=4p(1-p)$ -coin. Here, the number of experimental data points for the $p$ -quoins is on the order of $10^{7}$ and the number for the $f(p)$ -coin is on the order of $10^{6}$ . On average, we need about 20 $p$ -quoins to construct an $f(p)$ -coin. The standard deviations of $p$ , $q$ , and $f(p)$ are on the order of $10^{-4}$ and are therefore not plotted in the figure.
5.6 Randomized benchmarking measurement for $R_{\pi/2}^{Y}$ gate fidelity. The reference curve is measured after applying sequences of $m$ random Clifford gates, while the $Y/2$ curve is realized after applying sequences that interleave $R_{\pi/2}^{Y}$ with $m$ random Clifford gates. Each sequence is followed by a recovery Clifford gate in the end right before the final measurement. The number of random sequences of length $m$ in our experiment is $k=100$ . Both curves are fitted to $F=Ap^{m}+B$ with different sequence decay $p$ . The data point is the average of the sequence fidelities of the $k=100$ sample sequences, and the error bar shows the standard deviation of the sample. The average single-qubit gate error $r_{s}=r_{ref}/1.875=(1-p_{ref})/2/1.875=0.0014$ , and the $R_{\pi/2}^{Y}$ gate error $r_{Y/2}=(1-p_{int}/p_{ref})/2=0.0013$ . The dashed lines indicate a gate fidelity of 0.998 and 0.997 respectively.
6.1 (a) Conventional EW setup, where Alice and Bob perform local measurements separately and collect information to decide whether the input state is entangled or not. (b) Measurement-device-independent (MDI) EW setup, where Alice and Bob each prepares an ancillary state and a third party Eve performs Bell state measurements (BSMs) on the ancillary states and the to-be-witnessed bipartite state. Based on the choices of Alice and Bob’s ancillary states and the BSM results, they can judge whether the input state is entangled or not.
6.2 Time-shift attack on the conventional EW. (a) Experimental setup of the time-shift attack. Photon pairs are generated by SPDC using a femtosecond pump laser with a central wavelength of 390 nm and a repetition frequency of 80 MHz. POL: polarizer, HWP: half-wave plate, QWP: quarter-wave plate, IF: interference filter with 780 nm central wavelength, PBS: polarizing beam splitter, SFC: single-mode fiber coupler, SMF: single-mode fiber, SPCM: single-photon-counting module, some with extra internal delay lines. (b) Synchronization between SPCMs. Built-in delay lines enable Eve to shift the output signals $d_{a1}$ and $d_{b0}$ by $\Delta t$ . (c) Coincidence count versus time delay, where the time window is set to 4 ns. All data points are measured for 2 seconds, and the time-shift attack is implemented with $\Delta t=5.50\pm 0.24$ ns, which corresponds to the grey area.
6.3 Experimental setup for the MDIEW. The photon pairs are generated by type-II SPDC in 2-mm $\beta$ -barium-borate (BBO) crystals. The pulsed pump laser has a central wavelength of 390 nm and a repetition rate of 76 MHz. To prepare the desired state (6.2), two 2-mm decoherer BBOs (D BBO) are used on each side, with the fast axis set at $0^{\circ}$ (up) and $180^{\circ}$ (down), to reduce the spatial walk-off effect. By changing the angle $\theta$ of the selector HWP (S HWP), the desired state (6.2) is prepared with $v=\cos^{2}(2\theta)$ . Heralded photons 2 and 5 are triggered by the detections of photon 1 and 6, respectively. Waveplates are used to rotate the polarizations to encode photons 2 and 5 to the desired states, $\left|{{\tau_{s}}}\right\rangle_{2}$ and $\left|{{\omega_{t}}}\right\rangle_{5}$ . The BSM module is composed of three PBSs and two HWPs at $22.5^{\circ}$ . All photons are filtered by narrow-band filters (with $\lambda_{FWHM}$ = 2.8 nm for BSM I and $\lambda_{FWHM}$ = 8.0 nm for BSM II) and then coupled into single-mode fibers which connect to SPCMs.
6.4 MDIEW values are compared for three cases. The theoretical results ( $J_{th}$ , solid line) are calculated for the states $\rho_{AB}^{v}$ with different values of $v$ in Eq. (6.2). The tomography results ( $J_{tom}$ , triangle points) are evaluated for the states $\rho_{34}^{v}$ after performing tomography on the to-be-witnessed bipartite state. Each point of the experimental results ( $J_{exp}$ , circular points) is measured from a 16-hour experiment. Vertical error bars indicate one standard deviation and horizontal error bars of the fitting values $v$ from state tomography are described in Supplemental Materials. The inset shows theoretical and experimental values of tangle for input states $\rho_{34}^{v}$ .
6.5 Tomography of the bipartite state $\rho^{v}_{34}$ . Density matrices are constructed through tomography and over 250,000 coincidence detection events are obtained for each plot. Depending on the angle $\theta$ of the state selector defined in Eq. (6.22), various states ${\rho_{34}^{v}}$ are prepared. (a) Real part of the density matrices $\rho^{v}_{34}$ . (b) Imaginary part of the density matrices $\rho^{v}_{34}$ .
7.1 Entanglement witness and the reliability problem.
7.2 Optimization of entanglement witnesses. (a) To get the optimal witness of an unknown entangled state $\rho$ , one has to run over all possible witnesses. Intuitively, this is done by scanning over all witnesses that are tangent to the set of separable states. (b) The optimization can be efficiently done if certain failure probability can be tolerated.
7.3 Bipartite nonlocal game with classical and quantum inputs. (a) Nonlocal game with classical inputs. Based on the classical inputs $x$ and $y$ , Alice and Bob perform local measurements on the pre-shared entangled state $\rho_{AB}$ , and get classical outputs $a$ and $b$ , respectively. A linear combination of the probability distribution $p(a,b|x,y)$ defines a Bell inequality as shown in Eq. (9.1). (b) Nonlocal game with quantum inputs. The quantum inputs of Alice and Bob are respectively $\omega_{x}$ and $\tau_{y}$ . It is shown [3] that any entangled quantum state can be witnessed with a certain nonlocal game with quantum inputs. Equivalently, if we consider that Alice and Bob each prepares an ancillary state and a third party Eve performs the measurement, this setup also corresponds to the case of MDIEW.
7.4 Simulation results of the original and optimized MDIEW protocol. The to-be-witnessed state is the two-qubit Werner state defined in Eq. (7.22). Here, we consider that Alice projects onto $\ket{\Phi_{AA}^{+}}$ and Bob projects onto $\ket{\Phi_{BB}^{-}}$ . In this case, the original MDIEW cannot detect entanglement, while the optimized MDIEW protocol detects all entangled Werner states.
8.1 Bell tests in the bipartite scenario. (a) The inputs of Alice and Bob, $x$ and $y$ , are decided by perfect random number generators (RNGs), which produce uniformly distributed random numbers; (b) The measurement devices are controlled by an adversary Eve through local hidden variables $\lambda$ ; (c) The input random numbers are also controlled by the same local hidden variable $\lambda$ , which is accessible to Eve.
8.2 (Color online) Optimal values of the CHSH test for different randomness $P$ with various numbers of rounds $N$ , when only Alice’s inputs are biased conditioned on the hidden variable $\lambda$ . The solid line is the optimal strategy for $N\rightarrow\infty$ , which upper bounds the values for all finite $N$ . Note that the curve is not smooth for finite $N$ because the optimal strategy $q_{k_{A}}$ defined in Eq. (8.27) jumps on $l$ . As $N$ grows larger, the curve becomes smoother.
8.3 (Color online) Possible optimal values of the CHSH test for different randomness $P$ with various numbers of rounds $N$ based on uncorrelated inputs of Alice and Bob. The solid line corresponds to the strategy for $N\rightarrow\infty$ , which upper bounds all finite $N$ cases. The curves are not smooth for finite $N$ , for reasons similar to those in the case where only one party’s inputs are biased, and they become smooth as $N\rightarrow\infty$ .
9.1 Bell tests in a bipartite scenario. In general, the inputs depend on some local hidden variable $\lambda$ . The local hidden variables that control the inputs and the devices may be different. However, we can still describe these two local hidden variables by a single one, denoted $\lambda$ .
9.2 (Color online) The CH value $J^{\mathrm{LHVM}}_{\mathrm{CH}}$ as a function of $P$ and $Q$ , according to Eq. (9.18).
9.3 (Color online) The CH value $J^{\mathrm{LHVM,Fac}}_{\mathrm{CH}}$ as a function of $P$ and $Q$ with the factorizable condition Eq. (9.7).
9.4 (Color online) The CH value $J^{\mathrm{LHVM,\mathrm{NS}}}_{\mathrm{CH}}$ as a function of $P$ and $Q$ under NS condition Eq. (9.22).
9.5 (Color online) The CH value $J^{\mathrm{LHVM,NS,Fac}}_{\mathrm{CH}}$ as a function of $P$ and $Q$ under factorizable Eq. (9.7) and NS Eq. (9.22) conditions.
9.6 The critical value of $Q$ and $P$ such that the CH value $J^{\mathrm{LHVM}}_{\mathrm{CH}}(P,Q)$ equals the maximal quantum value $J_{Q}=(\sqrt{2}-1)/2$ .
9.7 The CH value $J^{\mathrm{LHVM}}_{\mathrm{CH}}(\delta)$ under different conditions.
10.1 Illustration of a generic QRNG setup in which we take photon polarization as the example. $H$ and $V$ refer to horizontal and vertical polarizations, respectively. PBS refers to a polarizing beam splitter. (a) The source functions normally (or trusted) and sends superpositions of $H$ and $V$ polarizations, which offers quantum randomness. (b) The source malfunctions (or untrusted) and sends $H$ and $V$ polarizations in a predetermined order, which should output no genuine randomness. From the measurement result viewpoint, one cannot distinguish these two cases.
10.2 (a) Measurement model for SIQRNG. The quantum state first passes through a squasher and is projected as either a qubit or a vacuum. Then, the output qubit is measured in the $X$ or $Z$ basis chosen by an active switch. There are two outcomes for each basis measurement, corresponding to the two eigenstates of the basis. (b) An optical implementation of the SIQRNG in (a), as discussed in Section . Here Pol-M refers to a polarization modulator, PBS refers to a polarizing beam splitter, and $D_{0}$ and $D_{1}$ are the threshold detectors.
10.3 Source-independent QRNG with the finite data size effect. The results are proven in Section 10.2.2.
10.4 An equivalent protocol of source-independent QRNG.
10.5 Experiment setup of SIQRNG. S: laser source; LP: linear polarizer; FPC: fiber polarization controller; FA: fiber attenuator; BS: beam splitter; PBS: polarizing beam splitter; TD: time delay implemented with a 12 m fiber; PD: photon detector.
10.6 Relation between the phase error rate and the loss. The big error bars are caused by a very conservative estimation of statistical fluctuations and also partially by the fluctuation of experimental parameters for different losses.
10.7 Dependency of randomness generation rate on the loss. The data points on the figure are taken to be the lower bound of the rate, evaluated by random sampling. The security parameter is $\varepsilon_{t}=2\times 2^{-50}$
10.8 The autocorrelation function of the raw data and the final data. The x-axis is the lag $j$ between the sampled data $X_{i}$ and $X_{i+j}$ , while the y-axis is the autocorrelation $R(j)$ defined in Eq. (10.7). Data sizes of both the raw data and the final data are in the order of $10^{7}$ . The autocorrelation of the final data is significantly smaller than the raw data in absolute value. Due to finite-key-size effect, the autocorrelation cannot be zero even for perfectly random strings.
10.9 The P-value of the statistical tests. The x-axis lists the names of statistical tests in the NIST test suite. The final data size is 91 Mbit, which is extracted from 115 Mbit raw data. To pass each test, the P-value should be at least 0.01 and the proportion of sequences that satisfy $P>0.01$ should be at least 96%. It can be seen in the figure that the P-values of all tests are greater than $0.01$ .
11.1 Deutschian timelike curves. (a) depicts a physical visualization of a CTC, where an object entering one mouth of a wormhole at some point $t_{A}$ may jump to a prior time $t_{B}$ (with respect to a chronology-respecting observer) and interact with its past self via some unitary $U$ . (b) In the special case where no interaction occurs, we obtain an open timelike curve. This naturally occurs, for example, in instances where the wormhole mouths are spatially separated.
11.2 CTCs and OTCs in presence of ancilla. $A$ represents the system to be sent through the space-time wormhole, and $B$ some chronology respecting system initially correlated with $A$ . (a) In general CTCs, temporal self-consistency demands that $\rho_{\mathrm{CTC}}$ satisfies $\rho_{\mathrm{CTC}}=\mathrm{Tr}_{\neq A}[U(\rho^{\mathrm{(in)}}_{AB}\otimes\rho_{\mathrm{CTC}})U^{\dagger}]$ . (b) In the case of OTCs, this implies that system $A$ has state $\rho_{\mathrm{OTC}}=\mathrm{Tr}_{\neq A}\left[\rho^{\mathrm{(in)}}_{AB}\otimes\rho_{\mathrm{OTC}}\right]=\rho_{A}$ after application of the protocol.
11.3 Quantum circuit of OTC enhanced measurement. The protocol first introduces $N$ ancilla qudits, all of which are initialized in the state $\rho_{E}=\ket{0}\bra{0}$ , where $\ket{0}$ is an eigenstate of $\hat{O}$ . A sequence of $C_{+}$ gates then perfectly correlates each ancilla with $\rho$ with respect to $\hat{O}$ basis. The erasure of these correlations via OTCs, followed by $\hat{O}$ measurements on each individual qudit, allows determination of $\mathrm{Tr}[\hat{O}\rho]$ to a standard error that scales inversely with $N^{2}$ .
11.4 Solving NP-complete problems with OTCs. The key non-linear gate $S$ , that takes $\rho(n_{z})$ to $\rho(n_{z}^{2})$ , can be implemented via open timelike curves. This is achieved by the use of a single OTC, applied between two successive $C_{+}$ gates.
11.5 OTC Assisted Cloning. An arbitrary qudit $\rho$ can be cloned to any desired fidelity. The process involves (i) application of a standard quantum cloner $C$ to generate $O(d^{2})$ imperfect copies, and (ii) use of OTC enhanced measurements to measure different observables $M_{i}$ on each imperfect copy. We can choose $M_{i}$ to be informationally complete, and OTCs ensure that we can determine $\mathrm{Tr}\left[{M_{i}\rho}\right]$ to any desired precision. Thus this protocol can yield (to any fixed precision) the classical description of $\rho$ .
12.1 The structure of sharp measurements. (a) Every non-sharp measurement $\{m_{x}\}_{x\in\set{X}}$ (round diagram on the l.h.s.) is equivalent to a sharp measurement $\{M_{x}\}_{x\in\set{X}}$ (triangular diagram on the r.h.s.) performed on the system along with an environment. (b) Coarse-graining a sharp measurement $\{m_{x}\}_{x\in\set{X}}$ yields a new sharp measurement $\{m^{\prime}_{y}\}_{y\in\set{Y}}$ . (c) When two sharp measurements $\{m_{x}\}$ and $\{n_{y}\}$ are performed in parallel, they yield a new sharp measurement $\{m_{x}\otimes n_{y}\}$ .
12.2 Winning graphs for examples of nonlocal games. The vertices are coloured so that two connected vertices have distinct colours, using the minimum number of colours. (a) Winning graph for the CHSH game. The players win $+1$ if $y_{1}\oplus y_{2}=x_{1}x_{2}$ and 0 otherwise. The graph is not perfect, because the largest clique in the graph has 3 vertices while the number of colours in the graph is 4. Here classical strategies are not optimal among the strategies that satisfy LO. (b) Winning graph for Guess Your Neighbor’s Input [4] in the case of $N=4$ parties. The players win $+1$ if $y_{i}=x_{i+1}$ for every $i$ . The graph is a disjoint union of disconnected cliques and therefore classical strategies are optimal among all strategies satisfying LO. (c) Winning graph for the game “Guess the Product” in the case of $N=3$ parties. The players win $+1$ if $y_{i}=x_{1}x_{2}x_{3}$ for every $i$ and 0 otherwise. The graph is perfect and therefore the classical strategy is optimal [5]. (d) Winning graph for the game “Guess the Parity” in the case of $N=3$ parties. The players win $+1$ if $y_{i}=x_{1}\oplus x_{2}\oplus x_{3}$ for every $i$ and 0 otherwise. The graph is not perfect, because it contains odd cycles with more than 3 vertices. Still, classical strategies are optimal for this game, as shown in Supplementary Note 2 for arbitrary number of players.

List of Tables

3.1 Properties that a coherence measure should satisfy.
3.2 Properties that an entanglement measure should satisfy.
3.3 A brief summary of trusted-device QRNG demonstrations. Detailed description of these schemes can be found in Section and . Note that the quality/security of random numbers in different demonstrations may be different. Raw: reported raw generation rate, Refined: reported refined rate, Acquisition: data acquisition by dedicated hardware or commercial oscilloscope, SPD: single photon detector, BS: beam splitter, MCP-PCID: micro-channel-plate-based photon counting imaging detector, PNRD: photon-number-resolving detector, CMOS: complementary metal-oxide-semiconductor, $-$ : no related information found.
3.4 A summary of self-testing and semi-self-testing QRNG demonstrations. MDI: measurement device independent, SI: source independent, CV: continuous variable.
4.1 Comparing the frameworks of coherence and entanglement. DI: device-independent; MDI: measurement-device-independent; QKD: quantum key distribution; QRNG: quantum random number generation.
5.1 Experimental results. $\theta$ is the angle of the quoin state; $N_{p}$ is the total number of $p$ -quoins prepared. About half of the prepared $p$ -quoins are measured in the $X$ basis to prepare the $q$ -coin. $q_{\mathrm{th}}$ is the theoretically estimated value based on the estimation of $p$ ; $q_{\mathrm{exp}}$ is the experimentally estimated value from the obtained $q$ -coins; $f_{\mathrm{th}}(p)$ is the theoretically estimated value from the estimation of $p$ ; $f_{\mathrm{exp}}(p)$ is the experimentally estimated value from the obtained $f(p)$ -coins; $N_{f(p)}$ is the number of $f(p)$ -coins obtained.
6.1 Decomposition of $W$ based on different measurement outcomes.
6.2 Coefficients and probabilities for MDIEW with outcomes $++$ and $--$ . Note that when $\beta=0$ , the corresponding probability $p$ is irrelevant.
6.3 Coefficients and probabilities for MDIEW with outcomes $+-$ and $-+$ . Note that when $\beta=0$ , the corresponding probability $p$ is irrelevant.
6.4 Our MDIEW in the form of Eq. (6.21) for the bipartite states defined in Eq. (6.8).
6.5 Error estimation of density matrix, real non-zero parts
6.6 The tangle values of the input states by tomography.
8.1 The lower bound for the randomness parameter $P$ defined in Eq. (9.2) that allows the CHSH value $S$ , defined in Eq. (9.5), to reach the quantum bound $S_{Q}$ by LHVMs in the CHSH test under different conditions.
9.1 The value of $J_{\lambda}$ with deterministic strategy.
9.2 Possible strategies for letting $J_{\lambda}$ be positive.
9.3 The coefficient $\beta_{ij}$ of $q(\lambda_{j})p_{i}(\lambda_{j})$ in the expression of $J^{\mathrm{LHVM}}_{\mathrm{CH}}$ of the CH inequality.
9.4 The coefficient of $p_{i}(\lambda_{j})$ in the expression of $J^{\mathrm{LHVM}}_{\mathrm{CHSH}}$ of the CHSH inequality.
C.1 The coefficient of $p_{i}(\lambda_{j})$ in the expression of J.
C.2 The coefficient of $p_{i}(\lambda_{j})$ in the expression of J.

Part I Introduction and Preliminaries

Chapter 1 Introduction

Manipulation of quantum information empowers many tasks such as communication [6, 7, 8], computation [9, 10], and simulation [11, 12]. In communication, the quantum key distribution protocol proposed by Charles Bennett and Gilles Brassard in 1984 (BB84) [6] makes it possible to extend secret keys between two remote parties by transmitting quantum signals. Such a task has been proven impossible by classical methods. In quantum computing, the Shor algorithm, named after its inventor Peter Shor [9], factorizes integers (that is, given an integer $N$ , it finds the prime factors of $N$ ) exponentially faster than any existing classical method. In quantum simulation [13], one can efficiently simulate quantum systems that require exponential resources on a classical computer.

To investigate the origin of the quantum information processing power, we need to find the major difference between the manipulation of quantum and classical information. Focusing on states of physical systems, the major distinction derives from the quantum features or quantumness. In different scenarios, the quantumness manifests differently. For instance, considering the whole system, coherent superpositions on a computational basis inherently differ from classical (stochastic) mixtures of the basis states. Named quantum coherence, such coherent superpositions underlie the quantumness of a single quantum system [14]. In bipartite systems, the quantumness can also be defined as the nonlocal correlation between the two systems. Considering local operations and classical communication (LOCC) as free or classical operations, quantum entanglement is the major quantumness in bipartite states [15, 16, 17].

One important research direction is the quantumness of states, which aims to analyze quantumness in a systematic way. Quantum coherence and entanglement can be quantified by resource frameworks. In general, a quantumness resource framework relies on identifying classical states and classical operations. The corresponding quantumness of an operational task emerges when a quantum behavior cannot be explained by classical means. A state is called classical when it exhibits no quantum behavior. Denote the set of classical states by $\mathcal{C}$ ; then a state that does not belong to $\mathcal{C}$ is called quantum. Based on classical states, classical operations are physically realizable operations that cannot generate quantum states from any classical state. With classical states and operations, a resource framework of quantumness is completed by defining measures, which are real-valued functions of states. Generally, a quantumness measure should satisfy the monotonicity requirement: that is, classical operations cannot increase the quantumness of a system.

From a mathematical viewpoint, finding legitimate quantumness measures is important for completing the quantumness resource framework. Moreover, these quantities should have a meaningful interpretation in concrete operational tasks. For instance, the entanglement of formation measures the average number of maximally entangled states required to prepare the target state in the asymptotic scenario; the distillable entanglement measures the average number of maximally entangled states that can be obtained via LOCC operations on the target states in the asymptotic scenario.

This thesis focuses on the operational interpretation of quantumness measures. We consider two operational tasks, randomness processing and selftesting quantum information processing, and investigate the key roles of quantumness in both tasks. We also probe the interplay between randomness and selftesting quantum information tasks.

Quantumness and randomness

In classical theory, all physical processes are deterministic according to Newton’s laws. In contrast, Born’s rule [18] endows the quantum world with true randomness. This result is so counter-intuitive that Einstein was quoted as saying ‘God is not playing at dice’. Nevertheless, the intrinsically random nature of measurement outcomes is now considered a key characteristic that distinguishes quantum mechanics from classical theory [19].

In measurement theory, decoherence, that is, the breaking of coherence or superposition in a specific (classical) computational basis, results in random outcomes [20]. Intuitively, from the resource perspective, randomness can be generated by consuming the coherence of a quantum state. To establish this connection quantitatively, one needs a proper way to assess the randomness of a measurement, which normally involves both quantum and classical processes. The superficially random outcomes of classical processes are generally not truly random, although they might appear so if information is ignored. Thus, this classical part of the randomness should be excluded when quantifying a quantum feature such as coherence. A quantum process, on the other hand, can generate genuinely unpredictable randomness, which we call intrinsic (quantum) randomness. Observing such intrinsically random measurement outcomes would indicate non-classical (quantum) features of objects.

As an example, we can consider the famous Schrödinger’s cat gedanken experiment as shown in Fig. 1.1. In a classical world, a cat might be either alive or dead before observation, which can be described by the density matrix $\rho_{\mathrm{cat}}^{\mathrm{C}}=(\ket{\mathrm{alive}}\bra{\mathrm{alive}}+\ket{\mathrm{dead}}\bra{\mathrm{dead}})/2$ for the case of being alive and dead equally likely, see Fig. 1.1 (a). The observation result of whether the cat is alive or dead looks random, which is due to the lack of knowledge of the cat system. After considering some hidden variables or an ancillary system $E$ that purifies $\rho_{\mathrm{cat}}^{\mathrm{C}}$ , $\ket{\Psi}=\left(\ket{\mathrm{alive}}\ket{0}_{E}+\ket{\mathrm{dead}}\ket{1}_{E}\right)/\sqrt{2}$ , we can simply observe the system $E$ to infer whether the cat is alive or dead. In quantum mechanics, the cat can be in a coherent superposition of the states of alive and dead, $\rho_{\mathrm{cat}}^{\mathrm{Q}}=\ket{\psi}\bra{\psi}$ , where $\ket{\psi}=\left(\ket{\mathrm{alive}}+\ket{\mathrm{dead}}\right)/\sqrt{2}$ , see Fig. 1.1 (b). The observation outcome would be intrinsically random according to Born’s rule. That is, without directly accessing the system of the cat and breaking the coherence, we can never predict whether the cat is alive or dead better than blindly guessing. Therefore, the existence of intrinsic randomness can be regarded as a witness for quantum coherence.
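
The contrast between the two cat states can be checked numerically. The following is a minimal sketch, assuming the standard definition of the relative entropy of coherence, $C(\rho)=S(\rho^{\mathrm{diag}})-S(\rho)$, evaluated in the $\{\ket{\mathrm{alive}},\ket{\mathrm{dead}}\}$ basis. The function names are illustrative and not taken from the thesis, and the eigenvalue formula is restricted to real symmetric $2\times 2$ matrices.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits; zero entries contribute nothing."""
    return -sum(x * math.log2(x) for x in probabilities if x > 1e-12)

def eigenvalues_2x2(rho):
    """Eigenvalues of a real symmetric 2x2 matrix [[a, b], [b, d]]."""
    (a, b), (_, d) = rho
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return [mean + radius, mean - radius]

def relative_entropy_of_coherence(rho):
    """C(rho) = S(diag(rho)) - S(rho), in the basis the matrix is written in."""
    diagonal = [rho[0][0], rho[1][1]]
    return entropy_bits(diagonal) - entropy_bits(eigenvalues_2x2(rho))

# Classical cat: equal statistical mixture of |alive> and |dead>.
rho_classical = [[0.5, 0.0], [0.0, 0.5]]
# Quantum cat: coherent superposition (|alive> + |dead>)/sqrt(2).
rho_quantum = [[0.5, 0.5], [0.5, 0.5]]

coherence_classical = relative_entropy_of_coherence(rho_classical)
coherence_quantum = relative_entropy_of_coherence(rho_quantum)
```

Both states give the same 50/50 outcome statistics, yet under this definition only the superposition carries coherence (one bit), which matches the statement that only the quantum cat yields intrinsic randomness.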

This thesis investigates the relation between intrinsic randomness and quantum coherence [21, 22]. We show that the amount of intrinsic randomness arising from state measurements in a basis indicates the amount of coherence in the same basis. Therefore, we can naturally regard coherence as the resource for generating randomness. In addition, we examine a simple yet interesting randomness processing task, called the Bernoulli factory [23, 24], and investigate how coherence can be used to outperform classical methods. In the discussed randomness tasks, we reveal that coherence is an important resource for both randomness generation and processing.
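
To illustrate why coherence helps in a Bernoulli factory, here is a small sketch. It rests on an assumption not stated in this chapter: that the quantum coin (quoin) is the pure state $\sqrt{p}\ket{0}+\sqrt{1-p}\ket{1}$ and the classical coin is the mixture $p\ket{0}\bra{0}+(1-p)\ket{1}\bra{1}$. The function names are hypothetical. The sketch only computes $X$-basis outcome probabilities; it does not implement a full factory.

```python
import math

def quoin_plus_probability(p):
    """P(+) when the assumed quoin sqrt(p)|0> + sqrt(1-p)|1> is measured in the X basis."""
    amplitude = (math.sqrt(p) + math.sqrt(1 - p)) / math.sqrt(2)
    return amplitude ** 2  # equals 1/2 + sqrt(p(1-p))

def classical_coin_plus_probability(p):
    """P(+) for the incoherent mixture p|0><0| + (1-p)|1><1|: (p + (1-p))/2, independent of p."""
    return (p + (1 - p)) / 2

def target_bias_from_q(q):
    """(2q - 1)^2, which equals 4p(1-p) when q = 1/2 + sqrt(p(1-p))."""
    return (2 * q - 1) ** 2

p = 0.2
q = quoin_plus_probability(p)
f_from_q = target_bias_from_q(q)
f_direct = 4 * p * (1 - p)
```

Under these assumptions the $X$-basis statistics of the quoin depend on $p$, whereas those of the classical coin are always $1/2$; the coherent encoding therefore gives access to a second coin whose bias $q$ determines $f(p)=4p(1-p)$.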

Quantumness and selftesting

The concept of selftesting is unique to quantum information processing. A selftesting, or device-independent, protocol maintains its properties even with untrusted devices, that is, without assumptions about their physical implementation [25, 26, 27]. Considering the process in Fig. 1.2(a), a general picture for a single-party process involves a classical or quantum random input $x$ and an output $a$ . Without assuming the physical implementation of the box that transforms inputs into outputs, what we observe in practice is the probability distribution $p(a|x)$ . A selftesting protocol must then guarantee its properties based only on the probability distribution $p(a|x)$ . In the bipartite scenario, Fig. 1.2(b), we can further impose other requirements on the two parties. For instance, in the Bell test, we generally assume no signaling between the two parties. Given the inputs $x,y$ , outputs $a,b$ , and the probability distribution $p(a,b|x,y)$ , the Bell test has proven remarkably powerful in many tasks.

Here, we take randomness generation as an example. True random numbers can be generated by measuring a state that is coherent in the measurement basis. Such randomness, however, relies on assumptions about the physical implementation (i.e., the state and its measurement). We instead consider whether random numbers can be generated without such assumptions. The single device in Fig. 1.2(a) cannot be used to generate randomness device-independently, because one can always choose a predetermined sequence that satisfies the probability distribution $p(a|x)$ . As the sequence is predetermined, no randomness is generated; thus, it appears that randomness cannot be generated in a selftesting way.

Now, we consider the two-devices case in Fig. 1.2(b). Following a similar argument, both parties can produce predetermined probability distributions $p(a|x)$ and $p(b|y)$ to simulate the observed probability distribution $p(a,b|x,y)$ . Although the two parties cannot communicate, they can still share a predetermined strategy $\lambda$ that is independent of the inputs. The probability distribution of a given $\lambda$ is $p(a|x,\lambda)p(b|y,\lambda)$ . On average, the probability distribution the two parties can simulate with predetermined strategy is

$$p(a,b|x,y)=\sum_{\lambda}p(\lambda)\,p(a|x,\lambda)\,p(b|y,\lambda),\tag{1.1}$$

where $p(\lambda)$ is the probability distribution of $\lambda$. Such a strategy is called a local hidden variable model [28]. We then ask whether the probability distribution in Eq. (1.1) covers all possible probability distributions $p(a,b|x,y)$. If the answer is yes, then the observed probability distribution $p(a,b|x,y)$ cannot certify randomness; if no, then observing a distribution outside this set certifies that the process generates true random numbers. Remarkably, there exist probability distributions $p(a,b|x,y)$, obtained by measuring entangled states, that cannot be simulated as in Eq. (1.1). In practice, when observing a probability distribution that cannot be simulated, the randomness in the outputs $a,b$ is assured without assuming any physical realization.

Although a selftesting protocol does not assume physical realizations, quantumness remains the crucial ingredient for demonstrating quantum advantages. If there is no quantumness, the process becomes classical and cannot be verified device independently. Conversely, quantumness can be witnessed by selftesting protocols. For instance, entanglement is revealed by violations of Bell inequalities. This thesis investigates the relation between quantumness and selftesting [29, 30, 31, 32]. Specifically, we investigate the witnessing of general multipartite entanglement via a selftesting protocol with quantum states as inputs.

Randomness and selftesting

A selftesting protocol requires genuine random input. If the input is also predetermined, the device can always simulate any probability distribution by a predetermined strategy. The randomness (freewill) loophole refers to the underlying assumption in Bell tests that different measurement settings can be chosen randomly (freely). Generally, a Bell test requires the input of each party to be fully random in order to avoid information leakage between different parties. If there is a local hidden variable that shares information about the random inputs, where in the worst scenario, the inputs are all predetermined such that each party knows exactly the input of the other party, it is possible to violate Bell inequalities just with local hidden variable model (LHVM) strategies [28]. Since one can always argue that there might exist a powerful creator who determines everything including all the Bell test experiments, this loophole is widely believed to be impossible to close perfectly. In this case, as we cannot prove or disprove the existence of true randomness, the assumption of freewill is indispensable in general Bell tests.

Yet, it is still meaningful to discuss the randomness requirement of Bell tests in a practical scenario (imperfect input randomness is sometimes called measurement dependence in the literature). In this thesis, we suppose that the randomness generation devices are partially controlled by an adversary Eve, who thus possesses certain knowledge of the inputs of Alice and Bob [33, 34]. Eve can then use this information about the inputs to fake violations of Bell’s inequalities [35] and hence render device independent tasks insecure. Therefore, it is interesting to determine how much randomness is needed for a Bell test in order to ensure the correctness of its conclusion. This is especially meaningful when considering a loophole free Bell test [36, 37] and its applications to practical tasks in the presence of an eavesdropper.

In summary, this thesis investigates three main topics, quantumness, randomness, and selftesting, and the interplay among them. Following a brief introduction to quantum information theory, we discuss these three features and their interplay from different perspectives. We also present two studies, one on quantum information in general relativity and one on axioms, and discuss the basic principles of quantum information theory and their extension to a general physical theory.

Chapter 2 Basics of quantum mechanics

This chapter briefly covers the basics of quantum mechanics. We focus on the formalism of quantum information, which involves the density matrix and positive operator valued measures (POVMs). Due to the length limit, we only present the results that are used in this thesis. For a more detailed introduction to quantum information, please see Refs. [38, 39, 40].

2.1 Quantum mechanics formalism—pure states and projective measurements

In this part, we review the Dirac notation of quantum mechanics and its equivalent form of vectors.

Pure states

Unlike in classical mechanics, quantum states can be expressed as a superposition of different basis states. Following the Dirac bra-ket representation, a pure quantum state can be denoted as

$$\ket{\psi}=\sum_{i}\psi_{i}\ket{i},$$

where the set of $\mathcal{I}=\{\ket{i}\}$ represents a state basis, such as the polarization of the photon or the energy levels of an atom. Suppose the dimension of the $\mathcal{I}$ basis is $d$ , then we can regard the state space as a $d$ -dimensional Hilbert space $\mathcal{H}_{d}$ and quantum states as vectors. Suppose $\mathcal{I}=\{\ket{i}\}$ forms an orthogonal basis, then quantum states can be equivalently denoted as

[TABLE]

Measurements

In the Dirac representation, a projective measurement can be denoted in bra form as,

[TABLE]

The measurement probability is given by

[TABLE]

Suppose $\mathcal{I}=\{\ket{i}\}$ forms an orthogonal basis, then the probability is given by

[TABLE]

Similar to the vector representation, a projective measurement can be denoted as a dual vector as

[TABLE]

and the probability of measuring $\ket{\psi}$ is given by the square of the inner product

[TABLE]

Observables

In quantum mechanics, an observable $O$ is a Hermitian operator that satisfies $\bra{\psi}O\ket{\phi}=\overline{\bra{\phi}O\ket{\psi}}$ for two arbitrary states $\ket{\psi}$ and $\ket{\phi}$ . Here $\overline{\bra{\phi}O\ket{\psi}}$ is the complex conjugate of $\bra{\phi}O\ket{\psi}$ . Therefore, the average of $O$ for state $\ket{\psi}$ is given by the real value

$$\bar{O}=\bra{\psi}O\ket{\psi}.$$

When $\ket{\psi}$ is denoted as a vector and $\bra{\psi}$ as a dual vector, $O$ can be denoted as a Hermitian matrix $O_{ij}=\bra{i}O\ket{j}$ . In this case, the average value can be given by matrix multiplication $\bar{O}=\sum_{i,j}\psi_{i}^{*}O_{ij}\psi_{j}$ .

Any Hermitian operator $O$ has a spectral decomposition,

$$O=\sum_{i}o_{i}\ket{o_{i}}\bra{o_{i}},$$

where $\{\ket{o_{i}}\}$ forms an orthogonal basis, so the average value of $O$ can also be regarded as a projective measurement on the $\{\ket{o_{i}}\}$ basis. The probability of the $i$ th outcome is $p_{i}=|\bra{o_{i}}\ket{\psi}|^{2}$ and the average is

$$\bar{O}=\sum_{i}p_{i}o_{i}.$$
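The two routes to the average value described above (matrix multiplication and the spectral decomposition) can be checked numerically. The following is a minimal sketch using NumPy; the observable $\sigma_{y}$ and the example state are illustrative choices, not taken from the thesis.

```python
import numpy as np

# Example observable (sigma_y) and state (|0> + i|1>)/sqrt(2).
# Both are illustrative assumptions, not taken from the text.
O = np.array([[0, -1j], [1j, 0]])
psi = np.array([1, 1j]) / np.sqrt(2)

# Route 1: matrix multiplication, sum_ij psi_i^* O_ij psi_j.
avg_matrix = np.vdot(psi, O @ psi).real

# Route 2: spectral decomposition O = sum_i o_i |o_i><o_i|.
o, vecs = np.linalg.eigh(O)              # columns of vecs are the |o_i>
p = np.abs(vecs.conj().T @ psi) ** 2     # p_i = |<o_i|psi>|^2
avg_spectral = float(np.sum(p * o))

print(avg_matrix, avg_spectral)
```

Both numbers agree because the projective measurement in the eigenbasis of $O$ reproduces the same average as the direct matrix expression.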

Evolution

In quantum mechanics, the evolution of a quantum state is determined by the Schrödinger equation. Given the Hamiltonian ${H}$ of the system, we have

$$i\hbar\frac{\partial}{\partial t}\ket{\psi(t)}={H}\ket{\psi(t)}.$$

Considering a closed system where energy is conserved, the Hamiltonian ${H}$ is independent of time and the state can be determined by

$$\ket{\psi(t)}=U(t,t_{0})\ket{\psi(t_{0})},$$

where $U(t,t_{0})=e^{-i{H}(t-t_{0})/\hbar}$ gives the evolution of the state and $\ket{\psi(t_{0})}$ is the state at time $t_{0}$.

In quantum mechanics, the Hamiltonian ${H}$ is Hermitian. In the vector representation, ${H}$ corresponds to a Hermitian matrix that satisfies ${H}^{{\dagger}}={H}$ . Here ${\dagger}$ denotes the hermitian conjugate of ${H}$ . Because

[TABLE]

the matrix representation of ${H}$ is given by $H_{i,j}=\bra{i}{H}\ket{j}$ . In this case, the evolution operator $U(t,t_{0})$ can be regarded as a unitary operator that satisfies

[TABLE]

where $U(t_{0},t)=e^{-i{H}(t_{0}-t)/\hbar}$.
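As a numerical illustration of unitary evolution, the sketch below assumes the standard propagator for a time-independent Hamiltonian, $U(t,t_{0})=e^{-i{H}(t-t_{0})/\hbar}$, and builds it from the eigendecomposition of ${H}$. The Hamiltonian ($\sigma_{x}$), the units ($\hbar=1$), and the function name `propagator` are assumptions made for the example.

```python
import numpy as np

HBAR = 1.0  # natural units; an assumption for this example

def propagator(H, t, t0=0.0):
    """U(t, t0) = exp(-i H (t - t0) / hbar) for a time-independent Hermitian H."""
    energies, vecs = np.linalg.eigh(H)
    phases = np.exp(-1j * energies * (t - t0) / HBAR)
    # V diag(phases) V^dagger
    return (vecs * phases) @ vecs.conj().T

H = np.array([[0, 1], [1, 0]], dtype=complex)   # sigma_x as a toy Hamiltonian
U = propagator(H, t=np.pi / 2)
psi0 = np.array([1, 0], dtype=complex)          # |0>
psi_t = U @ psi0

print(np.allclose(U @ U.conj().T, np.eye(2)))                    # unitarity
print(np.allclose(propagator(H, 0.0, np.pi / 2), U.conj().T))    # U(t0,t) = U^dagger(t,t0)
```

Using the eigendecomposition avoids any dependency beyond NumPy; for large or non-Hermitian generators a dedicated matrix exponential routine would be the more usual choice.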

To summarize, when considering a $d$-dimensional system, pure quantum states, projective measurements, observables and state evolution can be represented as vectors, dual vectors, Hermitian operators, and unitary operators, respectively. Because the vector representation is equivalent to the Dirac bra-ket representation, we use them interchangeably.

2.2 Composite systems and subsystems

Composite system

We now consider two systems $A$ and $B$ that are defined in Hilbert space $\mathcal{H}_{d_{A}}$ and $\mathcal{H}_{d_{B}}$ , respectively. Similarly, a pure quantum state on system $AB$ can be represented as

[TABLE]

where $\mathcal{I}_{A}=\{\ket{i_{A}}_{A}\}$ and $\mathcal{I}_{B}=\{\ket{i_{B}}_{B}\}$ form orthogonal bases for systems $A$ and $B$, respectively. Equivalently, in vector form, we have

[TABLE]

Projective measurements, observables and evolution can be similarly defined. Next, we move to the density matrix description of states for subsystems. Before doing so, we first reformulate the pure state formalism with states and measurements denoted as matrices.

Subsystems

In the Dirac notation, quantum states and measurements are given by $\ket{\psi}$ and $\bra{M}$, respectively, and the measurement probability is given by $|\bra{M}\psi\rangle|^{2}$. Equivalently, we can denote a quantum state as $\rho=\ket{\psi}\bra{\psi}$ and a measurement as $P=\ket{M}\bra{M}$. Then the measurement probability is

$$|\bra{M}\psi\rangle|^{2}=\bra{M}\psi\rangle\bra{\psi}M\rangle=\mathrm{Tr}[P\rho],$$

where $\mathrm{Tr}$ is the trace operation. Thus, quantum states and measurements can also be denoted as matrices.

Consider now a projective measurement $P_{A}=\ket{M}_{A}\bra{M}_{A}$ on system $A$ of $\ket{\psi}_{AB}$ and the probability of the measurement. Because systems $A$ and $B$ are correlated, we cannot directly calculate the probability of solely measuring system $A$ from the state. In this case, we consider that a projective measurement on the $\mathcal{I}_{B}$ basis is also applied on system $B$ . The probability of projecting onto $\ket{M}_{A}\bra{M}_{A}$ and $\ket{i_{B}}_{B}\bra{i_{B}}_{B}$ is calculated by

[TABLE]

and the probability of projecting system $A$ onto $P_{A}$ is

[TABLE]

where

[TABLE]

with $\mathrm{Tr}_{B}$ being the trace over system $B$ only.
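The argument above (measure $A$, sum over the outcomes of a basis measurement on $B$, and arrive at a reduced density matrix) can be reproduced numerically. This is a minimal sketch; the function name `partial_trace_B`, the example state $(\ket{00}+\ket{11})/\sqrt{2}$, and the measurement vector are illustrative assumptions.

```python
import numpy as np

def partial_trace_B(rho_ab, d_a, d_b):
    """Trace out system B from a density matrix on H_A (x) H_B."""
    rho = rho_ab.reshape(d_a, d_b, d_a, d_b)
    return np.einsum('ibjb->ij', rho)

# Example joint state (|00> + |11>)/sqrt(2).
psi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_ab = np.outer(psi, psi.conj())
rho_a = partial_trace_B(rho_ab, 2, 2)

# Projective measurement |M><M| on A with |M> = (|0> + |1>)/sqrt(2).
M = np.array([1, 1], dtype=complex) / np.sqrt(2)
P_a = np.outer(M, M.conj())

# Probability from the reduced state ...
p_reduced = np.trace(P_a @ rho_a).real
# ... and from summing joint probabilities over a basis measurement on B.
p_joint = sum(np.trace(np.kron(P_a, np.outer(e, e)) @ rho_ab).real
              for e in np.eye(2))

print(p_reduced, p_joint)
```

The two probabilities coincide, which is the content of the statement that tracing out $B$ gives an equivalent description of measurements on $A$ alone.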

Therefore, when measuring a subsystem, one can equivalently describe the state with a density matrix by tracing out the other systems. In general, it is easy to verify that a density matrix $\rho$ should have the following properties:

• $\rho$ is a Hermitian operator.

• $\rho$ is nonnegative. That is, its spectral values are nonnegative.

• $\mathrm{Tr}[\rho]=1$.

Qubit systems

A two-level system is referred to as a quantum bit or qubit system. Its density matrix is a $2\times 2$ Hermitian matrix. The Pauli matrices

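$$\sigma_{x}=\begin{pmatrix}0&1\\1&0\end{pmatrix},\quad\sigma_{y}=\begin{pmatrix}0&-i\\i&0\end{pmatrix},\quad\sigma_{z}=\begin{pmatrix}1&0\\0&-1\end{pmatrix},$$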

together with the identity matrix $I$ form a basis for $2\times 2$ Hermitian matrices. That is, given $e_{0}=I$ , $e_{1}=\sigma_{x}$ , $e_{2}=\sigma_{y}$ , $e_{3}=\sigma_{z}$ , and the inner product to be the inner product of matrices, we can verify that

[TABLE]

For any single qubit state $\rho$ , its Bloch sphere representation is

[TABLE]

Here, $n_{x},n_{y},n_{z}$ are real values and $n_{x}^{2}+n_{y}^{2}+n_{z}^{2}\leq 1$ . It is easy to verify that

[TABLE]
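The Bloch sphere representation can be illustrated with a short numerical sketch. It assumes the standard form $\rho=(I+n_{x}\sigma_{x}+n_{y}\sigma_{y}+n_{z}\sigma_{z})/2$ with $n_{k}=\mathrm{Tr}[\rho\sigma_{k}]$; the function names and the example state are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_vector(rho):
    """n_k = Tr[rho sigma_k] for k = x, y, z."""
    return np.array([np.trace(rho @ s).real for s in (SX, SY, SZ)])

def state_from_bloch(n):
    """rho = (I + n_x sigma_x + n_y sigma_y + n_z sigma_z) / 2."""
    return 0.5 * (I2 + n[0] * SX + n[1] * SY + n[2] * SZ)

# Example mixed qubit state (an illustrative choice).
rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)
n = bloch_vector(rho)

print(n, np.dot(n, n) <= 1.0)
print(np.allclose(state_from_bloch(n), rho))
```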

Two qubit states $\rho_{AB}$ can also be decomposed in the Pauli matrices basis,

[TABLE]

Here, the coefficients are determined by the average value

[TABLE]
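The same idea extends to two qubits. The sketch below assumes the normalization $\rho_{AB}=\frac{1}{4}\sum_{i,j}t_{ij}\,e_{i}\otimes e_{j}$ with $t_{ij}=\mathrm{Tr}[\rho_{AB}\,e_{i}\otimes e_{j}]$; the names `pauli_coefficients` and `from_coefficients` are hypothetical, and the singlet state is used only as an example.

```python
import numpy as np
from itertools import product

PAULIS = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def pauli_coefficients(rho_ab):
    """t_ij = Tr[rho_AB (e_i (x) e_j)], i.e. the average of e_i (x) e_j."""
    return np.array([[np.trace(rho_ab @ np.kron(PAULIS[i], PAULIS[j])).real
                      for j in range(4)] for i in range(4)])

def from_coefficients(t):
    """Rebuild rho_AB = (1/4) sum_ij t_ij e_i (x) e_j."""
    return sum(t[i, j] * np.kron(PAULIS[i], PAULIS[j])
               for i, j in product(range(4), repeat=2)) / 4

# Singlet state (|01> - |10>)/sqrt(2) as an example.
psi = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
rho_ab = np.outer(psi, psi.conj())
t = pauli_coefficients(rho_ab)

print(np.round(t, 6))
print(np.allclose(from_coefficients(t), rho_ab))
```

Each coefficient is an average value of a product of local Pauli observables, so the full two-qubit state can in principle be estimated from local measurements only.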

Purification and Schmidt decomposition

For any state $\rho$ with a spectral decomposition of $\rho=\sum_{i}p_{i}\ket{\psi_{i}}\bra{\psi_{i}}$, one can always find its purification. That is, we can find a pure bipartite state

$$\ket{\psi}_{AB}=\sum_{i}\sqrt{p_{i}}\ket{\psi_{i}}_{A}\ket{i}_{B},$$

where $\{\ket{i}_{B}\}$ is an orthonormal basis of an ancillary system $B$, such that $\rho=\mathrm{Tr}_{B}[\ket{\psi}_{AB}\bra{\psi}_{AB}]$.

For any pure state $\ket{\psi}_{AB}$ of a bipartite system, orthonormal bases $\{\ket{i}_{A}\}$ and $\{\ket{i}_{B}\}$ exist such that:

$$\ket{\psi}_{AB}=\sum_{i}\sqrt{p_{i}}\ket{i}_{A}\ket{i}_{B}.$$

The reduced states of subsystems $A$ and $B$ have the same nonzero eigenvalues $p_{i}$, and the number of nonzero $p_{i}$ is called the Schmidt number of $\ket{\psi}_{AB}$. The pure state is entangled when its Schmidt number is greater than one. It is easy to see that the Bell states are entangled states.
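Numerically, the Schmidt coefficients of a pure bipartite state can be obtained from a singular value decomposition of its coefficient matrix. This is a minimal sketch of that standard technique; the function name, the tolerance, and the two example states are assumptions for illustration.

```python
import numpy as np

def schmidt_coefficients(psi_ab, d_a, d_b, tol=1e-12):
    """Nonzero Schmidt coefficients p_i of a pure state on H_A (x) H_B."""
    c = np.asarray(psi_ab).reshape(d_a, d_b)       # coefficient matrix c[i_A, i_B]
    s = np.linalg.svd(c, compute_uv=False)         # singular values = sqrt(p_i)
    p = s ** 2
    return p[p > tol]

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)              # (|00> + |11>)/sqrt(2)
product_state = np.kron([1, 0], [1, 1]) / np.sqrt(2)    # |0> (x) (|0> + |1>)/sqrt(2)

p_bell = schmidt_coefficients(bell, 2, 2)
p_prod = schmidt_coefficients(product_state, 2, 2)

# A Schmidt number greater than one signals entanglement of the pure state.
print(len(p_bell), len(p_prod))
```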

Positive operator valued measures

When performing local measurements on a joint state, the state can be denoted as a density matrix to simplify the calculation. We now consider performing a joint measurement and want to know the measurement probability of a local system. Suppose a joint measurement $P_{AB}=\ket{M}_{AB}\bra{M}_{AB}$ is performed on $\rho_{A}\otimes\rho_{B}$, then the probability distribution is

[TABLE]

Here, we first trace out system $B$ to get

[TABLE]

In this case, when focusing only on system $A$, the effective measurement performed on system $A$ is $M_{A}$. Therefore, general measurements are described by positive operator valued measures (POVMs). That is, $\{M_{i}\geq 0,\forall i\}$ and $\sum_{i}M_{i}=I$.

Entropy of quantum states

For any quantum state $\rho$, a function $f$ of $\rho$ is defined through its spectral decomposition:

$$f(\rho)=\sum_{i}f(\lambda_{i})\ket{i}\bra{i},$$

where $\rho=\sum_{i}\lambda_{i}\ket{i}\bra{i}$ and $\{\ket{i}\}$ forms an orthogonal basis.

The entropy of quantum state $\rho$ is defined as

$$S(\rho)=-\mathrm{Tr}[\rho\log\rho],$$

or equivalently

$$S(\rho)=-\sum_{i}\lambda_{i}\log\lambda_{i}.$$

Here, $0\log 0\equiv 0$ .

The relative entropy of quantum state is

$$S(\rho||\sigma)=\mathrm{Tr}[\rho\log\rho]-\mathrm{Tr}[\rho\log\sigma].$$

We also know that $S(\rho||\sigma)\geq 0$ , where the equality holds if and only if $\rho=\sigma$ .
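The entropy and relative entropy can be evaluated directly from eigenvalues. The sketch below assumes base-2 logarithms (the base is not fixed in the text) and a full-rank $\sigma$ in the relative entropy; the function names and example states are illustrative.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy S(rho) = -sum_i lambda_i log2 lambda_i, with 0 log 0 = 0."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

def relative_entropy(rho, sigma):
    """S(rho||sigma) = Tr[rho log rho] - Tr[rho log sigma]; sigma assumed full rank."""
    ls, vs = np.linalg.eigh(sigma)
    log_sigma = (vs * np.log2(ls)) @ vs.conj().T
    return float(-entropy(rho) - np.trace(rho @ log_sigma).real)

pure = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
mixed = np.eye(2, dtype=complex) / 2               # maximally mixed qubit

print(entropy(pure), entropy(mixed))
print(relative_entropy(pure, mixed), relative_entropy(mixed, mixed))
```

The last line illustrates the stated property: the relative entropy is positive for distinct states and vanishes when the two arguments coincide.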

For a bipartite quantum state $\rho_{AB}$, the conditional entropy and mutual information are defined by

[TABLE]

When $\rho_{AB}=\sum_{i}p_{i}\ket{i}\bra{i}\otimes\rho_{i}$ , we also have

[TABLE]

where $H(p_{i})=-\sum_{i}p_{i}\log p_{i}$ .

Chapter 3 Quantumness, selftesting, and randomness

This chapter introduces the basic concepts of quantumness, selftesting, and randomness. In Sec. 3.1, I introduce the quantumness of states, including the coherence of a single quantum system and the entanglement of multipartite systems. Sec. 3.2 introduces the Bell nonlocality test, which is the basic tool for selftesting protocols. Finally, Sec. 3.3 reviews the development of quantum random number generation.

3.1 Quantumness

This section introduces the basics of quantumness of states. For a single quantum system, we focus on its coherence on the computational basis. We refer to Ref. [41, 42] for recent reviews on this subject. For multipartite systems, we mainly focus on entanglement correlations. Nice reviews on this subject are available in Ref. [43, 17, 44].

3.1.1 Quantum coherence

As a key feature of quantum mechanics, coherence measures the superposition power on the computational basis and is often considered as a basic ingredient for quantum technologies [45, 46]. Considerable effort has been undertaken to theoretically formulate the quantum coherence [47, 48, 49, 50, 51, 14, 52, 53]. Recently, a comprehensive framework of coherence quantification has been established [14], by which coherence is considered to be a resource that can be characterized, quantified and manipulated in a manner similar to that of another important feature— quantum entanglement [15, 16, 43, 17]. Here, we focus on the resource framework of coherence.

Consider a general $d$-dimensional Hilbert space with a computational basis ${I}=\{\ket{i}\}_{i=1,2,\dots,d}$. Coherence measures the superposition power of a state in this basis. Note that any state that is diagonal in ${I}$, that is,

[TABLE]

has no superposition. Thus, such a state is called an incoherent (classical) state and the set of such states is denoted by $\mathcal{I}$. Conversely, a maximally coherent state is given by the maximal superposition state

[TABLE]

up to arbitrary relative phases between the components $\ket{i}$ .

When considering coherence as a resource, incoherent states are thus “useless” or “free” states. By analogy with thermodynamics, incoherent states are similar to thermal states, from which no energy can be extracted. In thermodynamics, thermal operations are generally considered as free operations: applying a thermal operation to a thermal state results in a thermal state. In the same spirit, one can define a “free” or incoherent operation to be an operation that transforms incoherent states only into incoherent states. That is, incoherent operations are defined by incoherent completely positive trace preserving (ICPTP) maps $\Phi_{\mathrm{ICPTP}}(\rho)=\sum_{n}K_{n}\rho K_{n}^{\dagger}$, where the Kraus operators $\{K_{n}\}$ satisfy $\sum_{n}K_{n}^{\dagger}K_{n}=I$ and $K_{n}\mathcal{I}K_{n}^{\dagger}\subset\mathcal{I}$. In the case where post-selection is allowed, the output state corresponding to the $n$th Kraus operator is given by $\rho_{n}={K_{n}\rho K_{n}^{\dagger}}/{p_{n}}$, where $p_{n}=\mathrm{Tr}\left[K_{n}\rho K_{n}^{\dagger}\right]$ is the probability of obtaining the outcome $n$.

Given the definition of incoherent states and incoherent operations, we can measure the amount of coherence. Generally, a measure of coherence is a map $C$ from quantum state $\rho$ to a real non-negative number that satisfies the properties listed in Table 3.1.

There are various measures for coherence. Considering the distance measure of two quantum states, the measure of coherence may be defined as the minimum distance from $\rho$ to all incoherent states in $\mathcal{I}$ . Two examples [14] are now presented.

Relative entropy: Here the relative entropy is used as the distance measure.

$$C_{\mathrm{RE}}(\rho)=\min_{\sigma\in\mathcal{I}}S(\rho||\sigma)=S(\rho_{diag})-S(\rho),$$

where $\rho_{diag}$ only contains diagonal elements of $\rho$ .

$l_{1}$ norm: Another distance measure is a function of the off-diagonal elements of the quantum state. The simplest form is the $l_{1}$ norm, which is given by

$$C_{l_{1}}(\rho)=\sum_{i\neq j}|\rho_{ij}|.$$
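The two distance-based measures can be computed in a few lines. This sketch assumes the closed form $S(\rho_{diag})-S(\rho)$ for the relative entropy of coherence and the sum of absolute off-diagonal elements for the $l_{1}$ norm, with base-2 logarithms; the function names and test states are illustrative.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in bits, with 0 log 0 = 0."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

def relative_entropy_coherence(rho):
    """S(rho_diag) - S(rho), where rho_diag keeps only the diagonal of rho."""
    rho_diag = np.diag(np.diag(rho))
    return entropy(rho_diag) - entropy(rho)

def l1_coherence(rho):
    """Sum of absolute values of the off-diagonal elements of rho."""
    off_diag = rho - np.diag(np.diag(rho))
    return float(np.sum(np.abs(off_diag)))

d = 4
plus = np.ones(d) / np.sqrt(d)                 # maximally coherent state, zero phases
rho_max = np.outer(plus, plus)
rho_inc = np.diag([0.5, 0.25, 0.25, 0.0])      # an incoherent (diagonal) state

print(relative_entropy_coherence(rho_max), l1_coherence(rho_max))
print(relative_entropy_coherence(rho_inc), l1_coherence(rho_inc))
```

For the maximally coherent state in dimension $d$ the two measures evaluate to $\log_{2}d$ and $d-1$, while both vanish on incoherent states.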

Besides the distance measures, coherence can be defined in other ways.

Convex roof: According to [21], the intrinsic randomness is also a measure of coherence, therefore we have

[TABLE]

where $\rho=\sum_{e}p_{e}\ket{\psi_{e}}\bra{\psi_{e}}$ and $\sum_{e}p_{e}=1$ , and the minimization runs over all possible decompositions.

3.1.2 Quantum entanglement

Entanglement framework

Quantum entanglement describes the nonlocal correlation between different systems. For instance, in the bipartite scenario, any product state

$$\rho_{AB}=\rho_{A}\otimes\rho_{B}$$

has no nonclassical correlation. In addition, a mixture of classical states should also be a classical state. Thus, we say that a state is separable when it can be written as

$$\rho_{AB}=\sum_{i}p_{i}\,\rho_{A}^{i}\otimes\rho_{B}^{i},$$

where $p_{i}\geq 0,\forall i,\sum_{i}p_{i}=1$ .

Similar to the framework of coherence, we also need to define classical operations for entanglement. In the same spirit as incoherent operations, which do not create coherence from incoherent states, a separable operation is defined such that no entangled state can be generated from a separable state. In practice, local operations and classical communication (LOCC) draw more attention because they have an operational meaning. As a strict subset of separable operations, LOCC are generally referred to as the “free” operations for entanglement.

Given the definitions of separable states and LOCC operations, we propose measures for entanglement that have the following properties.

Two widely adopted measures are the relative entropy of entanglement and the entanglement of formation.

1. Relative entropy of entanglement:

[TABLE]

where the minimization runs over all separable states $\sigma^{AB}$ .

2. Entanglement of formation:

[TABLE]

where the minimization runs over all possible decompositions of $\rho^{AB}=\sum_{e}p_{e}\ket{\psi_{e}^{AB}}\bra{\psi_{e}^{AB}}$ and $E_{EOF}(\ket{\psi_{e}^{AB}})=S(\rho^{A})$ with $\rho^{A}$ being the density matrix of system $A$.

Entanglement witness

Quantum entanglement plays an important role in the nonclassical phenomena of quantum mechanics. Being the key resource for many tasks in quantum information processing, such as quantum computation [54], quantum teleportation [7] and quantum cryptography [6, 8], entanglement needs to be verified in many scenarios. There are several proposals to witness entanglement and we refer to Ref. [44] for a detailed review.

A conventional way to detect entanglement, the entanglement witness (EW), gives one of two outcomes, ‘Yes’ or ‘No’, corresponding to a conclusive result that the state is entangled or a failure to draw a conclusion, respectively. Mathematically, for a given entangled quantum state $\rho$, a Hermitian operator $W$ is called a witness if $\mathrm{Tr}[W\rho]<0$ (output of ‘Yes’) and $\mathrm{Tr}[W\sigma]\geq 0$ (output of ‘No’) for any separable state $\sigma$. Note that there could also exist entangled states $\rho^{\prime}$ such that $\mathrm{Tr}[W\rho^{\prime}]\geq 0$ (output of ‘No’). The EW method is shown schematically in Fig. 3.1.

In an experimental verification, one can realize the conventional EW with only local measurements by decomposing $W$ into a linear combination of product Hermitian observables [44]. For example, we can consider a Werner state and the EW

[TABLE]

with $v\in[0,1]$ and $|\Psi^{-}\rangle=(|01\rangle-|10\rangle)/\sqrt{2}$ and $I$ being the identity matrix. The state is entangled if $v<1/3$ , which can be witnessed by the EW,

[TABLE]

and its result, $\mathrm{Tr}[W\rho^{v}_{AB}]=(3v-1)/4$ . When considering local measurements of Pauli operators, it is easy to verify that

[TABLE]

In an experiment, one simply measures the local observables $I\otimes I$, $\sigma_{x}\otimes\sigma_{x}$, $\sigma_{y}\otimes\sigma_{y}$, $\sigma_{z}\otimes\sigma_{z}$ and combines their averages to obtain an estimate of $\mathrm{Tr}[W\rho]$.

3.2 Selftesting: Bell nonlocality test

The basic idea of selftesting quantum information processing is to guarantee the quantum advantage using only the observed statistics, without relying on a characterization of the implementation devices. Fully selftesting protocols are based on the violation of Bell inequalities. The Bell test [19] was designed to rule out local hidden variable models (LHVMs) [28]. The faithful violation of a Bell inequality assures that the underlying physical process cannot be explained with LHVMs. In quantum information processing, violations of Bell’s inequalities are powerful tools that enable device independent tasks, such as quantum key distribution [55, 26, 56, 57], randomness amplification [58, 59, 60] and generation [61, 62, 63, 64], entanglement quantification [65], and dimension witness [66]. In this section, we introduce the background of the Bell nonlocality test and leave its application to randomness generation to the next section. We also leave the discussion of semi-selftesting quantum information to Part III.

3.2.1 Clauser-Horne-Shimony-Holt inequality

One of the best-known Bell inequalities is the Clauser-Horne-Shimony-Holt (CHSH) inequality [1], which may be expressed in many ways. We study it from a quantum game point of view.

As shown in Fig. 9.1, two space-like separated parties, Alice and Bob, choose input bit settings $x$ and $y$ at random and output bits $a$ and $b$, respectively, based on their inputs and on pre-shared quantum ( $\rho$ ) and classical ( $\lambda$ ) resources. The probability distribution $p(a,b|x,y)$ of obtaining outputs $a$ and $b$ conditioned on inputs $x$ and $y$ is determined by the specific strategies of Alice and Bob. Assuming that the input settings $x$ and $y$ are chosen fully randomly and with equal probability, the CHSH inequality is defined by a linear combination of the probabilities $p(a,b|x,y)$ according to

$$S=\sum_{a,b,x,y\in\{0,1\}}(-1)^{a\oplus b+x\cdot y}\,p(a,b|x,y)\leq S_{C}=2,$$

where the plus operation $\oplus$ is modulo 2, $\cdot$ is numerical multiplication, and $S_{C}$ is the (classical) bound of the Bell value $S$ for all LHVMs.

An achievable bound for the quantum theory is $S_{Q}=2\sqrt{2}$ [67]. In this case, a violation of the classical bound $S_{C}$ indicates the need for alternative theories other than LHVMs, such as quantum theory. For general no signalling (NS) theories [68], we denote the corresponding upper bound as $S_{NS}=4$ . It is straightforward to see that $S_{NS}\geq S_{Q}\geq S_{C}$ .

Different strategies impose different constraints on the probability distribution.

• Classical: $p(a,b|x,y)=\sum_{\lambda}q(\lambda)p(a|x,\lambda)p(b|y,\lambda)$

• Quantum: $p(a,b|x,y)=\mathrm{Tr}[\rho_{AB}M_{a}^{x}\otimes M_{b}^{y}]$

• No-signaling: $\sum_{a}p(a,b|x,y)=\sum_{a}p(a,b|x^{\prime},y),\sum_{b}p(a,b|x,y)=\sum_{b}p(a,b|x,y^{\prime})$
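The three classes of strategies can be compared numerically. The sketch below assumes the common form of the CHSH expression, $S=\sum_{a,b,x,y}(-1)^{a\oplus b+x\cdot y}p(a,b|x,y)$, and uses illustrative choices throughout: the state $(\ket{00}+\ket{11})/\sqrt{2}$ with one standard set of measurement settings for the quantum case, and a Popescu-Rohrlich type box for the no-signaling case.

```python
import numpy as np
from itertools import product

BITS = (0, 1)

def chsh(p):
    """S = sum_{a,b,x,y} (-1)^(a XOR b + x*y) p[a, b, x, y] (assumed convention)."""
    return sum((-1) ** ((a ^ b) + x * y) * p[a, b, x, y]
               for a, b, x, y in product(BITS, repeat=4))

# Classical: deterministic local strategies a = fa[x], b = fb[y].
def deterministic(fa, fb):
    p = np.zeros((2, 2, 2, 2))
    for x, y in product(BITS, repeat=2):
        p[fa[x], fb[y], x, y] = 1.0
    return p

s_classical = max(chsh(deterministic(fa, fb))
                  for fa in product(BITS, repeat=2)
                  for fb in product(BITS, repeat=2))

# Quantum: (|00> + |11>)/sqrt(2), Alice measures Z or X, Bob (Z +/- X)/sqrt(2).
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
X = np.array([[0.0, 1.0], [1.0, 0.0]])
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho = np.outer(phi, phi)
A = [Z, X]
B = [(Z + X) / np.sqrt(2), (Z - X) / np.sqrt(2)]

def projector(obs, outcome):
    # outcome 0 <-> eigenvalue +1, outcome 1 <-> eigenvalue -1
    return (np.eye(2) + (-1) ** outcome * obs) / 2

p_q = np.zeros((2, 2, 2, 2))
for a, b, x, y in product(BITS, repeat=4):
    p_q[a, b, x, y] = np.trace(rho @ np.kron(projector(A[x], a), projector(B[y], b)))
s_quantum = chsh(p_q)

# No-signaling: outputs satisfy a XOR b = x*y, each consistent pair with probability 1/2.
p_ns = np.zeros((2, 2, 2, 2))
for a, b, x, y in product(BITS, repeat=4):
    if (a ^ b) == x * y:
        p_ns[a, b, x, y] = 0.5
s_ns = chsh(p_ns)

print(s_classical, s_quantum, s_ns)
```

Because $S$ is linear in $p(a,b|x,y)$, maximizing over the deterministic local strategies is enough to find the bound over all mixtures of them.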

3.2.2 Practical loopholes

In practice, the conclusion of the violation of a Bell test depends on several assumptions. Experimental demonstrations suffer from three major loopholes; a faithful Bell test should close all such loopholes.

Locality loophole: The measurement events of Alice and Bob should be space-like separated. If this condition is not satisfied, Bell’s inequality can be violated by signaling even with LHVMs. This loophole can be closed by separating Alice and Bob sufficiently far apart such that the measurement events become space-like separated. In experiments, this loophole has been closed in optical systems [69] and appears nearly closed in atomic systems [70].

Efficiency loophole: The detection efficiency must be greater than a threshold to ensure violation of Bell inequalities without assuming fair sampling. The famous Clauser-Horne (CH) and Eberhard [71] tests show that the efficiency should be at least 2/3 for each party, which is also proven to be a tight bound [72, 73] for all bipartite Bell tests with two inputs. The efficiency loophole has been closed in different realizations [74, 75, 76].

Randomness loophole: The inputs $x$ and $y$ should be random and thus cannot be predetermined. We also require $x$ and $y$ to be uncorrelated with each other and across different runs [77, 78]. In experiments, this loophole cannot be closed perfectly, as we can never unconditionally certify randomness without a faithful Bell test, which in turn requires faithful randomness. Thus, we have to assume the existence of a true random seed. In practice, we can use independent RNGs, such as causally disconnected cosmic photons [79]. Conversely, if we can characterize the input randomness well, we can also check whether it satisfies the requirement [35, 77, 78, 21, 34] that guarantees the conclusion even with imperfectly random inputs.
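The effect of the randomness loophole can be seen in a toy calculation. The sketch below assumes the CHSH convention $S=\sum_{a,b,x,y}(-1)^{a\oplus b+x\cdot y}p(a,b|x,y)$ and compares an ordinary local strategy with a hypothetical one in which Bob's device has learned Alice's input $x$ in advance; both strategies are illustrative.

```python
from itertools import product

def chsh(p):
    """S = sum_{a,b,x,y} (-1)^(a XOR b + x*y) p(a, b, x, y) (assumed convention)."""
    return sum((-1) ** ((a ^ b) + x * y) * p(a, b, x, y)
               for a, b, x, y in product((0, 1), repeat=4))

def honest_local(a, b, x, y):
    # Both devices always output 0, whatever the inputs are.
    return 1.0 if (a, b) == (0, 0) else 0.0

def leaked_inputs(a, b, x, y):
    # Hypothetical cheating strategy: Bob's device knows x and outputs b = x*y.
    return 1.0 if (a, b) == (0, x * y) else 0.0

s_honest = chsh(honest_local)
s_leaked = chsh(leaked_inputs)

print(s_honest, s_leaked)
```

With full knowledge of the inputs, a purely classical device reaches the algebraic maximum of the CHSH expression, which is why some input randomness has to be assumed.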

In experiment, the conclusion of a Bell test is not faithful unless these three major loopholes are closed. In addition, we must address several technical issues that may also invalidate the Bell test conclusion or make a violation impossible.

Coincidence-time problem: If the local detection time depends on the measurement settings, a coincidence time loophole [80] may exist. This loophole can be solved by distinguishing each coincidence detection event such that it does not depend on the measurement settings.

Imperfect devices: The experiment devices cannot be perfect, which will affect the result of a Bell test.

• Source: The input photon source will differ from the desired source due to practical imperfections. For instance, the photon source may contain multiple photon pairs, which will affect the fidelity of the prepared state.

• Dark count: The measurement of the state will be affected by dark counts from environment. A Bell violation will be observed only if the dark count is below a certain threshold.

• Misalignment error: In experiment, the measurement may contain misalignment errors that output opposite result. Misalignment error should also be below a certain threshold to guarantee a Bell violation.

Finite statistics, memory problem: In the most general scenario, the measurement devices of Alice and Bob contain a memory such that the outputs of the current run can be conditioned on the inputs and outputs of previous runs [81]. In this case, we cannot directly obtain the probability distribution. This loophole can only be closed by a statistical test of a Bell inequality that allows for correlated strategies. Most previous experiments [75, 76] consider the asymptotic limit and assume the data to be independent and identically distributed (i.i.d.).

Nonuniform random inputs: Nonuniform random inputs do not affect the CH inequality, which is defined by a linear combination of probability distributions. However, Eberhard’s inequality, which is used in practice, should be normalized when the input random bits are not uniform. In this case, we have to consider finite statistics with nonuniform random inputs.

3.3 Randomness generation and quantification

In this section, we review the developments of quantum random number generators [82].

3.3.1 Randomness generation

Random numbers play essential roles in many fields, such as cryptography [83], scientific simulations [84], lotteries, and fundamental physics tests [85]. These tasks rely on the unpredictability of random numbers, which generally cannot be guaranteed in classical processes. In computer science, random number generators (RNGs) are based on pseudo-random number generation algorithms [86], which deterministically expand a random seed. Although the output sequences are usually perfectly balanced between 0s and 1s, a strong long-range correlation exists, which can undermine cryptographic security, cause unexpected errors in scientific simulations, or open loopholes in fundamental physics tests [87, 35, 33].

Many researchers have attempted to certify randomness solely based on the observed random sequences. In the 1960s, Kolmogorov developed the concept of Kolmogorov complexity to quantify the randomness of a given string [88]. An RNG output sequence appears random if it has a high Kolmogorov complexity. Later, many other statistical tests [89, 90, 91] were developed to examine randomness in RNG outputs. However, testing an RNG from its outputs can never prevent a malicious RNG from outputting a predetermined string that passes all of these statistical tests. Therefore, true randomness can only be obtained via processes involving inherent randomness.
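The limitation of output-only testing can be illustrated with a toy example. In the sketch below, a seeded pseudo-random generator from the Python standard library stands in for a "malicious" device, and a simple frequency check stands in for a statistical test suite; the seed value and function names are hypothetical.

```python
import random

def monobit_fraction(bits):
    """Fraction of ones; a crude stand-in for a statistical randomness test."""
    return sum(bits) / len(bits)

def seeded_bits(seed, n):
    """Deterministic bit string produced by a pseudo-random generator."""
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(n)]

# The device outputs a string that looks balanced ...
device_output = seeded_bits(seed=2017, n=100_000)
# ... but anyone who knows the seed can reproduce it exactly.
adversary_copy = seeded_bits(seed=2017, n=100_000)

print(monobit_fraction(device_output))
print(device_output == adversary_copy)
```

The string passes the frequency check while being fully predictable to whoever holds the seed, so passing statistical tests says nothing about unpredictability.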

In quantum mechanics, a system can be prepared in a superposition of the (measurement) basis states, as shown in Fig. 3.3. According to Born’s rule, the measurement outcome of a quantum state can be intrinsically random, i.e., it can never be predicted better than by blind guessing. Therefore, the inherent randomness of quantum measurements can be exploited for generating true random numbers. Within a resource framework, coherence [14] can be measured similarly to entanglement [15]. A measurement breaks the coherence, or superposition, in the measurement basis, and it has been shown that the intrinsic randomness obtained comes from this consumption of coherence. In turn, quantum coherence can be quantified by intrinsic randomness [21].

A practical QRNG can be developed using the simple process shown in Fig. 3.3. Depending on the implementation, there exists a variety of practical QRNGs. Generally, these QRNGs are noted for their high generation speed and relatively low cost. In reality, quantum effects are always mixed with classical noise, whose contribution can be separated from the quantum randomness after properly modelling the underlying quantum process [92].

The randomness in the practical QRNGs usually suffices for real applications if the model fits the implementation adequately. However, such QRNGs can generate randomness with information-theoretical security only when the model assumptions are fulfilled. In the case that the devices are manipulated by adversaries, the output may not be genuinely random. For example, when a QRNG is wholly supplied by a malicious manufacturer, who copies a very long random string to a large hard drive and only outputs the numbers from the hard drive in sequence, the manufacturer can always predict the output of the QRNG device.

On the other hand, a QRNG can be designed in such a way that its output randomness does not rely on any assumptions about the physical implementation. True randomness can be generated in a self-testing way even without perfectly characterizing the realisation instruments. The essence of a self-testing QRNG is to witness quantum entanglement or nonlocality device independently by observing a violation of a Bell inequality [85]. Even if the output randomness is mixed with uncharacterised classical noise, we can still get a lower bound on the amount of genuine randomness based on the amount of nonlocality observed. The advantage of this type of QRNG is the self-testing property of the randomness. However, because a self-testing QRNG must demonstrate nonlocality, its generation speed is usually very low. As Bell tests require random inputs, the protocol must start with a short random seed. Therefore, such a randomness generation process is also called randomness expansion.

In general, a QRNG comprises a source of randomness and a readout system. In realistic implementations, some parts may be well characterised while others are not. This motivates the development of an intermediate type of QRNG, between practical and fully self-testing QRNGs, which is called semi-self-testing. Under several reasonable assumptions, randomness can be generated without fully characterising the devices. For instance, faithful randomness can be generated with a trusted readout system and an arbitrary untrusted randomness resource. A semi-self-testing QRNG provides a trade off between practical QRNGs (high performance and low cost) and self-testing QRNGs (high security of certified randomness).

In the last two decades, there has been tremendous development in all three types of QRNG: trusted-device, self-testing, and semi-self-testing. In fact, there are commercial QRNG products available on the market. A brief summary of representative practical QRNG demonstrations that highlights the broad variety of optical QRNGs is presented in Table 3.3. These QRNG schemes will be discussed further in Section 3.3.1 and 3.3.1. A summary of self-testing and semi-self-testing QRNG demonstrations is presented in Table 3.4, which will be reviewed in detail in Section 3.3.1 and 3.3.1.

Trusted-device QRNG I: single-photon detector

True randomness can be generated from any quantum process that breaks coherent superposition of states. Due to the availability of high quality optical components and the potential of chip-size integration, most of today’s practical QRNGs are implemented in photonic systems. In this survey, we focus on various implementations of optical QRNGs.

A typical QRNG includes an entropy source for generating well-defined quantum states and a corresponding detection system. The inherent quantum randomness in the output is generally mixed with classical noises. Ideally, the extractable quantum randomness should be well quantified and be the dominant source of the randomness. By applying randomness extraction, genuine randomness can be extracted from the mixture of quantum and classical noise. The extraction procedure is detailed in Methods.

Qubit state

Random bits can be generated naturally by measuring a qubit $\ket{+}=(\ket{0}+\ket{1})/\sqrt{2}$ in the $Z$ basis, where $\ket{0}$ and $\ket{1}$ are the eigenstates of the measurement $Z$ . (A qubit is a two-level quantum-mechanical system, which, similar to a bit in classical information theory, is the fundamental unit of quantum information.) For example, Fig. 3.4 (a) shows a polarization based QRNG, where $\ket{0}$ and $\ket{1}$ denote horizontal and vertical polarization, respectively, and $\ket{+}$ denotes $+45^{\circ}$ polarization. Fig. 3.4 (b) presents a path based QRNG, where $\ket{0}$ and $\ket{1}$ denote the photon traveling via path $R$ and $T$ , respectively.

The most appealing property of this type of QRNG lies in its conceptual simplicity: the generated randomness has a clear quantum origin. This scheme was widely adopted in the early development of QRNGs [118, 94, 93]. Since at most one random bit can be generated from each detected photon, the random number generation rate is limited by the detector’s performance, such as dead time and efficiency. For example, the dead time of a typical silicon SPD based on an avalanche diode is tens of ns [119]. Therefore, the random number generation rate is limited to tens of Mbps, which is too low for certain applications such as high-speed quantum key distribution (QKD), which can be operated at GHz clock rates [120, 121]. Various schemes have been developed to improve the performance of QRNGs based on SPDs.

Temporal mode

One way to increase the random number generation rate is to perform measurement on a high-dimensional quantum space, such as measuring the temporal or spatial mode of a photon. Temporal QRNGs measure the arrival time of a photon, as shown in Fig. 3.4 (c). In this example, the output of a continuous-wave laser is detected by a time-resolving SPD. The laser intensity can be carefully controlled such that within a chosen time period $T$ , there is roughly one detection event. The detection time is randomly distributed within the time period $T$ and digitized with a time resolution of $\delta_{t}$ . The time of each detection event is recorded as raw data. Thus for each detection, the QRNG generates about $\log_{2}(T/\delta_{t})$ bits of raw random numbers. Essentially, $\delta_{t}$ is limited by the time jitter of the detector (typically on the order of 100 ps), which is normally much smaller than the detector dead time (typically on the order of 100 ns) [119].

One important advantage of temporal QRNGs is that more than one random bit can be extracted from a single-photon detection, thus improving the random number generation rate. The time period $T$ is normally set to be comparable to the detector dead time. Compared to the qubit QRNG, the temporal-mode QRNG alleviates the impact of the detection dead time. For example, if the time resolution and the dead time of an SPD are 100 ps and 100 ns respectively, the generation rate of a temporal QRNG is around $\log_{2}(1000)\times$ 10 Mbps, which is higher than that of the qubit scheme (limited to 10 Mbps). Temporal QRNGs have been well studied recently [122, 96, 97, 98, 99].
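The rate comparison above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that $T$ is set exactly equal to the dead time and that one detection occurs per period; the function names are hypothetical and the numbers are the ones quoted in the text.

```python
import math

def qubit_rate_bps(dead_time_s):
    """Qubit scheme: at most one bit per detection, one detection per dead time."""
    return 1.0 / dead_time_s

def temporal_rate_bps(dead_time_s, resolution_s):
    """Temporal scheme with T equal to the dead time (an assumption):
    log2(T / delta_t) raw bits per detection, one detection per period T."""
    bits_per_detection = math.log2(dead_time_s / resolution_s)
    return bits_per_detection / dead_time_s

dead_time = 100e-9     # 100 ns dead time
resolution = 100e-12   # 100 ps time resolution

print(f"qubit scheme:    {qubit_rate_bps(dead_time) / 1e6:.2f} Mbps")
print(f"temporal scheme: {temporal_rate_bps(dead_time, resolution) / 1e6:.2f} Mbps")
```

These are raw rates for an idealised uniform arrival-time distribution; the extractable rate after randomness extraction would be lower.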

Spatial mode

Similar to the case of temporal QRNG, multiple random bits can be generated by measuring the spatial mode of a photon with a space-resolving detection system. One illustrative example is to send a photon through a $1\times N$ beam splitter and to detect the position of the output photon. Spatial QRNG has been experimentally demonstrated by using a multi-pixel single-photon detector array [95], as shown in Fig. 3.4 (d). The distribution of the random numbers depends on both the spatial distribution of light intensity and the efficiency uniformity of the SPD arrays.

The spatial QRNG offers similar properties as the temporal QRNG, but requires multiple detectors. Also, correlation may be introduced between the random bits because of cross talk between different pixels in the closely-packed detector array.

Multiple photon number states

Randomness can be generated not only from measuring a single photon, but also from quantum states containing multiple photons. For instance, a coherent state

$$\ket{\alpha}=e^{-|\alpha|^{2}/2}\sum_{n=0}^{\infty}\frac{\alpha^{n}}{\sqrt{n!}}\ket{n}$$

is a superposition of different photon-number (Fock) states $\{\ket{n}\}$ , where $n$ is the photon number and $|\alpha|^{2}$ is the mean photon number of the coherent state. Thus, by measuring the photon number of a coherent laser pulse with a photon-number resolving SPD, we can obtain random numbers that follow a Poisson distribution. QRNGs based on measuring photon number have been successfully demonstrated in experiments [100, 101, 102]. Interestingly, random numbers can be generated by resolving photon number distribution of a light-emitting diode (LED) with a consumer-grade camera inside a mobile phone, as shown in a recent study [123].
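To illustrate why Poisson-distributed photon counts yield fewer random bits than the number of resolvable outcomes suggests, the sketch below computes the photon-number distribution of a coherent state and the min-entropy of a single count, taken here as minus the base-2 logarithm of the largest outcome probability. The mean photon number and the truncation point are hypothetical values chosen for illustration, and an ideal lossless photon-number-resolving detector is assumed.

```python
import math

def poisson_pmf(n, mean):
    """Probability of detecting n photons from a coherent state with the given mean."""
    return math.exp(-mean) * mean**n / math.factorial(n)

def min_entropy_bits(probs):
    """Min-entropy: minus log2 of the most likely outcome's probability."""
    return -math.log2(max(probs))

mean_photon_number = 4.0   # hypothetical |alpha|^2
n_max = 60                 # truncation; the neglected tail is negligible here

probs = [poisson_pmf(n, mean_photon_number) for n in range(n_max)]
h_min = min_entropy_bits(probs)

print(f"total probability kept: {sum(probs):.12f}")
print(f"min-entropy per detection: {h_min:.3f} bits")
```

With these values the min-entropy is well below $\log_{2}$ of the number of outcomes that occur with appreciable probability, which is why a randomness extractor is needed after the raw readout.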

Note that the above scheme is sensitive to both the photon number distribution of the source and the detection efficiency of the detector. In the case of a coherent state source, if the loss can be modeled as a beam splitter, the low detection efficiency of the detector can be easily compensated by using a relatively strong laser pulse.

Trusted-device QRNG II: macroscopic photodetector

The performance of an optical QRNG largely depends on the employed detection device. Besides SPDs, high-performance macroscopic photodetectors have also been applied in various QRNG schemes. This is similar to the case of QKD, where protocols based on optical homodyne detection [124] have been developed, with the hope of achieving a higher key rate over a low-loss channel. In the following discussion, we review two examples of QRNGs implemented with macroscopic photodetectors.

Vacuum noise

In quantum optics, the amplitude and phase quadratures of the vacuum state are represented by a pair of non-commuting operators ( $X$ and $P$ with $[X,P]=i/2$ ), which cannot be determined simultaneously with an arbitrarily high precision [125], i.e. $\langle(\Delta X)^{2}\rangle\times\langle(\Delta P)^{2}\rangle\geq 1/16$ , with $\Delta O$ defined by $O-\langle O\rangle$ and $\langle O\rangle$ denoting the average of $O$ . This can be easily visualised in phase space, where the vacuum state is represented by a two-dimensional Gaussian distribution centered at the origin with an uncertainty of $1/4$ (the shot-noise variance) along any direction, as shown in Fig. 3.5 (a). In principle, Gaussian distributed random numbers can be generated by measuring any field quadrature repeatedly. This scheme has been implemented by sending a strong laser pulse through a symmetric beam splitter and detecting the differential signal of the two output beams with a balanced receiver [103, 104, 105].

Given that the local oscillator (LO) is a single-mode coherent state and the detector is shot-noise limited, the random numbers generated in this scheme follow a Gaussian distribution, which is in demand in certain applications, such as Gaussian-Modulated Coherent States (GMCS) QKD [124]. There are several distinct advantages of this approach. First, the resource of quantum randomness, the vacuum state, can be easily prepared with a high fidelity. Second, the performance of the QRNG is insensitive to detector loss, which can be simply compensated by increasing the LO power. Third, the field quadrature of vacuum is a continuous variable, suggesting that more than one random bit can be generated from one measurement. For example, 3.25 bits of random numbers are generated from each measurement in [103].

In practice, an optical homodyne detector itself contributes additional technical noise, which may be observed or even controlled by a potential adversary. A randomness extractor is commonly required to generate secure random numbers. To extract quantum randomness effectively, the detector should be operated in the shot-noise limited region, in which the overall observed noise is dominated by vacuum noise. We remark that building a broadband shot-noise limited homodyne detector operating above a few hundred MHz is technically challenging [126, 127, 128]. This may in turn limit the ultimate operating speed of this type of QRNG.

Amplified spontaneous emission

To overcome the bandwidth limitation of shot-noise limited homodyne detection, researchers have developed QRNGs based on measuring phase [108, 109, 110, 112, 111, 113] or intensity noise [106, 107] of amplified spontaneous emission (ASE), which is quantum mechanical by nature [129, 92, 130].

In the phase-noise based QRNG scheme, random numbers are generated by measuring a field quadrature of phase-randomized weak coherent states (signal states). Figure 3.5 (c) shows the phase-space representation of a signal state with an average photon number of $n$ and a phase variance of $\langle(\Delta\theta)^{2}\rangle$ . If the average phase of the signal state is around $\pi/2$ , the uncertainty of the $X$ -quadrature is of the order of $n\langle(\Delta\theta)^{2}\rangle$ . When $n$ is large, this uncertainty can be significantly larger than the vacuum noise. Therefore, phase noise based QRNG is more robust against detector noise. In fact, this scheme can be implemented with commercial photo-detectors operated above GHz rates.

QRNG based on laser phase noise was first developed using a cw laser source and a delayed self-heterodyning detection system [108], as shown in Fig. 3.5 (d). Random numbers are generated by measuring the phase difference of a single-mode laser at times $t$ and $t+T_{d}$ . Intuitively, if the time delay $T_{d}$ is much larger than the coherence time of the laser, the two laser beams interfering at the second beam splitter can be treated as generated by independent laser sources. In this case, the phase difference is a random variable uniformly distributed in $[-\pi,\pi)$ , regardless of the classical phase noise introduced by the unbalanced interferometer itself. This suggests that a robust QRNG can be implemented without phase-stabilizing the interferometer. On the other hand, by phase-stabilizing the interferometer, the time delay $T_{d}$ can be made much shorter than the coherence time of the laser [108], enabling a much higher sampling rate. This phase stabilization scheme has been adopted in a $\geq 6$ Gbps QRNG [110] and a 68 Gbps QRNG demonstration [113].

Phase noise based QRNG has also been implemented using a pulsed laser source, where the phase difference between adjacent pulses is automatically randomized [109, 112, 111]. A speed of 80 Gbps (raw rate as shown in Table 1) has been demonstrated [111]. It also played a crucial role in a recent loophole-free Bell experiment [131]. Here, we want to emphasize that, strictly speaking, none of these generation speeds are real-time, due to the speed limitation of the randomness extraction [92]. Although this limitation is rather technical, in practice it is important to develop extraction schemes and hardware that can match the fast random bit generation speed in the future.

Self-testing QRNG

Realistic devices inevitably introduce classical noise that affects the output randomness, making the generated random numbers depend on certain classical variables, which might open up security issues. To remove this bias, one must properly model the devices and quantify their contributions. In the QRNG schemes described in Section 3.3.1 and Section 3.3.1, the output randomness relies on the device models [92, 130]. When the implementation devices deviate from the theoretical models, the randomness can be compromised. In this section, we discuss self-testing QRNGs, whose output randomness is certified independently of device implementations.

Self-testing randomness expansion

In QKD, secure keys can be generated even when the experimental devices are not fully trusted or characterised [55, 26]. Such self-testing processing of quantum information also occurs in randomness generation (expansion). The output randomness can be certified by observing violations of the Bell inequalities [85], see Fig. 3.6. Under the no-signalling condition [68] in the Bell tests, it is impossible to violate Bell inequalities if the output is not random, that is, if it is predetermined by local hidden variables.
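As a concrete illustration of a Bell violation, the sketch below evaluates the CHSH expression for a two-qubit singlet state and for a product state. The choice of state, the measurement angles, and the function names are assumptions made for illustration and are not taken from the protocols cited here; measurements are restricted to the $X$-$Z$ plane.

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=float)
X = np.array([[0, 1], [1, 0]], dtype=float)

def observable(theta):
    """Dichotomic (+1/-1) observable in the X-Z plane at angle theta."""
    return np.cos(theta) * Z + np.sin(theta) * X

# Singlet state (|01> - |10>)/sqrt(2) and product state |00>
singlet = np.array([0, 1, -1, 0], dtype=float) / np.sqrt(2)
product = np.array([1, 0, 0, 0], dtype=float)

def correlator(a, b, state):
    """Expectation value of A(a) tensor B(b) in the given real state vector."""
    op = np.kron(observable(a), observable(b))
    return float(state @ op @ state)

def chsh(a0, a1, b0, b1, state):
    """CHSH combination E(a0,b0) + E(a0,b1) + E(a1,b0) - E(a1,b1)."""
    return (correlator(a0, b0, state) + correlator(a0, b1, state)
            + correlator(a1, b0, state) - correlator(a1, b1, state))

angles = (0.0, np.pi / 2, np.pi / 4, -np.pi / 4)
print("singlet |S| =", abs(chsh(*angles, singlet)))   # exceeds the local bound 2
print("product |S| =", abs(chsh(*angles, product)))   # stays within the local bound
```

For these settings the singlet reaches the quantum maximum $2\sqrt{2}$, while the product state respects the local-hidden-variable bound of 2.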

Since Colbeck [132, 61] suggested that randomness can be expanded by untrusted devices, several protocols based on different assumptions have been proposed. For instance, in a non-malicious device scenario, we can consider that the devices are honestly designed but are easily corrupted by unexpected classical noise. In this case, instead of a powerful adversary that may be entangled with the experimental devices, we can consider a classical adversary who possesses only classical knowledge of the quantum system and analyzes the average output randomness conditioned on the classical information. Based on the Clauser-Horne-Shimony-Holt (CHSH) inequality [1], Fehr et al. [62] and Pironio et al. [63] proposed self-testing randomness expansion protocols against classical adversaries. The protocols quadratically expand the input seed, implying that the length of the input seed is $O(\sqrt{n}\log_{2}\sqrt{n})$ , where $n$ denotes the experimental iteration number.

A more sophisticated exponential randomness expansion protocol based on the CHSH inequality was proposed by Vidick and Vazirani [64], in which the length of the input seed is $O(\log_{2}n)$ . In the same work, they also presented an exponential expansion protocol against quantum adversaries, where quantum memories in the devices may be entangled with the adversary. The Vidick-Vazirani protocol against quantum adversaries places strict requirements on the experimental realisation. Miller and Shi [133] partially solved this problem by introducing a more robust protocol. Combined with the work by Chung, Shi, and Wu [134], they also presented an unbounded randomness expansion scheme. By adopting a more general security proof, Miller and Shi [135] recently showed that genuine randomness can be obtained as long as the CHSH inequality is violated. Their protocol greatly improves the noise tolerance, indicating that an experimental realisation of a fully self-testing randomness expansion protocol is feasible.
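To give a feel for the difference between quadratic and exponential expansion, the sketch below evaluates the two seed-length scalings quoted above, $\sqrt{n}\log_{2}\sqrt{n}$ and $\log_{2}n$, for a few run counts. All constant factors hidden by the $O(\cdot)$ notation are set to one, which is an assumption made purely for illustration; the numbers are not seed lengths of any actual protocol.

```python
import math

def seed_quadratic(n):
    """Scaling sqrt(n) * log2(sqrt(n)), constants dropped (illustrative only)."""
    return math.sqrt(n) * math.log2(math.sqrt(n))

def seed_exponential(n):
    """Scaling log2(n), constants dropped (illustrative only)."""
    return math.log2(n)

for n in (10**6, 10**9, 10**12):
    print(f"n = {n:.0e}: quadratic ~ {seed_quadratic(n):.0f} bits, "
          f"exponential ~ {seed_exponential(n):.1f} bits")
```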

The self-testing randomness expansion protocol relies on a faithful realisation of a Bell test that excludes experimental loopholes, such as the locality and efficiency loopholes. The randomness expansion protocol against classical adversaries was first experimentally demonstrated by Pironio et al. [114] in an ion-trap system, which closes the efficiency loophole but not the locality loophole. To experimentally close the locality loophole, a photonic system is preferable when quantum memories are unavailable. As the CHSH inequality is minimally violated in an optically realised system [76, 75], the randomness output is also very small (with min-entropy of $H_{\mathrm{min}}=7.2\times 10^{-5}$ in each run), and the randomness generation rate is $0.4$ bits/s. To maximise the output randomness, the implementation settings are designed to maximally violate the CHSH inequality. Due to experimental imperfections, the chosen Bell inequality might be sub-optimal for the observed data. In this case, the output randomness can be optimised over all possible Bell inequalities [136, 137].

Although nonlocality or entanglement certifies the randomness, the three quantities, nonlocality, entanglement, and randomness, are not equivalent [138]. Maximum randomness generation does not require maximally nonlocal correlations or a maximally entangled state. In the protocols based on the CHSH inequality, maximal violation (nonlocality and entanglement) generates 1.23 bits of randomness. It has been shown that 2 bits of randomness can be certified with little involvement of nonlocality and entanglement [138]. Furthermore, in a more generic scenario involving nonlocality and randomness, it has been shown that maximally nonlocal theories cannot be maximally random [139].

Randomness amplification

In self-testing QRNG protocols based on the assumption of perfectly random inputs, the output randomness is guaranteed by the violations of Bell tests. Conversely, when all the inputs are predetermined, any Bell inequality can be violated to an arbitrary feasible value without invoking a quantum resource. Under these conditions, all self-testing QRNG protocols cease to work. Nevertheless, randomness generation in the presence of partial randomness is still an interesting problem. Here, an adversary can use the additional knowledge of the inputs to fake violations of Bell inequalities. The task of generating arbitrarily free randomness from partially free randomness is called randomness amplification, which is impossible to achieve with classical processes.

The first randomness amplification protocol was proposed by Colbeck and Renner [58]. Using a two-party chained Bell inequality [140, 141], they showed that any Santha-Vazirani weak source [142] (defined in the next section), with $\epsilon<0.058$ , can be amplified into arbitrarily free random bits in a self-testing way by requiring only no-signaling. A basic question of randomness amplification is whether free random bits can be obtained from arbitrarily weak randomness. This question was answered by Gallego et al. [59], who demonstrated that perfectly random bits can be generated using a five-party Mermin inequality [143] with arbitrarily imperfect random bits under the no-signaling assumption.

Randomness amplification is related to the freewill assumption [87, 35, 144, 77, 78, 145, 33] in Bell tests. In experiments, the freewill assumption requires the inputs to be random enough such that violations of Bell inequalities are induced by quantum effects rather than predetermined classical processes. This is extremely meaningful in fundamental Bell tests, which aim to rule out local realism. Such fundamental tests are the foundations of self-testing tasks, such as device-independent QKD and self-testing QRNG. Interestingly, self-testing tasks require a faithful violation of a Bell inequality, in which intrinsic random numbers are needed. However, to generate faithful random numbers, we in turn need to witness nonlocality, which requires additional true randomness. Therefore, the realisations of genuine loophole-free Bell tests and, hence, fully self-testing tasks are impossible. Self-testing protocols whose security is independent of the untrusted part can be designed only by placing reasonable assumptions on the trusted part.

Semi-self-testing QRNGs

Traditional QRNGs based on specific models pose security risks in fast random number generation. On the other hand, the randomness generated by self-testing QRNGs is information-theoretically secure even without characterising the devices, but the processes are impractically slow. As a compromise, intermediate QRNGs might offer a good tradeoff between trusted and self-testing schemes, realising random number generation that is both reasonably fast and secure.

As shown in Fig. 3.7, a typical QRNG comprises two main modules, a source that emits quantum states and a measurement device that detects the states and outputs random bits. In trusted-device QRNGs, both source and measurement devices [92, 130] must be modeled properly; while the output randomness in the fully self-testing QRNGs does not depend on the implementation devices.

In practice, there exist scenarios in which the source (respectively, measurement device) is well characterised, while the measurement device (respectively, source) is not. Here, we review the semi-self-testing QRNGs, where parts of the devices are trusted.

Source-independent QRNG

In source-independent QRNG, the randomness source is assumed to be untrusted, while the measurement devices are trusted. The essential idea for this type of scheme is to use the measurement to monitor the source in real time. In this case, normally one needs to randomly switch among different (typically, complementary) measurement settings, so that the source (assumed to be under the control of an adversary) cannot predict the measurement ahead of time. Thus, a short seed is required for the measurement choices.

In the illustration of semi-self-testing QRNG, Fig. 3.7, the source-independent scheme is represented by a unique $x$ (corresponding to a state $\rho_{x}$ ) and multiple choices of the measurement settings $y$ . In Section 3.3.1, we showed that randomness can be obtained by measuring $\ket{+}$ in the $Z$ basis. However, in a source-independent scenario, we cannot assume that the source emits the state $\ket{+}$ . In fact, we cannot even assume the dimension of the state $\rho_{x}$ . This is the major challenge facing this type of scheme.

In order to faithfully quantify the randomness in the $Z$ basis measurement, first a squashing model is applied so that the to-be-measured state is equivalent to a qubit [146]. Note that this squashing model puts a strong restriction on measurement devices. Then, the measurement device should occasionally project the input state onto the $X$ basis states, $\ket{+}$ and $\ket{-}$ , and check whether the input is $\ket{+}$ [115]. The technique used in the protocol shares strong similarity with the one used in QKD [147]. The $X$ basis measurement can be understood as the phase error estimation, from which we can estimate the amount of classical noise. Similar to privacy amplification, randomness extraction is performed to subtract the classical noise and output true random values.

The source-independent QRNG is advantageous when the source is complicated, such as in the aforementioned QRNG schemes based on measuring single photon sources [94, 93, 118], LED lights [123], and phase fluctuation of lasers [110]. In these cases, the sources are quantified by complicated or hypothetical physical models. Without a well-characterized source, randomness can still be generated. The disadvantage of this kind of QRNGs compared to fully self-testing QRNGs is that they need a good characterization of the measurement devices. For example, the upper and the lower bounds on the detector efficiencies need to be known to avoid potential attacks induced from detector efficiency mismatch. Also the intensity of light inputs into the measurement device needs to be carefully controlled to avoid attacks on the detectors.

Recently, a continuous-variable version of the source-independent QRNG was experimentally demonstrated [116], achieving a randomness generation rate over 1 Gbps. Moreover, with state-of-the-art devices, it can potentially reach speeds on the order of tens of Gbps, which is similar to the trusted-device QRNGs. Hence, semi-self-testing QRNG is approaching the practical regime.

Measurement-device-independent QRNGs

Alternatively, we can consider the scenario in which the input source is well characterised while the measurement device is untrusted. In Fig. 3.7, different inputs $\rho_{x}$ (hence multiple $x$ ) are needed to calibrate the measurement device with a unique setting $y$ . Similar to the source-independent scenario, the randomness originates from measuring the input state $\ket{+}$ in the $Z$ basis. The difference is that here the trusted source occasionally sends auxiliary quantum states $\rho_{x}$ , such as $\ket{0}$ , to check whether the measurement is in the $Z$ basis [148]. The analysis combines measurement tomography with randomness quantification of positive-operator valued measures, and does not assume knowledge of the dimension of the measurement device, i.e., the auxiliary ancilla may have an arbitrary dimension.

The advantage of such QRNGs is that they remove all detector side channels, but the disadvantage is that they may be subject to imperfections in the modeling of the source. This kind of QRNG is complementary to the source-independent QRNG, and one should choose the proper QRNG protocol based on the experimental devices.

We now turn to two variations of measurement-device-independent QRNGs. First, the measurement tomography step may be replaced by a certain witness, which could simplify the scheme at the expense of a slightly worse performance. Second, similar to the source-independent case, a continuous-variable version of measurement-device-independent QRNG might significantly increase the bit rate. The challenge lies in continuous-variable entanglement witnessing and measurement tomography.

Other semi-self-testing QRNGs

Apart from the above two types of QRNGs, there are also other QRNGs that achieve self-testing under some mild assumptions. For example, the source and measurement devices can be assumed to occupy independent two-dimensional quantum subspaces [117]. In this scenario, the QRNG should use both different input states and different measurement settings. The randomness can be estimated by adopting a dimension witness [149]. A positive value of this dimension witness could certify randomness in this scenario, similar to the fact that a violation of the Bell inequality could certify the randomness of the self-testing QRNG in Section 3.3.1.

Outlook

The need for “perfect” random numbers in quantum communication and fundamental physics experiments has stimulated the development of various QRNG schemes, from highly efficient systems based on trusted devices to the more theoretically interesting self-testing protocols. On the practical side, the ultimate goal is to achieve fast random number generation at low cost, while maintaining a high level of randomness. With the recent development of waveguide fabrication techniques [150], we expect that chip-size, high-performance QRNGs could be available in the near future. In order to guarantee the output randomness, the underlying physical models for these QRNGs need to be accurate and both the quantum noise and classical noise should be well quantified. Meanwhile, by developing a semi-self-testing protocol, a QRNG becomes more robust against classical noise and device imperfections. In the future, it will be interesting to investigate the potential technologies required to make the self-testing QRNG practical. With new developments in single-photon detection, the readout part of the self-testing QRNG can be ready for practical application in the near future. The entanglement source, on the other hand, is still far from the practical regime (Gbps).

On the theoretical side, the study of self-testing QRNG has not only provided means of generating robust randomness, but also greatly enriched our understanding of fundamental questions in physics. In fact, even in the most recent loophole-free Bell experiments [151, 152, 153, 154], where high-speed QRNG has played a crucial role, it is still arguable whether it is appropriate to use randomness generated based on quantum theory to test quantum physics itself. Other random resources have also been proposed for loophole-free Bell’s inequality tests, such as independent cosmic photons [79]. It is an open question whether we can go beyond QRNG and generate randomness from a more general theory.

3.3.2 Randomness quantification

Here, we briefly review the quantification of randomness.

Min-entropy source

Given the underlying probability distribution, the randomness of a random sequence $X$ on $\{0,1\}^{n}$ can be quantified by its min-entropy

$$H_{\mathrm{min}}(X)=-\log_{2}\left(\max_{v\in\{0,1\}^{n}}\mathrm{Prob}[X=v]\right).$$

For example, for a uniform random sequence $X$ on $\{0,1\}^{n}$ , where $\mathrm{Prob}[X=v]=1/2^{n},\forall v\in\{0,1\}^{n}$ , the min-entropy is given by $H_{\mathrm{min}}=n$ . As another example, we consider the same random sequence $X$ except that $\mathrm{Prob}[X=0]=1/2$ and $\mathrm{Prob}[X=v]=1/2^{n+1},\forall v\in\{0,1\}^{n}\setminus\{0\}$ . Although the two sequences look very similar, the min-entropy for the latter example is much smaller, $H_{\mathrm{min}}=1$ .
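The two examples can be reproduced numerically, taking the min-entropy as minus the base-2 logarithm of the largest outcome probability. The sketch below is illustrative only: the sequence length $n=8$ is a hypothetical choice, and the second distribution is a normalised variant in which the all-zero string has probability $1/2$ and the remaining probability is shared equally among the other strings.

```python
import math

def min_entropy_bits(probs):
    """Min-entropy in bits: minus log2 of the most likely outcome's probability."""
    return -math.log2(max(probs))

n = 8              # hypothetical sequence length
size = 2**n        # number of n-bit strings

# Uniform distribution over all n-bit strings
uniform = [1.0 / size] * size

# One string has probability 1/2; the rest share the other half equally
skewed = [0.5] + [0.5 / (size - 1)] * (size - 1)

print("uniform:", min_entropy_bits(uniform), "bits")
print("skewed: ", min_entropy_bits(skewed), "bits")
```

The min-entropy is set entirely by the single most likely outcome, so the skewed source is worth only one bit regardless of $n$.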

Santha-Vazirani weak sources [142]

We assume that random bit numbers are produced in the time sequence $x_{1},x_{2},...,x_{j},...$ . Then, for $0<\epsilon\leq 1/2$ , the source is called $\epsilon$ -free if

$$\epsilon\leq P(x_{j}\mid x_{1},x_{2},\dots,x_{j-1},e)\leq 1-\epsilon$$

for all values of $j$ . Here $e$ represents all classical variables generated outside the future light-cone of the Santha-Vazirani weak sources.
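As an illustration, the sketch below simulates one possible weak source of this kind, assuming the usual reading of $\epsilon$-free in which every conditional bit probability lies between $\epsilon$ and $1-\epsilon$. The "sticky" source, in which each bit tends to repeat the previous one, is a hypothetical example and not a model from the cited works. Under this assumption no $n$-bit string can have probability above $(1-\epsilon)^{n}$, so the min-entropy is at least $-n\log_{2}(1-\epsilon)$.

```python
import math
import random

def sample_sv_bits(n, epsilon, rng):
    """Hypothetical 'sticky' weak source: each bit repeats the previous bit with
    probability 1 - epsilon, so every conditional probability stays within
    [epsilon, 1 - epsilon]."""
    bits = []
    prev = 0
    for _ in range(n):
        p_one = 1 - epsilon if prev == 1 else epsilon
        bit = 1 if rng.random() < p_one else 0
        bits.append(bit)
        prev = bit
    return bits

def min_entropy_lower_bound(n, epsilon):
    """Lower bound -n*log2(1 - epsilon) on the min-entropy of n bits."""
    return -n * math.log2(1 - epsilon)

rng = random.Random(0)
print(sample_sv_bits(32, 0.25, rng))
print("bound for n=100, eps=0.25:", min_entropy_lower_bound(100, 0.25), "bits")
print("bound for n=100, eps=0.5: ", min_entropy_lower_bound(100, 0.5), "bits")
```

For $\epsilon=1/2$ the bound equals $n$, recovering perfectly random bits; for the sticky source the all-zero string attains the bound.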

Randomness extractor

An RNG typically consists of two components, an entropy source and a randomness extractor [150]. In a QRNG, the entropy source could be a physical device whose output is fundamentally unpredictable, while the randomness extractor could be an algorithm that generates nearly perfect random numbers from the output of the preceding entropy source, which can be imperfectly random. The two components of a QRNG are connected by quantifying the randomness with min-entropy. The min-entropy of the entropy source is first estimated and then fed into the randomness extractor as an input parameter.

The imperfect randomness of the entropy source can already be seen in the SPD based schemes, such as the photon number detection scheme. By denoting $N$ as the discrimination upper bound of a photon number resolving detector, at most $\log_{2}(N)$ raw random bits can be generated per detection event. However, as the photon numbers of a coherent state source follow a Poisson distribution, the raw random bits follow a non-uniform distribution; consequently, we cannot obtain $\log_{2}(N)$ bits of random numbers. To extract perfectly random numbers, we require a postprocessing procedure (i.e. a randomness extractor).

In the coherent detection based QRNG, the quantum randomness is inevitably mixed with classical noises introduced by the detector and other system imperfections. Moreover, any measurement system has a finite bandwidth, implying unavoidable correlations between adjacent samples. Once quantified, these unwanted side-effects can be eliminated through an appropriate randomness extractor [92].

The composable extractor was first introduced in classical cryptography [155, 156], and was later extended to quantum cryptography [157, 158]. To generate information-theoretically provable random numbers, two typical extractors, Trevisan’s extractor and the Toeplitz-hashing extractor, are generally employed in practice.

Trevisan’s extractor [159, 160] has been proven secure against quantum adversaries [161]. Moreover, it is a strong extractor (its seed can be reused) and its seed length is a polylogarithmic function of the input length. Trevisan’s extractor comprises two main parts, a one-bit extractor and a combinatorial design. The Toeplitz-hashing extractor was well developed in the privacy amplification procedure of QKD systems [162]. This kind of extractor is also a strong extractor [163]. By applying the fast Fourier transform technique, the runtime of the Toeplitz-hashing extractor can be improved to $O(n\log n)$ .
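As an illustration of the Toeplitz-hashing idea, the sketch below multiplies the raw bit string by a binary Toeplitz matrix over GF(2). It is a direct $O(nm)$ implementation rather than the FFT-accelerated version mentioned above, and the function name `toeplitz_extract`, the indexing convention for the seed, and the bit lengths are assumptions made for this example. In practice the output length would be chosen from the estimated min-entropy of the source.

```python
import secrets

def toeplitz_extract(raw_bits, seed_bits, out_len):
    """Hash n raw bits to out_len bits with an out_len x n binary Toeplitz matrix.

    The matrix is fixed by n + out_len - 1 seed bits; entry (i, j) depends
    only on i - j, which is the defining Toeplitz property.
    """
    n = len(raw_bits)
    if len(seed_bits) != n + out_len - 1:
        raise ValueError("seed must contain n + out_len - 1 bits")
    output = []
    for i in range(out_len):
        acc = 0
        for j in range(n):
            # Matrix-vector product over GF(2): AND for multiply, XOR for add.
            acc ^= seed_bits[i - j + n - 1] & raw_bits[j]
        output.append(acc)
    return output

# Illustrative usage with placeholder sizes (16 raw bits -> 6 output bits).
raw = [secrets.randbits(1) for _ in range(16)]
seed = [secrets.randbits(1) for _ in range(16 + 6 - 1)]
extracted = toeplitz_extract(raw, seed, 6)
print(extracted)
```

Because the map is linear over GF(2), hashing the XOR of two inputs gives the XOR of their hashes, which is a convenient sanity check for an implementation.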

On account of their strong extractor property, both of these extractors generate random numbers even when the random seed is longer than the output length of each run. Both extractors have been implemented [92] and the speed of both extractors has been increased in follow-up studies [164, 165], but it remains far below the operating speed of the QRNG based on laser-phase fluctuation (68 Gbps [113]). Therefore, the speed of the extractor is the main limitation of a practical QRNG.

Part II Quantumness and randomness

Chapter 4 Coherence and randomness

This chapter introduces the basic quantification and witness methods for quantum coherence. We relate coherence measures to the quantum randomness of measurements in the computational basis. We refer to Refs. [21, 22] for further details of this chapter.

4.1 Quantifying quantum randomness

4.1.1 Quantum randomness against quantum information

Let us consider a $d$ -dimensional Hilbert space and a reference basis $I:=\{\ket{i}\}=\left\{\ket{1},\ket{2},\dots,\ket{d}\right\}$ . Suppose a projective measurement $\{\ket{i}\bra{i}\}$ is performed on a given quantum state $\rho_{A}$ accessed by an experimentalist Alice. The measurement outcome has a probability distribution $\{p_{i}\}$ with $\sum_{i=1}^{d}p_{i}=1$ and $p_{i}=\text{Tr}[\rho_{A}\ket{i}\bra{i}]\geq 0,\forall i$ . In quantum information theory, a practical quantifier of the total randomness associated with the measurement is given by the Shannon entropy $H(\{p_{i}\})_{\rho}=-\sum_{i}p_{i}\log(p_{i})$ . However, the randomness of the measurement is intrinsically twofold: a classical uncertainty due to Alice’s ignorance about the system state, and a quantum one due to the coherence of the state in the reference basis. For a mixture of incoherent states $\rho_{{\cal I}}=\sum_{i}q_{i}\ket{i}\bra{i},$ the measurement randomness is given by the state mixedness, i.e., a classical source of uncertainty, quantified by the von Neumann entropy of the state: $H(\{p_{i}\})_{\rho_{{\cal I}}}=H(\{q_{i}\})=S(\rho_{{\cal I}})$ . On the other hand, for pure states, $\rho_{p}=\ket{\psi}\bra{\psi}$ , the randomness is due to the genuinely quantum overlap between the state and the basis elements: $H(\{p_{i}\})_{\rho_{p}}=H(\{|\langle i|\psi\rangle|^{2}\})$ . Here we present an operational characterization of the quantum randomness for arbitrary coherent mixed states. To be a good measure of quantum uncertainty, a quantity should satisfy the following properties:

1. Being nonnegative;

2. Vanishing if and only if the measurement uncertainty is only due to the state mixedness;

3. Representing the total uncertainty for pure states;

4. Being convex [49, 166, 167, 168, 14].

We consider the worst-case scenario depicted in Fig. 4.1, where Alice and Eve share a bipartite system in state $\rho_{AE}$ . Alice makes a measurement and obtains outcomes following a probability distribution $\{p_{i}\},p_{i}=\text{Tr}[\rho_{A}\ket{i}\bra{i}]$ . The total randomness associated with the measurement is $H(\{p_{i}\})$ . The uncertainty of Eve about Alice’s measurement outcome is quantified by the conditional entropy $H(\{p_{i}\}|E)_{\rho_{AE}}$ .

After Alice’s measurement, denote the global state by $\rho_{AE}^{\prime}$ ; Alice’s resulting state is $\rho_{A}^{\mathrm{diag}}:=\sum_{i}p_{i}\ket{i}\bra{i}$ . Hence, in the best case scenario for Eve, her uncertainty is given by the von Neumann conditional entropy

$$R^{Q}_{I}(\rho_{A})=\min_{\rho_{AE}}S(A|E)_{\rho_{AE}^{\prime}},\tag{4.1}$$

where the optimization runs over all possible states of Eve such that $\text{Tr}_{E}(\rho_{AE})=\rho_{A}$ , and the conditional entropy is given by $S(A|E)_{\rho_{AE}^{\prime}}=S(\rho_{AE}^{\prime})-S(\rho_{E})$ . It is not hard to see that the best case scenario for Eve is to hold a purification of Alice’s state, $\ket{\psi}_{AE}$ . In fact, one can always extend Eve’s part to hold a purification of a mixed state $\rho_{AE}$ , which will not increase her uncertainty about Alice’s measurement outcome.

When $\rho_{A}$ is a pure state, then $\ket{\psi}_{AE}$ and hence $\rho_{AE}^{\prime}=\rho_{A}^{\mathrm{diag}}\otimes\rho_{E}$ are both product states. It is easy to verify that Eve’s uncertainty corresponds to the total randomness of Alice’s measurement:

$$R^{Q}_{I}(\rho_{A})=S(A|E)_{\rho_{AE}^{\prime}}=S(\rho_{A}^{\mathrm{diag}})=H(\{p_{i}\}).\tag{4.2}$$

When $\rho_{A}$ is not a pure state, after Alice’s measurement, the state is changed to $\rho_{AE}^{\prime}=\sum_{i}p_{i}\ket{i}_{A}\bra{i}\otimes\rho_{i}^{E}$ , where $\rho_{E}=\sum_{i}p_{i}\rho_{i}^{E}$ . In fact, $\rho_{i}^{E}={}_{A}\bra{i}(\ket{\psi}_{AE}\bra{\psi}_{AE})\ket{i}_{A}/p_{i}$ is a pure state. The conditional entropy of the post measurement state is given by $S(A|E)_{\rho_{AE}^{\prime}}=S(\rho_{AE}^{\prime})-S(\rho_{E})$ . Using the equality $S\left(\sum_{i}p_{i}\ket{i}\bra{i}\otimes\rho_{i}\right)=H(p_{i})+\sum_{i}p_{i}S(\rho_{i})$ , the conditional entropy is then $S(A|E)_{\rho_{AE}^{\prime}}=H(p_{i})+\sum_{i}p_{i}S(\rho_{i}^{E})-S(\rho_{E})$ . Since $H(p_{i})=S(\rho_{A}^{\mathrm{diag}})$ , $S(\rho_{E})=S(\rho_{A})$ , and $S(\rho_{i}^{E})=0,\forall i,$ we have

$$R^{Q}_{I}(\rho_{A})=S(\rho_{A}^{\mathrm{diag}})-S(\rho_{A}).\tag{4.3}$$

It is immediate to observe that Eve’s uncertainty is equal to the relative entropy of coherence

$$R^{Q}_{I}(\rho_{A})=C_{\mathrm{rel,ent}}(\rho_{A})=S(\rho_{A}^{\mathrm{diag}})-S(\rho_{A}),\tag{4.4}$$

thus satisfying all the requirements for a consistent measure of quantum randomness as well as being a measure of BCP coherence [14].

Note that, when considering a tripartite pure state $\ket{\psi_{ABE}}$ and a projective measurement $\{\ket{i}\bra{i}\}$ on system $A$ , it is shown [169] that the quantum randomness of the measurement outcome conditioned on system $E$ corresponds to the distance between state $\rho_{AB}=\mathrm{tr}_{E}[\ket{\psi_{ABE}}\bra{\psi_{ABE}}]$ and state $\rho_{AB}^{\prime}$ after the measurement. Furthermore, by regarding system $B$ as a trivial system, the analysis in Ref. [169] also applies to our scenario.

4.1.2 Quantum randomness against classical information

In the previous subsection, we showed that the quantum randomness of a local measurement can be quantified by the best case uncertainty of a correlated party Eve, where the uncertainty is quantified by the quantum conditional entropy. We now compare this result with an alternative measure of quantum randomness reported in Ref. [21]. The setting is depicted in Fig. 4.2. The difference is that Eve performs a measurement with probability distribution $\{q^{E}_{i}\},q^{E}_{i}=\text{Tr}[\rho_{E}\ket{e_{i}^{\prime}}_{E}\bra{e_{i}^{\prime}}]$ on her own system to predict Alice’s measurement outcome. The best case uncertainty is then given by the classical conditional entropy:

$$R^{C}_{I}(\rho_{A})=\min_{\rho_{AE},\{\ket{e_{i}^{\prime}}\}}H(\{p_{i}\}|\{q^{E}_{i}\})_{\rho_{AE}},\tag{4.5}$$

where the minimization runs over all possible states and measurements of Eve. When Alice’s system is in a pure state $\ket{\psi}_{A}=\sum_{i}\sqrt{p_{i}}\ket{i}$ , the probability distributions of $A$ and $E$ are uncorrelated as the global system is in a tensor product state. Hence, we have $R^{C}_{I}(\rho_{A})=H(\{p_{i}\}|\{q^{E}_{i}\})_{\psi_{AE}}=H(\{p_{i}\})$ for any strategy of Eve. The quantity corresponds to the total randomness as expected. For an arbitrary mixed state $\rho_{A}$ , it turns out that Eve’s uncertainty about Alice’s measurement is given by

$$R^{C}_{I}(\rho_{A})=\min_{\{p_{e},\ket{\psi_{e}}\}}\sum_{e}p_{e}R^{C}_{I}(\ket{\psi_{e}}),\qquad\rho_{A}=\sum_{e}p_{e}\ket{\psi_{e}}\bra{\psi_{e}},\tag{4.6}$$

where the minimization is over all possible decompositions of $\rho_{A}$ . We briefly review the proof here. Given the spectral decomposition $\rho_{A}=\sum_{i}\lambda_{i}\ket{a_{i}}\bra{a_{i}}$ , a purification of $\rho_{A}$ is $\ket{\psi}_{AE}=\sum_{i}\sqrt{\lambda_{i}}\ket{a_{i}}_{A}\otimes\ket{e_{i}}_{E}$ . Here $\{\ket{e_{i}}_{E}\}$ is an orthonormal basis of Eve’s system. Eve performs a projective measurement $\{\ket{e^{\prime}_{i}}_{E}\}$ on her local system; then, conditioned on her measurement outcome $\ket{e^{\prime}_{i}}_{E}$ , Alice’s state is

$$\ket{\psi_{i}}_{A}=\frac{1}{\sqrt{p_{i}}}\sum_{j}\sqrt{\lambda_{j}}\langle e^{\prime}_{i}\ket{e_{j}}\ket{a_{j}}_{A},\tag{4.7}$$

where $p_{i}=\sum_{j}{\lambda_{j}}\left|\langle e^{\prime}_{i}\ket{e_{j}}\right|^{2}$ . As the state of Alice is pure for each outcome of Eve, the averaged quantum randomness is $\sum p_{i}R^{C}_{I}\left(\ket{\psi_{i}}_{A}\right)$ . On the other hand, Eve can choose an arbitrary measurement basis, which determines a decomposition of $\rho_{A}$ , to maximize her prediction success probability. Therefore, the quantum randomness measure should be optimized over all the possible decompositions of $\rho_{A}$ . When Eve performs a general measurement (POVM), we can always enlarge the system of Eve and consider a projective measurement, and the proof follows accordingly.

The quantum randomness measure obtained by convex roof extension of the pure state randomness is a measure of BCP coherence as well [21].

Verifying the properties of $R_{z}^{C}$ .

Now we show that the intrinsic randomness $R_{Z}^{C}$ , defined in Eq. (4.6), satisfies the properties of coherence measures listed in Table 3.1. That is, the requirements of the measures for quantum coherence and intrinsic randomness are equivalent.

Proof of (C1)

In the language of generating randomness, the requirement (C1) in Table 3.1 can be interpreted as stating that classical states generate no randomness. This is because an incoherent state $\delta$ , defined in Eq. (3.1), can be understood as a statistical mixture of classical states. We can easily verify that $R_{Z}^{C}(\delta)=0$ , since $R_{Z}^{C}(\delta)\leq\sum_{i=1}^{d}p_{i}R_{Z}^{C}(\ket{i}\bra{i})=0$ from Eq. (3.1) and $R_{Z}^{C}(\rho)\geq 0$ by definition. The stronger requirement (C1’) implies that any non-classical state, which cannot be represented in the form of Eq. (3.1), can always be used to generate intrinsic randomness. Thus, this result explains why nonzero intrinsic randomness always indicates ‘quantumness’ as discussed above. To prove that $R_{Z}^{C}(\rho)$ satisfies (C1’), suppose for contradiction that a state $\rho\notin\mathcal{I}$ has $R_{Z}^{C}\left(\rho\right)=0$ . From the definition of $R_{Z}^{C}$ , there exists a decomposition $\rho=\sum_{e}p_{e}\ket{\psi_{e}}\bra{\psi_{e}}$ such that $R_{Z}^{C}(\ket{\psi_{e}}\bra{\psi_{e}})=0$ for all $e$ . As any pure state with zero randomness is in the basis $I$ , we have $\ket{\psi_{e}}=\ket{i_{e}}\in I$ and $\rho=\sum_{e}p_{e}\ket{i_{e}}\bra{i_{e}}$ , which belongs to the set $\mathcal{I}$ , a contradiction. We can also show that the intrinsic randomness is upper bounded by $R_{Z}^{C}\left(\rho\right)\leq\log_{2}d$ . The maximally coherent state $\ket{\Psi_{d}}$ , defined in Eq. (3.2), has the largest intrinsic randomness.

Proof of (C2)

The requirement (C2) implies a monotonicity property under incoherent operations. In the corresponding randomness picture, incoherent operations can be understood as classical operations that map one zero intrinsic randomness (classical) state to another one. An interpretation of (C2a) is that such classical operations should not increase the randomness of a given state, while (C2b) requires that the randomness cannot increase on average when probabilistic strategies are considered. Let us quickly check why (C2b) is true for the pure state case. For a pure state $\rho$ , the randomness measure $R_{Z}^{C}(\rho)$ equals the relative entropy of coherence $C_{\mathrm{rel,ent}}(\rho)$ , whose monotonicity has been proved [14]. That is, we have

$$R_{Z}^{C}(\ket{\psi})\geq\sum_{n}p_{n}R_{Z}^{C}(\ket{\psi_{n}}),\tag{4.8}$$

where $\ket{\psi_{n}}=K_{n}\ket{\psi}/\sqrt{p_{n}}$ and $p_{n}=\mathrm{Tr}\left[K_{n}\ket{\psi}\bra{\psi}K_{n}^{\dagger}\right]$ .

For a general mixed state $\rho$ , suppose that the optimal decomposition that achieves the minimum in Eq. (4.6) is given by $\rho=\sum_{e}p_{e}\ket{\psi_{e}}\bra{\psi_{e}}$ . Then, we have

$$R_{Z}^{C}(\rho)=\sum_{e}p_{e}R_{Z}^{C}(\ket{\psi_{e}}).\tag{4.9}$$

Now suppose that an incoherent operation with Kraus operators $\{K_{n}\}$ acts on $\rho$ . What we need to prove is that

$$R_{Z}^{C}(\rho)\geq\sum_{n}p_{n}R_{Z}^{C}(\rho_{n}),\tag{4.10}$$

where $\rho_{n}={K_{n}\rho K_{n}^{\dagger}}/{p_{n}}$ and $p_{n}=\mathrm{Tr}\left[K_{n}\rho K_{n}^{\dagger}\right]$ . As $\rho=\sum_{e}p_{e}\ket{\psi_{e}}\bra{\psi_{e}}$ , we have

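$$\rho_{n}=\frac{1}{p_{n}}\sum_{e}p_{e}K_{n}\ket{\psi_{e}}\bra{\psi_{e}}K_{n}^{\dagger}=\sum_{e}\frac{p_{e}p_{en}}{p_{n}}\rho_{en},\tag{4.11}$$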

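where we denote $p_{en}=\mathrm{Tr}[K_{n}\ket{\psi_{e}}\bra{\psi_{e}}K_{n}^{\dagger}]$ and $\rho_{en}={K_{n}\ket{\psi_{e}}\bra{\psi_{e}}K_{n}^{\dagger}}/{p_{en}}$ , so that $p_{n}=\sum_{e}p_{e}p_{en}$ . Then, we can finish the proof

$$\begin{aligned}R_{Z}^{C}(\rho)&=\sum_{e}p_{e}R_{Z}^{C}(\ket{\psi_{e}})\\&\geq\sum_{e}p_{e}\sum_{n}p_{en}R_{Z}^{C}(\rho_{en})\\&=\sum_{n}p_{n}\sum_{e}\frac{p_{e}p_{en}}{p_{n}}R_{Z}^{C}(\rho_{en})\\&\geq\sum_{n}p_{n}R_{Z}^{C}(\rho_{n}),\end{aligned}\tag{4.12}$$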

where the first inequality is based on the conclusion for pure states in Eq. (4.8) and the last inequality is due to the convexity of $R_{Z}^{C}$ .

Proof of (C3)

The convexity property (C3) can be understood as a requirement on the randomness generation process. In other words, the randomness cannot increase on average by statistically mixing several states. With the convex roof definition of $R_{Z}^{C}(\rho)$ , given in Eq. (4.6), we can easily verify the convexity property (C3). The proof follows directly by considering a specific decomposition of $\rho=\sum_{n}p_{n}\rho_{n}$ in (C3). Note that, the property (C2a) can be derived when (C2b) and (C3) are fulfilled, thus we also prove (C2a) for $R_{Z}^{C}(\rho)$ .

In summary, we prove that the intrinsic randomness $R_{Z}^{C}(\rho)$ indeed measures the strength of coherence. A state with stronger coherence would therefore indicate larger randomness in measurement outcomes, and vice versa.

4.1.3 Qubit example

Here, we derive the intrinsic randomness formula for a qubit state. We denote the identity and Pauli matrices by $\sigma_{i},\sigma_{x},\sigma_{y},\sigma_{z}$ . When measured in the $\sigma_{z}$ basis, the intrinsic randomness for a pure qubit state $\ket{\psi}=\alpha\ket{0}+\beta\ket{1}$ is given by

$$R_{z}^{C}(\ket{\psi})=H(|\alpha|^{2}),\tag{4.13}$$

where $H(p)=-p\log p-(1-p)\log(1-p)$ is the binary entropy. If we define $n_{x}=\bra{\psi}\sigma_{x}\ket{\psi}=\alpha^{*}\beta+\alpha\beta^{*}$ and $n_{y}=\bra{\psi}\sigma_{y}\ket{\psi}=-i\alpha^{*}\beta+i\alpha\beta^{*}$ , then it is easy to check that

$$R_{z}^{C}(\ket{\psi})=H\left(\frac{1+\sqrt{1-n_{x}^{2}-n_{y}^{2}}}{2}\right).\tag{4.14}$$

For a general mixed state $\rho$ , we can follow the method for deriving the entanglement of formation [170]. In this case, we need to first define $\ket{\tilde{\psi}}=\sigma_{x}\ket{\psi^{*}}=\beta^{*}\ket{0}+\alpha^{*}\ket{1}$ , and the coherent concurrence by

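$$C_{z}(\ket{\psi})=|\langle\psi|\tilde{\psi}\rangle|=2|\alpha\beta|.\tag{4.15}$$

Then it is easy to check that

$$R_{z}^{C}=H\left(\frac{1+\sqrt{1-C_{z}^{2}}}{2}\right).\tag{4.16}$$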

The randomness $R_{I}^{C}\left(\rho\right)$ can be obtained according to Eq. (4.16) by first calculating the coherent concurrence. Following the method of deriving the entanglement of formation, the $C_{z}$ value can be obtained by $C_{z}=|\sqrt{\eta_{1}}-\sqrt{\eta_{2}}|$ , where $\eta_{1}$ and $\eta_{2}$ are the eigenvalues of the matrix $M=\rho\sigma_{x}\rho^{*}\sigma_{x}$ . In the Bloch sphere representation, the value of $C_{z}$ of a quantum state $\rho=(\sigma_{i}+n_{x}\sigma_{x}+n_{y}\sigma_{y}+n_{z}\sigma_{z})/2$ can be calculated by

$$C_{z}(\rho)=\sqrt{n_{x}^{2}+n_{y}^{2}}.$$

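Compared to the $l_{1}$ norm coherence measure $C_{l_{1}}$ [14], which is defined by the sum of the absolute values of the off-diagonal elements

$$C_{l_{1}}(\rho)=\sum_{i\neq j}|\rho_{ij}|,$$

one can easily check that $C_{l_{1}}(\rho)$ equals the concurrence $C_{z}$ for the qubit case. This is because

$$C_{l_{1}}(\rho)=|\rho_{01}|+|\rho_{10}|=\sqrt{n_{x}^{2}+n_{y}^{2}}=C_{z}(\rho).$$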

We conjecture that the coherence concurrence can be generalized to an arbitrary high dimensional space by following a similar method to that used for the entanglement concurrence [171, 172, 173].
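The qubit relations above can be checked numerically. The following is a minimal sketch using only the Python standard library: it builds a qubit state from its Bloch vector, evaluates $C_{z}=|\sqrt{\eta_{1}}-\sqrt{\eta_{2}}|$ from the eigenvalues of $M=\rho\sigma_{x}\rho^{*}\sigma_{x}$ , and compares it with the $l_{1}$ norm of the off-diagonal elements. The helper names and the example Bloch vector are illustrative assumptions.

```python
import math

def qubit_density(nx, ny, nz):
    """Density matrix (I + n . sigma)/2 as a nested 2x2 list."""
    return [[(1 + nz) / 2, (nx - 1j * ny) / 2],
            [(nx + 1j * ny) / 2, (1 - nz) / 2]]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def coherence_concurrence(rho):
    """C_z = |sqrt(eta1) - sqrt(eta2)|, eta being the eigenvalues of
    M = rho sigma_x rho^* sigma_x (rho^* is the complex conjugate)."""
    sx = [[0, 1], [1, 0]]
    rho_conj = [[rho[i][j].conjugate() for j in range(2)] for i in range(2)]
    m = matmul2(matmul2(rho, sx), matmul2(rho_conj, sx))
    # Eigenvalues of a 2x2 matrix from its trace and determinant.
    tr = (m[0][0] + m[1][1]).real
    det = (m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    disc = math.sqrt(max(0.0, tr * tr - 4 * det))
    eta1 = max(0.0, (tr + disc) / 2)
    eta2 = max(0.0, (tr - disc) / 2)
    return abs(math.sqrt(eta1) - math.sqrt(eta2))

def l1_coherence(rho):
    """Sum of the absolute values of the off-diagonal elements."""
    return abs(rho[0][1]) + abs(rho[1][0])

rho = qubit_density(0.3, 0.4, 0.5)   # arbitrary mixed state, |n| < 1
print(coherence_concurrence(rho), l1_coherence(rho), math.hypot(0.3, 0.4))
```

For this example all three printed values should agree up to floating-point rounding, in line with $C_{l_{1}}(\rho)=C_{z}$ for qubits.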

4.1.4 Comparison between the two randomness measures

Let us compare the two quantities $R^{C}_{I}(\rho_{A}),R^{Q}_{I}(\rho_{A})$ in a simple qubit example. In the Bloch sphere representation, $\rho_{A}=(I+\vec{n}\cdot\vec{\sigma})/2$ , where $\vec{n}=(n_{x},n_{y},n_{z})$ and $\vec{\sigma}=(\sigma_{x},\sigma_{y},\sigma_{z})$ are the Pauli matrices. Supposing that the measurement basis is the $\sigma_{z}$ eigenbasis, which is denoted by $\{\ket{0},\ket{1}\}$ , we obtain

$$R_{z}^{C}(\rho_{A})=H\left(\frac{1+\sqrt{1-n_{x}^{2}-n_{y}^{2}}}{2}\right),\qquad R_{z}^{Q}(\rho_{A})=H\left(\frac{1+n_{z}}{2}\right)-H\left(\frac{1+|n|}{2}\right),$$

where $|n|=\sqrt{n_{x}^{2}+n_{y}^{2}+n_{z}^{2}}$ and $H$ is the binary entropy. Specifically, for the state $\rho_{A}(v)=v\ket{+}\bra{+}+\frac{1-v}{2}\mathbb{I}$ , where $\ket{+}=(\ket{0}+\ket{1})/\sqrt{2},v\in[0,1],\vec{n}(v)=(v,0,0)$ , we have

$$R_{z}^{C}(\rho_{A}(v))=H\left(\frac{1+\sqrt{1-v^{2}}}{2}\right),\qquad R_{z}^{Q}(\rho_{A}(v))=1-H\left(\frac{1+v}{2}\right).$$

In Fig. 4.3, we plot the quantum randomness versus the mixing parameter $v$ . As expected, the quantum randomness measure $R_{z}^{Q}$ obtained in the fully quantum picture is smaller than $R_{z}^{C}$ , which is derived by the measurement-based method, while they both vanish when the state is incoherent and are equal to the Shannon entropy in the pure state case.
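The comparison can be reproduced with a few lines of code. The sketch below assumes the qubit expressions $R_{z}^{Q}=H\big(\tfrac{1+n_{z}}{2}\big)-H\big(\tfrac{1+|n|}{2}\big)$ (the relative entropy of coherence written with the eigenvalues $(1\pm|n|)/2$ ) and $R_{z}^{C}=H\big(\tfrac{1+\sqrt{1-n_{x}^{2}-n_{y}^{2}}}{2}\big)$ (the convex-roof measure expressed through the concurrence); the function names are illustrative.

```python
import math

def binary_entropy(p):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def randomness_quantum(nx, ny, nz):
    """R^Q: entropy of the dephased state minus entropy of the state."""
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    return binary_entropy((1 + nz) / 2) - binary_entropy((1 + norm) / 2)

def randomness_classical(nx, ny, nz):
    """R^C: convex-roof randomness, assumed qubit closed form."""
    root = math.sqrt(max(0.0, 1 - nx * nx - ny * ny))
    return binary_entropy((1 + root) / 2)

# Sweep the mixing parameter v of rho(v) = v|+><+| + (1 - v) I/2.
for v in (0.0, 0.25, 0.5, 0.75, 1.0):
    rq = randomness_quantum(v, 0.0, 0.0)
    rc = randomness_classical(v, 0.0, 0.0)
    print(f"v = {v:.2f}:  R^Q = {rq:.4f}  R^C = {rc:.4f}")
```

Under these assumptions the two quantities coincide at $v=0$ (both vanish) and at $v=1$ (both equal one bit), with $R^{Q}\leq R^{C}$ in between.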

4.2 Coherence or randomness distillation

When Alice performs a projective measurement $P_{I}$ on $N$ identical pure states $\ket{\psi}=\sum_{i}a_{i}\ket{i}$ , she will obtain $N$ i.i.d. random variables $A_{1},A_{2},\dots,A_{N}$ . For the state $\ket{\psi}$ that is not maximally coherent, the randomness of the measurement outcomes is biased. Then, as shown in Fig. 4.4(a), Alice can perform a randomness extraction process to transform the $N$ biased random numbers to $l\approx NR_{Z}^{C}(\ket{\psi})$ almost uniformly distributed random bits.

We show in Fig. 4.4(b) that the extraction can be equivalently performed before measurement. Now, the extraction becomes a quantum procedure, which we call quantum extraction. Considering the equivalence between intrinsic randomness and quantum coherence, quantum extraction can be regarded as a procedure of coherence distillation. This concept resembles the distillation procedure of another (more popular) quantumness measure—entanglement [15].

With quantum extraction, we can first distil the input state $\ket{\psi}=\sum_{i}a_{i}\ket{i}$ into the maximally coherent state $\ket{\Psi_{2}}=(\ket{0}+\ket{1})/\sqrt{2}$ . Then, we can directly obtain uniformly distributed random bits by measuring the maximally coherent state. For $N$ copies of $\ket{\psi}$ , it is shown in Supplementary Materials that we can asymptotically obtain $l$ copies of $\ket{\Psi_{2}}$ , where $l$ and $N$ satisfy the following condition,

[TABLE]

Taking a pure qubit input state as an example, the distillation procedure is summarized as follows.

1. Prepare $N$ copies of the qubit state $\ket{\psi}^{\otimes N}=\left(\alpha\ket{0}+\beta\ket{1}\right)^{\otimes N}$ , which can be binomially expanded in the computational basis. There are $N+1$ distinct coefficients, $\beta^{N},\alpha^{1}\beta^{N-1},\dots,\alpha^{N}$ , corresponding to different subspaces that have the same number of $\ket{0}$ or $\ket{1}$ .

2. Perform a projection measurement to distinguish between those subspaces. For the $k$ th subspace, which has coefficient $\alpha^{N-k}\beta^{k}$ , the measurement probability is given by $p_{k}={N\choose k}|\alpha|^{2(N-k)}|\beta|^{2k}$ . The resulting quantum state of the $k$ th outcome corresponds to a maximally coherent state $\ket{\Psi_{D_{k}}}$ of dimension $D_{k}={N\choose k}$ .

3. Suppose that $2^{r}\leq D_{k}<2^{r+1}$ , then we can directly project onto the $2^{r}$ subspace and convert to $r$ copies of $\ket{\Psi_{2}}$ as desired.

To see why $r/N$ equals the randomness of $\ket{\psi}$ on average, we only need to take account of the operations that cause a loss of coherence. As shown in Supplementary Materials, the two projection measurements lose only a negligible amount of coherence, thus we asymptotically have $NR_{Z}^{C}(\ket{\psi})\approx r$ .
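The asymptotic statement can be explored numerically for finite $N$ . The sketch below is a simplified accounting, not the Supplementary Materials argument: it averages $r=\lfloor\log_{2}D_{k}\rfloor$ over the outcome distribution $p_{k}$ and compares $r/N$ with the binary entropy $H(|\alpha|^{2})$ , which is the pure-state randomness. The coherence that may survive in the discarded part of step 3 is ignored, and the function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def expected_yield_per_copy(prob0, n_copies):
    """Average number of distilled |Psi_2> copies per input copy.

    prob0 = |alpha|^2 must lie strictly between 0 and 1.
    """
    total = 0.0
    for k in range(n_copies + 1):
        # log of p_k = C(N, k) |alpha|^(2(N-k)) |beta|^(2k), via lgamma
        # to avoid overflow for large N.
        log_pk = (math.lgamma(n_copies + 1) - math.lgamma(k + 1)
                  - math.lgamma(n_copies - k + 1)
                  + (n_copies - k) * math.log(prob0)
                  + k * math.log(1 - prob0))
        dim_k = math.comb(n_copies, k)       # D_k, exact integer
        r = dim_k.bit_length() - 1           # floor(log2(D_k))
        total += math.exp(log_pk) * r
    return total / n_copies

prob0 = 0.3                                   # illustrative |alpha|^2
for n in (10, 50, 200, 400):
    print(n, round(expected_yield_per_copy(prob0, n), 4),
          round(binary_entropy(prob0), 4))
```

In this simplified model the yield per copy stays below $H(|\alpha|^{2})$ and approaches it as $N$ grows, which is the behaviour the asymptotic relation describes.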

In Appendix A, we further extend the definition of distillable coherence to mixed quantum states. In analogy with the definition of the regularized entanglement of formation [174], we also define the coherence of formation and conjecture that it equals the regularized intrinsic randomness measure,

[TABLE]

4.2.1 Comparison with entanglement

As shown in Table 4.1, there exist strong similarities between the frameworks of coherence and entanglement (see also Ref. [175]). Our study can be regarded as an extension of the convex roof measure from entanglement to coherence. Similar to the entanglement of formation (EOF), as a convex roof measure for coherence, we expect our proposed measure to play an important role in the research of coherence.

For further research directions, it is interesting to extend the framework of entanglement to coherence. An incomplete comparison between the two is given in Table 4.1. For instance, it is interesting to see whether $C_{\mathrm{rel,ent}}(\rho)$ and $R_{Z}^{C}(\rho)$ are the unique lower and upper bounds of all coherence measures after regularization, and whether they can coincide. Another interesting and related question is that of quantifying the coherence for an unknown quantum state, similar to the task of using an entanglement witness for quantification. The coherence measure $R_{I}(\rho)$ given in Eq. (4.6) ensures the true randomness when measuring a state $\rho$ in the $I$ basis. Such a technique can be utilized to construct a semi self-testing quantum random number generator. A straightforward way to do this is to first perform tomography on the to-be-measured state $\rho$ and then estimate the randomness of the $I$ basis measurement outcomes according to Eq. (4.6). As the coherence measure $R_{I}(\rho)$ quantifies the output randomness in a measurement, our result can also be applied in other randomness generation scenarios [187, 103, 110, 64].

The definition of coherence and intrinsic randomness is based on a specific computational basis. From this perspective, the quantum feature can be quantified by the superposition strength in the measurement basis. Alternatively, we can define a similar quantumness as a property of measurements. For an arbitrary pure quantum state, if we choose a measurement basis that is complementary to the state, a quantum feature similar to coherence can also be maximally revealed. The relationship between defining coherence as a property of a quantum state with a given measurement basis and defining it as a property of the measurement is similar to the relationship between the Schrödinger and Heisenberg pictures. The current definition of coherence thus follows the Schrödinger picture.

We also investigate intrinsic randomness without specifying a measurement basis. In this case, we consider the scenario that an optimal measurement basis is chosen to maximize the output randomness. Here, we do not assume that the choice of measurement basis is secret from Eve’s point of view. Thus, we still have to consider the minimization of intrinsic randomness on the chosen measurement basis. In this case, this basis-independent intrinsic randomness can be defined as $R(\rho)=\max_{I}R_{Z}^{C}\left(\rho\right)$ . In the qubit example, we show that the basis-independent intrinsic randomness is related to the purity of a quantum state. We thus demonstrate that intrinsic randomness can be used to quantify other quantum features.

4.3 Basis independent randomness and coherence

Here, we quantify the intrinsic randomness under a different scenario. When the measurement basis has not been specified, we can still consider the intrinsic randomness. In this case, Alice can choose an optimal measurement basis to maximize the output randomness. Here, we do not assume that the choice of measurement basis is secret from Eve’s point of view. Thus, we still have to consider the minimization of intrinsic randomness on the chosen measurement basis. In this case, this basis-independent intrinsic randomness can be defined as

$$R(\rho)=\max_{I}R_{I}(\rho).$$

Here, the maximization is over all possible projective measurement bases $I$ and the randomness measure could be either $R_{I}^{C}$ or $R_{I}^{Q}$ . We do not consider general POVM measurements for Alice, as they generally require ancillary quantum states, which might introduce coherence and hence randomness. The new definition $R(\rho)$ still represents the existence of quantum effects. That is, nonzero $R(\rho)$ will always indicate the existence of quantumness, although $R(\rho)$ does not quantify the coherence of quantum states in this instance, since the coherence is defined on a specific basis.

As an example, we consider the basis-independent randomness with randomness measure $R_{I}^{C}$ for a qubit state $\rho$ and give a direct expression of $R(\rho)$ . As the randomness measure $R_{I}^{C}$ is a function of $C_{I}$ according to Eq. (4.16), we can similarly define $C=\max_{I}C_{I}$ . In the following, we focus on calculating $C$ , since the basis-independent randomness $R(\rho)$ is a direct function of it.

An arbitrary measurement basis $I$ can be regarded as a unitary transformation of the original $\sigma_{z}$ basis. Equivalently, we can suppose that the measurement basis is unchanged, while a unitary transformation acts upon the to-be-measured quantum state. Suppose that the original state $\rho$ has a spectral decomposition given by

$$\rho=\lambda_{1}\ket{\Psi_{1}}\bra{\Psi_{1}}+\lambda_{2}\ket{\Psi_{2}}\bra{\Psi_{2}}.$$

Thus, a unitary transformation on the state $\rho$ would only transform the eigenstates $\ket{\Psi_{1}}$ and $\ket{\Psi_{2}}$ to another basis and leave the eigenvalues $\lambda_{1}$ and $\lambda_{2}$ unchanged. In this case, we can still work in the $\sigma_{z}$ basis, and obtain

[TABLE]

Now, the maximization over all possible measurement basis is equivalently achieved over all possible eigenstates $\ket{\Psi_{1}}$ and $\ket{\Psi_{2}}$ with given eigenvalues $\lambda_{1}$ and $\lambda_{2}$ .

We assume that $\left|\Psi_{1}\right\rangle=a\left|0\right\rangle+b\left|1\right\rangle$ and $\left|\Psi_{2}\right\rangle=b^{*}\left|0\right\rangle-a^{*}\left|1\right\rangle$ , where the normalized coefficients $a,b$ are complex numbers satisfying $a^{*}a+b^{*}b=1$ and $\left|0\right\rangle$ and $\left|1\right\rangle$ are eigenstates of $\sigma_{z}$ . Then a general density matrix $\rho$ can be represented by

$$\rho=\begin{pmatrix}\lambda_{1}|a|^{2}+\lambda_{2}|b|^{2}&(\lambda_{1}-\lambda_{2})ab^{*}\\(\lambda_{1}-\lambda_{2})a^{*}b&\lambda_{1}|b|^{2}+\lambda_{2}|a|^{2}\end{pmatrix}.$$

To get $C$ , we need to calculate $M$ , which can be written as

[TABLE]

By denoting

[TABLE]

The matrix $M$ can be denoted by

[TABLE]

And $\eta_{1}$ and $\eta_{2}$ will be the two roots of the equation

[TABLE]

Now we can calculate the square of $C(\rho)$ by

[TABLE]

Here we notice that $CD=4AB$ , hence

[TABLE]

where $A-B=-\lambda_{1}\lambda_{2}(\left|a\right|^{2}+\left|b\right|^{2})^{2}\leq 0$ , thus

[TABLE]

Notice that equality holds for a measurement basis complementary to the eigenbasis of $\rho$ . For $\rho=(I+n_{x}\sigma_{x}+n_{y}\sigma_{y}+n_{z}\sigma_{z})/2$ , we have that $\lambda_{1}=(1+n)/2$ and $\lambda_{2}=(1-n)/2$ , where $n=\sqrt{n_{x}^{2}+n_{y}^{2}+n_{z}^{2}}$ . Therefore, we have

[TABLE]

Thus this value can be physically related to the degree of how mixed the state $\rho$ is.

Chapter 5 Quantum Bernoulli Factory

The Bernoulli factory [23, 24] is a simple yet interesting task in classical randomness processing. This chapter discusses the quantum Bernoulli factory [188] and shows its fundamental difference from the classical Bernoulli factory. The key difference lies in the coherent superposition of states. To demonstrate this difference, we also present a theoretical protocol and an experimental verification with superconducting qubits [189].

5.1 Theoretical protocol

Coherent superposition of different states, coherence, is a peculiar feature of quantum mechanics that distinguishes it from Newtonian theory. In different scenarios, coherence manifests as various quantum resources, such as entanglement [17], discord [190], and single-party coherence [14]. In many quantum information tasks, the common resource leading to quantum advantage is multipartite quantum correlations. For instance, entanglement plays a crucial role in quantum key distribution [6, 8], teleportation [191], and computation [9, 10]. While the essence of multipartite correlation originates from coherent superposition, it is natural to expect the essence of quantum advantage to also originate from coherence. This raises a fundamental question: Can quantum advantage be obtained without using multipartite correlations?

In randomness generation, it has been shown that coherence is the essential resource for generating true random numbers [21]. It is thus natural to expect coherence to be a resource for displaying quantum advantages in certain randomness related tasks. Remarkably, in a recent work by Dale et al., a rather simple task of randomness processing is proposed to show that coherence yields a provable quantum advantage over classical stochastic physics [188]. In this randomness processing task, a classical coin, see Fig. 5.1(a), corresponds to a classical machine that produces independent and identically distributed random variables where each one has the binary values, head (0) and tail (1). A coin is called $p$ -coin if the probability of producing a head is $p$ , where $p\in[0,1]$ . Given an unknown $p$ -coin, an interesting question is whether one can construct an $f(p)$ -coin, where $f(p)$ is a function of $p$ and $f(p)\in[0,1]$ . Such construction processing is called a Bernoulli factory [23, 24].

Let us take $f(p)=1/2$ for example, which was solved by von Neumann with a rather simple strategy [192]. Flip the $p$ -coin ( $p\neq 0,1$ ) twice. If the outcomes are the same, start over; otherwise, output the second coin value as the $1/2$ -coin output. Therefore, the function $f(p)=1/2$ can be constructed from an arbitrary unknown $p$ -coin. As a generalization, a natural question is which functions $f(p)$ can be constructed from an unknown $p$ -coin. This classical Bernoulli factory problem was solved by Keane and O’Brien [193]. Generally speaking, a necessary condition for $f(p)$ being constructible is that $f(p)\neq 0$ or $1$ when $p\in(0,1)$ . The function $f(p)=1/2$ satisfies this condition, while there are many other examples that violate it. For instance, surprisingly, the simple “probability amplification” function $f(p)=2p$ (a complete definition is: $f(p)=2p$ when $p\in[0,1/2]$ and $f(p)=2(1-p)$ when $p\in(1/2,1]$ ) does not satisfy the constructible condition, since $f(1/2)=1$ . Therefore, there is no classical method to construct an $f(p)=2p$ -coin.
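The von Neumann strategy described above can be sketched in a few lines. This is an illustrative simulation only: the helper `make_p_coin`, the bias value, and the fixed seed are assumptions introduced here, with head encoded as 0 and tail as 1.

```python
import random

def make_p_coin(p, rng):
    """Return a function that flips a p-coin: 0 (head) with probability p."""
    return lambda: 0 if rng.random() < p else 1

def von_neumann_fair_bit(flip):
    """Flip twice; if the outcomes differ, output the second one,
    otherwise start over."""
    while True:
        first, second = flip(), flip()
        if first != second:
            return second

rng = random.Random(2017)                 # fixed seed for reproducibility
coin = make_p_coin(0.2, rng)              # the bias is unknown to the protocol
samples = [von_neumann_fair_bit(coin) for _ in range(20000)]
print(sum(samples) / len(samples))        # empirical frequency of tails
```

The two accepted sequences, (0, 1) and (1, 0), each occur with probability $p(1-p)$ , which is why the output is unbiased for any $p\in(0,1)$ ; the expected number of flips per output bit is $1/[p(1-p)]$ .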

In the language of quantum mechanics, a $p$ -coin corresponds to a machine that outputs identically mixed qubit states,

$$\rho_{C}=p\ket{0}\bra{0}+(1-p)\ket{1}\bra{1},\tag{5.1}$$

where $p\in[0,1]$ , and $Z=\{\ket{0},\ket{1}\}$ is the computational basis denoting head and tail, respectively. As $p$ is generally unknown, we can regard $\rho_{C}$ as a classical way of encoding an unknown parameter $p$ . A measurement in the $\{\ket{0},\ket{1}\}$ basis would output a head or a tail with a probability according to $p$ and $1-p$ , respectively. On the other hand, a quantum way of encoding $p$ , see Fig. 5.1(b), can be a coherent superposition of $\ket{0}$ and $\ket{1}$ , i.e., $\rho_{Q}=\ket{p}\bra{p}$ with

$$\ket{p}=\sqrt{p}\ket{0}+\sqrt{1-p}\ket{1}.\tag{5.2}$$

Following the nomenclature in Ref. [188], we call such a quantum coin a quoin. It is straightforward to see that a $p$ -coin can always be constructed from a $p$ -quoin by measuring it in the $Z$ (computational) basis. Thus, classically constructible (via coins) $f(p)$ functions are also quantum mechanically constructible (via quoins), while a really interesting question is whether the set of quantum constructible functions (via a quantum Bernoulli factory) is strictly larger than the classical set.

In Ref. [188], Dale et al. have theoretically proved the necessary and sufficient conditions for $f(p)$ being quantum constructible. Specifically, they show that there are functions, for instance $f(p)=2p$ , which are impossible to construct classically, but can be efficiently realized in the presence of $p$ -quoins. Therefore, they provide a positive answer to this problem: quantum resources are strictly superior to classical ones. The protocol for generating the $f(p)=2p$ function relies on a Bell state measurement on two quoins, which essentially establishes entanglement between the two quoins.

Now, we are interested in seeing whether such a quantum advantage persists even when multipartite correlations, such as entanglement, are absent. Thus, we only allow single-qubit operations. Without two-qubit operations, it turns out that constructing the $f(p)=2p$ function requires many copies of the qubits defined in Eq. (5.2), and the convergence could be poor. In this chapter, we propose another function that is impossible with classical means but feasible with only a limited number of single-qubit operations.

The protocol for quantum Bernoulli factory

Here, we analyze a classically impossible $f(p)$ function defined by

$$f(p)=4p(1-p).\tag{5.3}$$

For $p=1/2$ , we have $f(p)=1$ , which means that this function is classically unachievable. On the other hand, it is straightforward to check that the $f(p)$ -function satisfies the requirements for being quantum constructible [188]. Given a $p$ -quoin, we explicitly present an efficient protocol for generating an $f(p)$ -coin as follows.

Step 1

Generate a $p$ -coin:

When measuring a $p$ -quoin, as given by Eq. (5.2), in the $Z$ basis, the probabilities of obtaining $0$ and $1$ are $p$ and $1-p$ , respectively.

Step 2

Generate a $q$ -coin, where $q=\left[1+2\sqrt{p(1-p)}\right]/2$ :

When measuring a $p$ -quoin in the $X=\{(\ket{0}+\ket{1})/\sqrt{2},(\ket{0}-\ket{1})/\sqrt{2}\}$ basis, the probabilities of obtaining $(\ket{0}+\ket{1})/\sqrt{2}$ and $(\ket{0}-\ket{1})/\sqrt{2}$ are $\left[1+2\sqrt{p(1-p)}\right]/2$ and $\left[1-2\sqrt{p(1-p)}\right]/2$ , respectively.

Step 3

Construct an $m$ -coin from a $p$ -coin, where $m=2p(1-p)$ : toss the $p$ -coin twice, output head if the two tosses are different and tail otherwise.

The probability of obtaining two different outcomes in the two tosses is

$$m=p(1-p)+(1-p)p=2p(1-p).$$

Similarly, one can construct an $n$ -coin from a $q$ -coin, where $n=2q(1-q)=1/2-2p(1-p)$ .

Step 4

Construct an $s$ -coin from an $m$ -coin, where $s=m/(m+1)$ : toss the $m$ -coin twice, if the first toss is tail then output tail; otherwise if the second toss is tail, output head; otherwise, repeat this step.

Denote the probability of outputting head and tail by $\mathrm{P(H)}$ and $\mathrm{P(T)}$ , respectively, then,

$$\mathrm{P(H)}=m(1-m)+m^{2}\,\mathrm{P(H)},\qquad\mathrm{P(T)}=(1-m)+m^{2}\,\mathrm{P(T)}.$$

Solving this equation, we have

$$\mathrm{P(H)}=\frac{m}{m+1}=s,\qquad\mathrm{P(T)}=\frac{1}{m+1}.$$

Similarly, one can construct a $t$ -coin from an $n$ -coin, where $t=n/(n+1)$ .

Step 5

Construct an $f(p)=4p(1-p)$ -coin: first toss the $s$ -coin and then the $t$ -coin. If the first toss is head and the second toss is tail, then output head; if the first toss is tail and the second toss is head, then output tail; otherwise repeat this step.

Denote the probability of outputting head and tail by $\mathrm{P(H)}$ and $\mathrm{P(T)}$ , respectively, then,

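$$\mathrm{P(H)}=s(1-t)+\left[st+(1-s)(1-t)\right]\mathrm{P(H)},\qquad\mathrm{P(T)}=(1-s)t+\left[st+(1-s)(1-t)\right]\mathrm{P(T)}.$$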

Solving this equation, we have

$$\mathrm{P(H)}=\frac{s(1-t)}{s(1-t)+(1-s)t}=\frac{m}{m+n}=4p(1-p),\qquad\mathrm{P(T)}=1-4p(1-p).$$
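The five steps can be checked numerically by propagating head probabilities through the protocol. The sketch below is illustrative only (all function names are hypothetical): it tracks exact probabilities rather than sampling coins, and it assumes the quoin convention in which a $Z$ -basis measurement yields head with probability $p$ .

```python
import math


def pair_differs(x: float) -> float:
    """Step 3: toss an x-coin twice, output head if the tosses differ."""
    return 2 * x * (1 - x)


def ratio_coin(x: float) -> float:
    """Step 4: head probability x/(1+x) of the repeat-until-decided rule."""
    return x / (1 + x)


def quantum_bernoulli_factory(p: float) -> float:
    """Head probability of the final coin produced by Steps 1-5."""
    q = (1 + 2 * math.sqrt(p * (1 - p))) / 2  # Step 2: X-basis measurement
    m = pair_differs(p)   # m = 2p(1-p)
    n = pair_differs(q)   # n = 1/2 - 2p(1-p)
    s = ratio_coin(m)
    t = ratio_coin(n)
    head = s * (1 - t)    # Step 5: s-coin head, then t-coin tail
    tail = (1 - s) * t    # Step 5: s-coin tail, then t-coin head
    return head / (head + tail)  # all other outcomes trigger a repeat


values = {p: quantum_bernoulli_factory(p) for p in (0.0, 0.1, 0.25, 0.5, 1.0)}
```

Since $m+n=1/2$ for every $p$ , the normalization in the last step never vanishes, and the output reduces to $m/(m+n)=4p(1-p)$ .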

In our protocol, generating the $q$ -coin, where

$$q=\frac{1+2\sqrt{p(1-p)}}{2},\tag{5.9}$$

is an essential nonclassical step. In fact, the only additionally required coin for constructing all quantum constructible $f(p)$ -coins is the $h_{a}(p)$ -coin,

[TABLE]

which can be obtained by measuring the quoin in the $\{\sqrt{1-a}\ket{0}+\sqrt{a}\ket{1},\sqrt{a}\ket{0}-\sqrt{1-a}\ket{1}\}$ basis. In our case, we set $a=1/2$ . Here, one can see that entanglement is not necessary for the quantum Bernoulli factory.
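As a sanity check on the rotated-basis measurement, the Born probability of the first basis vector can be evaluated directly. This is a sketch under the assumption that the quoin is $\sqrt{p}\ket{0}+\sqrt{1-p}\ket{1}$ ; the function name is hypothetical, and the code makes no claim about which of the two outcomes the source labels as $h_{a}(p)$ .

```python
import math


def first_outcome_probability(p: float, a: float) -> float:
    """Born probability of projecting sqrt(p)|0> + sqrt(1-p)|1> onto
    sqrt(1-a)|0> + sqrt(a)|1> (all amplitudes are real)."""
    overlap = math.sqrt(1 - a) * math.sqrt(p) + math.sqrt(a) * math.sqrt(1 - p)
    return overlap ** 2


def q_coin(p: float) -> float:
    """The q-coin bias quoted in Step 2 of the protocol."""
    return (1 + 2 * math.sqrt(p * (1 - p))) / 2


# With a = 1/2 the rotated basis is the X basis, so the q-coin is recovered.
checks = [(p, first_outcome_probability(p, 0.5), q_coin(p)) for p in (0.1, 0.5, 0.8)]
```

Setting $a=0$ returns the plain $Z$ -basis statistics, i.e. probability $p$ for the first outcome.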

In the protocol, the first two steps involve quantum devices, where quoins are measured in the $Z$ and $X=\{(\ket{0}+\ket{1})/\sqrt{2},(\ket{0}-\ket{1})/\sqrt{2}\}$ bases, respectively, to obtain the $p$ - and $q$ -coins. The following steps (Steps 3 to 5) are classical processing of the $p$ - and $q$ -coins. The rigorous derivation of the classical steps can be found in the Appendix. Compared with the $f(p)=2p$ function, our protocol converges much faster, which results in a higher fidelity for the realization.

In practice, owing to experimental imperfections, we can neither realize perfect $p$ -quoins nor perform ideal measurements to get perfect $p$ - and $q$ -coins. Thus, in reality, we cannot realize exact $f(p)$ -coins; in particular, we cannot get $f(p)=1$ when $p=1/2$ . Following previous studies [194, 195, 196], we employ a truncated function

[TABLE]

with $\epsilon$ describing the imperfections. When $\epsilon$ is nonzero, the truncated function of $f=4p(1-p)$ falls in the classical Bernoulli factory and hence can be constructed via $p$ -coins. However, the number of classical coins $N$ required to construct $f(p)$ scales poorly with $\epsilon$ ; see the Appendix for more details. In the experiment, we need to implement high-fidelity state preparation and measurement to make $\epsilon$ as small as possible in order to faithfully demonstrate the quantum advantage.

In the following, we focus on the preparation and measurement of the $p$ -quoin, and on how to construct an $f(p)=4p(1-p)$ coin via the necessary classical processing. Here, we emphasize that the quantum circuit realizing the operations should be independent of $p$ . In the demonstration, we fix the measurement setting and prepare $p$ -quoins for various $p$ values.

5.2 Experimental realization

We choose a superconducting qubit system to prepare $p$ -quoins. Superconducting quantum systems have made tremendous progress in the last decade, including long coherence times, great stability with fast and precise qubit manipulations, and high-fidelity quantum non-demolition (QND) qubit measurement. These properties make such a system a well-suited candidate for our test.

5.2.1 Experiment setup

In our experiment, we employ the so-called ‘circuit quantum electrodynamics architecture’ [197]. A superconducting transmon qubit (our quoin) is located in a waveguide trench and dispersively couples to two 3D cavities [198, 199, 200], as shown in Fig. 5.2. The transmon qubit has a transition frequency of $\omega_{q}/2\pi=5.577$ GHz, an anharmonicity of $\alpha_{q}/2\pi=-246$ MHz, an energy relaxation time of $T_{1}=9~\mu$s, and a Ramsey time of $T_{2}^{*}=7~\mu$s. The larger cavity has a resonant frequency of $\omega_{c}/2\pi=7.292$ GHz and a decay rate of $\kappa/2\pi=3.62$ MHz, and it provides a fast way of reading out the qubit state through its strong dispersive interaction with the qubit, with a dispersive shift of $\chi/2\pi=-4.71$ MHz. As we focus on exhibiting quantum advantage solely with a single quantum system, the smaller cavity with a higher resonant frequency is not used and remains in the vacuum state. This higher-frequency cavity can potentially be used as another $p$ -quoin in future experiments [2]. In that case, joint measurements can be performed on two $p$ -quoins, which may save resources. For now, we focus on single-qubit operations.

The output of the readout cavity is connected to a Josephson parametric amplifier (JPA) [201, 202], operating in a double-pumped mode [203, 204] as the first stage of amplification between the readout cavity, at a base temperature of 10 mK, and the high electron mobility transistor, at 4 K. To minimize pump leakage into the readout cavity and achieve a longer $T_{2}^{*}$ dephasing time, we operate the JPA in a pulsed mode. The readout pulse width has been optimized to 180 ns with a few photons in order to have a high signal-to-noise ratio. This JPA allows a high-fidelity single-shot readout of the qubit state. The overall readout fidelity for the ground state $\ket{0}$ , when the qubit is initially prepared in $\ket{0}$ by post-selection, is 0.996, demonstrating the highly QND nature of the readout, while the fidelity for the excited state $\ket{1}$ is slightly lower, 0.943 (see Appendix). The loss in both fidelities is predominantly due to the $T_{1}$ process during the waiting time after the initialization measurement (300 ns) and the qubit readout time (180 ns).

Due to stray infrared photons and other background noise, our qubit has an excited state population of about $8.5\%$ in the steady state. The high QND qubit measurement allows us to eliminate these imperfections by performing an initialization measurement to purify the qubit by only selecting the ground state for the following experiments [205]. The measurement pulse sequences for preparing quoins can be found in the Appendix. It is worth mentioning that our superconducting system always yields a detection result once the measurement is performed, which is very challenging for other implementations, such as lossy photonic systems.

We apply an on-resonant microwave pulse to rotate the qubit by an arbitrary angle $\theta$ around the $Y$ -axis, $R_{\theta}^{Y}=\exp(-i\sigma_{y}\theta/2)$ , where $\sigma_{y}$ is the Pauli matrix, to prepare a $p$ -quoin with $p=\cos^{2}(\theta/2)$ . We choose a Gaussian envelope pulse truncated to $4\sigma=24$ ns for the rotation operations. We also use the so-called “derivative removal by adiabatic gate” (DRAG) [206] technique to minimize qubit leakage to higher levels outside the computational space. A randomized benchmarking calibration [207, 208, 209, 210] shows that the $R_{\pi/2}^{Y}$ gate fidelity itself is about 0.998, mainly limited by the qubit decoherence. The final measurement for the quoins is along either the $Z$ -axis or the $X$ -axis. The measurement along the $X$ -axis is realized by applying an extra $R_{\pi/2}^{-Y}$ rotation (Hadamard transformation) followed by a $Z$ -basis measurement.
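The preparation and measurement described above can be illustrated with ideal, noise-free $2\times 2$ rotations. This is a sketch with hypothetical helper names; it assumes the qubit starts in $\ket{0}$ and that $R_{\pi/2}^{-Y}$ means a rotation by $-\pi/2$ around the $Y$ -axis.

```python
import math


def ry(theta: float):
    """Real 2x2 matrix of exp(-i * sigma_y * theta / 2)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return ((c, -s), (s, c))


def apply(gate, state):
    """Apply a 2x2 gate to a two-component state vector."""
    (a, b), (c, d) = gate
    x, y = state
    return (a * x + b * y, c * x + d * y)


def quoin_statistics(theta: float):
    """Return (p, q): Z-basis and X-basis probabilities of outcome 0."""
    quoin = apply(ry(theta), (1.0, 0.0))       # cos(t/2)|0> + sin(t/2)|1>
    p = quoin[0] ** 2                          # direct Z-basis readout
    rotated = apply(ry(-math.pi / 2), quoin)   # pre-rotation for X readout
    q = rotated[0] ** 2                        # equals (1 + sin(theta)) / 2
    return p, q


p_half, q_half = quoin_statistics(math.pi / 2)
```

For $\theta\in[0,\pi]$ the second value agrees with $q=\left[1+2\sqrt{p(1-p)}\right]/2$ , since $2\sqrt{p(1-p)}=\sin\theta$ in that range.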

The readout property of the qubit is first characterized as shown in Fig. 5.3. The smaller cavity has a resonant frequency of $\omega_{s}/2\pi=8.229$ GHz and remains in vacuum all the time. Because we always purify our qubit initial state to the ground state $\ket{0}$ and use pulses with DRAG [206] to minimize the leakage to levels higher than the first excited state $\ket{1}$ , we do not distinguish the levels higher than $\ket{1}$ in the readout. We thus adjust the phase between the JPA readout signal and the pump such that $\ket{0}$ and $\ket{1}$ states can be distinguished with optimal contrast. Figure 5.3a shows the histogram of the qubit readout. The histogram is clearly bimodal and well-separated. A threshold $V_{th}=0$ is chosen to digitize the readout signal.

Due to stray infrared photons or other background noise, our qubit has an excited state population of about $8.5\%$ in the steady state (solid histogram in Fig. 5.3a). In order to eliminate these excited states for the quoin experiments, a highly quantum non-demolition qubit measurement M1 is performed, which purifies the qubit by selecting only the $\ket{0}$ state (see Fig. 5.4) [205]. We wait 300 ns for the readout photons to leak out before preparing the qubit in arbitrary superposition states through an on-resonant microwave pulse with various amplitudes. After a purification to the $\ket{0}$ state, the following measurement gives the $\ket{0}$ state with a probability of 0.996 (dashed histogram in Fig. 5.3a), demonstrating the highly quantum non-demolition nature of the qubit measurement. Figure 5.3b shows the basic qubit readout properties. The readout fidelity for the $\ket{1}$ state, when the qubit is initially prepared in $\ket{1}$ by a measurement, is 0.943. The loss in both fidelities is predominantly due to the $T_{1}$ process during the waiting time after the initialization measurement (300 ns) and the qubit readout time (180 ns).

5.2.2 Results

The experimental pulse sequences for the quoins with state preparations are shown in Fig. 5.4. The measurement is always performed in the $Z$ basis. The $X$ -basis measurement is realized by performing an extra $R_{\pi/2}^{-Y}$ rotation before the $Z$ -basis measurement. The phase of this extra pre-rotation is chosen to minimize the effect of qubit decoherence during the measurement. In our experiment, the $q$ -coins as defined in Eq. (5.9) are implemented. These are also classically impossible: it is straightforward to check that $q(1/2)=1$ , so $q$ , regarded as a function of $p$ , cannot be constructed classically. The $q$ -coin corresponds to the qubit state $\ket{\psi}=\cos(\theta/2)\ket{0}+\sin(\theta/2)\ket{1}$ . For different $\theta$ , our experimental results are listed in Table 5.1.

We also plot the experiment result of the $q$ -coins in Fig. 5.5(a) and the result of the $f(p)=4p(1-p)$ -coins by following the protocol in Fig. 5.5(b). The experimentally realized values of $q_{\mathrm{exp}}$ and $f_{\mathrm{exp}}(p)$ are sampled from the observed coins, which match well with the theoretical predictions. By implementing state preparation, operation and measurement with high fidelities, we are able to achieve $q_{\mathrm{exp}}(1/2)=0.990$ and $f_{\mathrm{exp}}(1/2)=0.965$ , which can be well modeled by the truncated function defined in Eq. (5.12) with $\epsilon=0.010$ and $\epsilon=0.035$ , respectively.

A randomized benchmarking experiment [207, 208, 209, 210] is performed to determine the fidelity of the $\pi/2$ gate around the $Y$ axis, $R_{\pi/2}^{Y}$ , which is the most critical gate for the quoin measurement. The randomized gates used in this experiment are chosen from the single-qubit Clifford group. This group contains 24 rotation gates which are composed from rotations around the $X$ and $Y$ axes using the generators: $\{I,+X,+Y,\pm X/2,\pm Y/2\}$ . The reference curve is measured after applying sequences of $m$ random Clifford gates, while the $Y/2$ curve is realized after applying sequences that interleave $R_{\pi/2}^{Y}$ with $m$ random Clifford gates. Each sequence is followed by a recovery Clifford gate in the end right before the final measurement. The number of random sequences of length $m$ in our experiment is chosen to be $k=100$ . Both curves are fitted to $F=Ap^{m}+B$ with different sequence decay $p$ . The reference decay indicates the average error of the single-qubit gates, while the ratio of the interleaved and reference decay gives the specific gate fidelity. The experiment results are displayed in Fig. 5.6. The data point is the average of the sequence fidelities of the $k=100$ sample sequences, and the error bar shows the standard deviation of the sample. Each random sequence is measured over 10,000 times to get the sequence fidelity whose error could be neglected. As a result, the average single-qubit gate error $r_{s}=r_{ref}/1.875=(1-p_{ref})/2/1.875=0.0014$ , and the $R_{\pi/2}^{Y}$ gate error $r_{Y/2}=(1-p_{int}/p_{ref})/2=0.0013$ . The dashed lines indicate a gate fidelity of 0.998 and 0.997 respectively. Therefore, the $R_{\pi/2}^{Y}$ gate fidelity in our experiment is greater than 0.998, and the uncertainty in the gate fidelity is typically 7e-5, determined by bootstrapping.
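The error formulas quoted above are simple enough to evaluate directly. In the sketch below, the function names are hypothetical and the decay constants used in the example are back-computed from the quoted error rates for illustration; they are not fit values reported in the source.

```python
def error_per_clifford(p_ref: float) -> float:
    """Average error per Clifford from the reference decay: (1 - p_ref) / 2."""
    return (1 - p_ref) / 2


def error_per_gate(p_ref: float, gates_per_clifford: float = 1.875) -> float:
    """Average single-qubit gate error, using the factor 1.875 from the text."""
    return error_per_clifford(p_ref) / gates_per_clifford


def interleaved_gate_error(p_int: float, p_ref: float) -> float:
    """Error of the interleaved gate: (1 - p_int / p_ref) / 2."""
    return (1 - p_int / p_ref) / 2


# Illustrative decay constants only (assumed, chosen to match the quoted errors).
P_REF = 0.99475
P_INT = 0.9974 * P_REF

r_s = error_per_gate(P_REF)                   # about 0.0014
r_y2 = interleaved_gate_error(P_INT, P_REF)   # about 0.0013
gate_fidelity = 1 - r_y2
```

With these assumed decays, the interleaved gate fidelity $1-r_{Y/2}$ comes out above 0.998, in line with the statement in the text.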

5.3 Simulation of Experiment data

5.3.1 The truncated function

Here, we show how to construct the truncated function

[TABLE]

from $p$ -coin by classical means. The protocol works as follows.

(i)

Toss the $p$ -coin twice; if the outputs are different, output head, otherwise output tail. This yields a $g(p)=2p(1-p)$ -coin from two $p$ -coins.

(ii)

Apply Theorem 1 in Ref. [194], which gives $h(p)=\min\{2p,1-2\epsilon_{1}^{\prime}\}$ , and perform the composition $h(g(p))=\min\{4p(1-p),1-2\epsilon_{1}^{\prime}\}$ . Let $\epsilon_{1}^{\prime}=0.0175$ , the desired function is obtained.

Now, we calculate the number of $p$ -coins needed in step (ii). By Theorem 1 in Ref. [194], the probability that more than $n$ $p$ -coins are needed is bounded by

[TABLE]

With large $n$ and small $\epsilon_{1}^{\prime}=0.0175$ , we can approximate Eq. (5.13) by

[TABLE]

This bound is nontrivial only if the right-hand side is less than or equal to 1, that is,

[TABLE]

Thus, combining (i) and (ii), the number of $p$ -coins needed to simulate the $f_{t}(p)$ function is more than $2\times 1.9\times 10^{4}=3.8\times 10^{4}$ . Note that Eq. (5.13) provides only an upper bound on the probability distribution; there may exist more efficient protocols that require fewer $p$ -coins.
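The composition in steps (i) and (ii) and the resulting coin count can be written out as follows. This is a sketch with hypothetical names; the figure of $1.9\times 10^{4}$ uses of the $g$ -coin is taken from the text, not derived here.

```python
def g(p: float) -> float:
    """Step (i): two tosses of the p-coin give a 2p(1-p)-coin."""
    return 2 * p * (1 - p)


def h(x: float, eps: float) -> float:
    """Step (ii): truncated doubling, min{2x, 1 - 2*eps}."""
    return min(2 * x, 1 - 2 * eps)


def f_truncated(p: float, eps: float = 0.0175) -> float:
    """Composition h(g(p)) = min{4p(1-p), 1 - 2*eps}."""
    return h(g(p), eps)


G_COINS_NEEDED = 1.9e4                  # lower bound quoted in the text
p_coins_needed = 2 * G_COINS_NEEDED     # each g-coin costs two p-coins

plateau = f_truncated(0.5)              # truncation level at p = 1/2
```

With $\epsilon_{1}^{\prime}=0.0175$ the plateau sits at $1-2\epsilon_{1}^{\prime}=0.965$ , which matches the experimental value $f_{\mathrm{exp}}(1/2)=0.965$ .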

5.3.2 Simulation of the $q$ -coin

Here, we show how to simulate the truncated function of $q$ coin,

[TABLE]

with $\epsilon_{3}=0.01$ . To do so, we first construct the truncated coin $f_{t}(p)$ defined in Eq. (5.12). Then we can simulate $q_{t}(p)$ with the $f_{t}(p)$ -coin by applying the following protocol,

1. Apply a square root function to $f_{t}(p)$ , which gives a $\sqrt{f_{t}(p)}$ -coin.

2. Toss the 1/2-coin and the $\sqrt{f_{t}(p)}$ -coin; output tail if both tosses are tail, and head otherwise.

Then, it is straightforward to check that the following coin is prepared

[TABLE]

which coincides with the $q_{t}(p)$ -coin if we let

[TABLE]

In this case, we have $\epsilon_{1}=0.04$ . To simulate the $f_{t}(p)$ -coin, we can follow the protocol in Sec. 5.3.1, which costs more than $4\times 10^{4}$ $p$ -coins on average for each $f_{t}(p)$ -coin. The square root function of $f_{t}(p)$ can be constructed by following the method from Ref. [195] or the one presented in Ref. [188]. On average, more than $10$ coins are needed for constructing the square root function. Therefore, more than $4\times 10^{5}$ $p$ -coins are necessary for the construction of the truncated function $q_{t}(p)$ .
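The relation between the two truncation parameters can be checked numerically. The sketch below assumes that both truncations cap the ideal function at one minus the respective $\epsilon$ , so that the plateau of the simulated coin is $[1+\sqrt{1-\epsilon_{1}}]/2=1-\epsilon_{3}$ ; the function names are hypothetical.

```python
import math


def q_from_f(f: float) -> float:
    """Head probability when tail is output only if both the 1/2-coin
    and the sqrt(f)-coin show tail."""
    return 1 - 0.5 * (1 - math.sqrt(f))


def epsilon1_for(eps3: float) -> float:
    """Solve (1 + sqrt(1 - eps1)) / 2 = 1 - eps3 for eps1."""
    return 1 - (1 - 2 * eps3) ** 2


eps1 = epsilon1_for(0.01)               # 0.0396, i.e. about 0.04
plateau = q_from_f(1 - eps1)            # recovers 1 - eps3 = 0.99
p_coins = 4e4 * 10                      # per-coin cost times square-root overhead
```

The product in the last line reproduces the $4\times 10^{5}$ estimate quoted in the text from its two stated factors.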

The classical Bernoulli factory cannot produce exact $q$ - and $f(p)=4p(1-p)$ -coins with a finite number of uses of $p$ -coins. In practice, the implemented function may deviate from the desired one due to device imperfections. In this case, the practically realized coins may be constructible with classical means, though the number of classical coins required may increase drastically with decreasing deviation. Focusing on the truncated function defined in Eq. (5.12), we present a classical protocol for simulating the experimental data $f_{\mathrm{exp}}(p)$ with $\epsilon=0.035$ . It is shown that, on average, more than $10^{4}$ classical $p$ -coins are required for constructing the truncated function, which is much larger than the average number of quoins (about $20$ ) used in our protocol. (Strictly speaking, the quantum advantage demonstrated here is a weaker version of the one mentioned at the beginning of this chapter, where the function is classically impossible to construct.) For the $q$ -coin, as the deviation is smaller, the classical simulation is even harder. In the Appendix, we show that more than $10^{5}$ classical coins are needed for the truncated function, while our quantum protocol only requires one quoin.

From the experimental perspective, the small deviation of $f_{exp}(1/2)$ from its ideal value of unity is dominated by qubit decoherence. With the better qubit coherence times of $T_{1},T_{2}\sim 100~\mu$s achieved recently [211], we expect the deviation of $f_{exp}(p)$ from $f_{th}(p)$ to be an order of magnitude lower. In the future, a more accurate quantum Bernoulli factory can be realized, and the classical simulation will eventually become intractable.

It is noteworthy that entanglement can be exploited to save resources in the quantum Bernoulli factory, which provides an extra advantage for randomness processing [188]. Extending our implementation to multi-qubit systems can verify this extra quantum advantage. When practical imperfections are considered, multi-qubit operations generally have a lower measurement fidelity. Given the trade-off between the resource saving and the decoherence due to multi-qubit interactions, it is interesting to see whether multipartite correlations offer an extra advantage in practice. As we focus on proving the advantage with coherence only, we leave such extensions and discussion to future work.

Part III Quantumness and selftesting

Chapter 6 Measurement-device-independent entanglement witness

A conventional way to detect entanglement is via an entanglement witness (EW). Practical imperfections can affect the correctness of the witness conclusion. This chapter introduces the measurement-device-independent entanglement witness (MDIEW) method [29]. We show a time-shift attack on the conventional EW and how the MDIEW scheme is immune to such attacks. We also show an experimental realization of the MDIEW scheme [30].

6.1 Time-shift attack

In this section, we show the time-shift attack on the conventional entanglement witness. By controlling the arrival time of the photons, we show that mismatches in the measurement efficiencies can be exploited to attack the conventional EW.

6.1.1 EW and device imperfections

Mathematically, for a given entangled quantum state $\rho$ , a Hermitian operator $W$ is called a witness if $\mathrm{Tr}[W\rho]<0$ (output of ‘Yes’) and $\mathrm{Tr}[W\sigma]\geq 0$ (output of ‘No’) for any separable state $\sigma$ . Focusing on the bipartite scenario, a general illustration of the conventional EW is shown in Fig. 6.1(a), where two parties, Alice and Bob, each receive one component of a bipartite state $\rho_{AB}$ from an untrusted third party, Eve. They want to verify whether $\rho_{AB}$ is entangled or not by performing local operations and measurements on $\rho_{A}=\mathrm{Tr}_{B}[\rho_{AB}]$ and $\rho_{B}=\mathrm{Tr}_{A}[\rho_{AB}]$ . The correctness of such a witness relies on the implementation details of $W$ . An unfaithful implementation of $W$ , say, due to device imperfections, would render the witness results unreliable. For example, the measurement devices used by Alice and Bob might be manufactured by another untrusted party, who could collaborate with Eve and deliberately fabricate devices so that the real implementation $W^{\prime}=W+\delta W$ deviates from $W$ , such that $W^{\prime}$ is no longer a witness,

[TABLE]

That is, with the deviated witness $W^{\prime}$ , a separable state $\sigma$ could be identified as an entangled one, which is more likely to happen when $\mathrm{Tr}[W\sigma]$ is near zero.

There is a strong similarity between EW and quantum key distribution (QKD), where an entanglement-breaking channel would cause insecurity [212]. Roughly speaking, it is crucial for Alice and Bob to prove that entanglement can be preserved in a secure QKD channel. From this point of view, there is a correspondence between the security of QKD and the success of EW. For the varieties of attacks in QKD, such as the time-shift attack [213] and the fake-state attack [214], one may also find similar detection loopholes in the conventional EW process. Motivated by this analogy, we construct a time-shift attack that manipulates the efficiency mismatch between detectors used in an EW process. Under this attack, any state could be witnessed to be entangled, even if the input state is separable. By this example, we demonstrate that there do exist loopholes in the conventional EW procedure.

6.1.2 Time-shift attack

The time-shift attack, which originated in quantum cryptography [213], takes advantage of the efficiency mismatch of the measurement devices. As shown in Fig. 6.2(a), typically two detectors are used on each side, by Alice and by Bob. By controlling the single-photon-counting modules (SPCMs) and the coincidence gate, Eve is able to enlarge the efficiency mismatch and hence manipulate the EW result.

To implement this attack, we choose a conventional witness,

[TABLE]

for bipartite states in the form of

[TABLE]

where H (V) denotes the horizontal (vertical) polarization of the single photons and $|\Psi^{-}\rangle=(|HV\rangle-|VH\rangle)/\sqrt{2}$ is a Bell state. By decomposing $W$ into a linear combination of product Pauli matrices,

[TABLE]

the EW can be realized by local measurements. For the witness above, the decomposition reads

[TABLE]

That is, to identify the entanglement, Alice and Bob each only have to analyze their qubit in three bases separately. When the bipartite state is projected onto the positive (negative) eigenstates of $\sigma_{x}\sigma_{x}$ , $\sigma_{y}\sigma_{y}$ , and $\sigma_{z}\sigma_{z}$ , it contributes positively (negatively) to the witness result $\mathrm{Tr}[W\rho_{AB}]$ . For example, when measuring ${\sigma_{x}}{\sigma_{x}}$ , Alice and Bob both project the input state onto the eigenstates of $\sigma_{x}$ , $\sigma_{x}^{+}$ or $\sigma_{x}^{-}$ , with corresponding eigenvalues of $+1$ or $-1$ , respectively, and obtain probabilities $\left\langle{{\sigma_{x}^{\pm}}{\sigma_{x}^{\pm}}}\right\rangle$ . Then the value of $\left\langle{{\sigma_{x}}{\sigma_{x}}}\right\rangle$ is defined as $\left\langle{{\sigma_{x}^{+}}{\sigma_{x}^{+}}}\right\rangle+\left\langle{{\sigma_{x}^{-}}{\sigma_{x}^{-}}}\right\rangle-\left\langle{{\sigma_{x}^{+}}{\sigma_{x}^{-}}}\right\rangle-\left\langle{{\sigma_{x}^{-}}{\sigma_{x}^{+}}}\right\rangle$ . From Eve’s point of view, she wants to convince Alice and Bob that the bipartite state is entangled, that is, $\mathrm{Tr}[W\rho_{AB}]<0$ . Thus, her objective is to suppress the positive contributions to $\mathrm{Tr}[W\rho_{AB}]$ , such as $\left\langle{{\sigma_{x}^{+}}{\sigma_{x}^{+}}}\right\rangle$ and $\left\langle{{\sigma_{x}^{-}}{\sigma_{x}^{-}}}\right\rangle$ for the ${\sigma_{x}}{\sigma_{x}}$ measurement, by manipulating the coincidence rate between SPCMs, which is equivalent to enlarging the detector efficiency mismatch. In this case, from Alice and Bob’s point of view, the actually implemented witness $W^{\prime}$ deviates from the desired one $W$ and satisfies Eq. (6.1).

To realize the attack, we exploit the time mismatch of the two single-photon-counting modules (SPCMs) such that one detector is more efficient than the other. In this case, the real implementation ( $W^{\prime}$ ) deviates from the originally designed witness $W$ . By adjusting the time mismatch, Eve can suppress the positive contributions to the witness result $\mathrm{Tr}[W\rho_{AB}]$ so that the witness result $\mathrm{Tr}[W^{\prime}\rho_{AB}]$ becomes negative. For example, when measuring ${\sigma_{x}}{\sigma_{x}}$ , Alice and Bob project the input state onto the eigenstates of $\sigma_{x}$ , that is, $\sigma_{x}^{+}$ and $\sigma_{x}^{-}$ , corresponding to the positive and negative eigenvalues, respectively, and obtain probabilities $\left\langle{{\sigma_{x}^{\pm}}{\sigma_{x}^{\pm}}}\right\rangle$ . Then the value of $\left\langle{{\sigma_{x}}{\sigma_{x}}}\right\rangle$ is defined as

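$$\left\langle\sigma_{x}\sigma_{x}\right\rangle=\left\langle\sigma_{x}^{+}\sigma_{x}^{+}\right\rangle+\left\langle\sigma_{x}^{-}\sigma_{x}^{-}\right\rangle-\left\langle\sigma_{x}^{+}\sigma_{x}^{-}\right\rangle-\left\langle\sigma_{x}^{-}\sigma_{x}^{+}\right\rangle.$$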

The probabilities $\left\langle{{\sigma_{x}^{\pm}}{\sigma_{x}^{\pm}}}\right\rangle$ are measured from the coincidence counts $N_{A}^{\pm}N_{B}^{\pm}$ of the detectors, that is,

[TABLE]

If the positive coincidence counts are all suppressed, that is $N_{A}^{+}N_{B}^{+}=N_{A}^{-}N_{B}^{-}=0$ , then the outcome of $\left\langle{{\sigma_{x}}{\sigma_{x}}}\right\rangle$ is

$$\left\langle\sigma_{x}\sigma_{x}\right\rangle=-1.$$

Similarly, the other local measurements $\left\langle{{\sigma_{y}}{\sigma_{y}}}\right\rangle$ and $\left\langle{{\sigma_{z}}{\sigma_{z}}}\right\rangle$ also become $-1$ when the positive coincidence counts are suppressed, which gives a witness result of

[TABLE]

for any state $\rho_{AB}$ .

In our experimental demonstration, rather than eliminating the positive coincidence counts entirely, we only suppress them to $10.9(1)\%$ , which already suffices to make a separable state be wrongly witnessed as entangled.

In our experiment, as shown in Fig. 6.2(a), by encoding qubits in the polarization of photons, the bipartite state $(|HH\rangle_{ab}+|VV\rangle_{ab})/\sqrt{2}$ is generated via spontaneous parametric down conversion (SPDC). Two adjustable POLs are used to disentangle the initial state and project it to $|HH\rangle_{ab}$ and $|VV\rangle_{ab}$ with equal probabilities, corresponding to the separable state with $v=1$ in Eq. (6.2). After a $45^{\circ}$ HWP, the to-be-witnessed two-qubit system is prepared in the state of $\rho_{AB}=\left({{{\left|{HV}\right\rangle}}\left\langle{HV}\right|+{{\left|{VH}\right\rangle}}\left\langle{VH}\right|}\right)/2$ . Then Alice and Bob each performs polarization analysis on a qubit from the bipartite state using waveplates, PBSs and SPCMs, and guides the electronic signals from the SPCMs into a coincidence gate.

As shown in Fig. 6.2(b), in the time-shift attack, Eve controls the delay lines in the detection systems and the time window of the coincidence gate, and hence manipulates the time-dependent coincidence counting rates between detectors $d_{a0}$ and $d_{b0}$ , and between $d_{a1}$ and $d_{b1}$ . Hence, she can suppress the positive contributions to the measurements $\left\langle{{\sigma_{x}}{\sigma_{x}}}\right\rangle,\left\langle{{\sigma_{y}}{\sigma_{y}}}\right\rangle$ and $\left\langle{{\sigma_{z}}{\sigma_{z}}}\right\rangle$ . In our demonstration, by setting proper parameters, we let the positive contributions drop to 10.9(1) $\%$ of their original values. Since this attack does not affect the negative contributions to $\mathrm{Tr}[W\rho_{AB}]$ , the experimental outcomes for $\left\langle{{\sigma_{x}}{\sigma_{x}}}\right\rangle,\left\langle{{\sigma_{y}}{\sigma_{y}}}\right\rangle$ and $\left\langle{{\sigma_{z}}{\sigma_{z}}}\right\rangle$ become negative, as expected. Finally, Alice and Bob obtain a witness value for $\rho_{AB}$ of $\mathrm{Tr}\left[{W^{\prime}\rho_{AB}}\right]=-0.379\left(4\right)$ , although the input state $\rho_{AB}$ is, in fact, separable. By changing $\Delta t$ to a larger value, one can even obtain a fake result identical to that of a maximally entangled state. Thus, a separable bipartite state could be wrongly witnessed to be entangled when Eve is able to manipulate the detection system. It is not hard to see that for any state $\rho$ , Eve can perform a similar attack and trick Alice and Bob into believing that it is entangled.
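The effect of suppressing positive coincidences can be illustrated with an idealized model. The sketch below rests on several assumptions that are not taken verbatim from the text: the witness is assumed to be the standard singlet witness $W=(I+\sigma_{x}\sigma_{x}+\sigma_{y}\sigma_{y}+\sigma_{z}\sigma_{z})/4$ , the attack is modeled as a single factor `eta` multiplying the $(+,+)$ and $(-,-)$ coincidences before renormalization, and all names are hypothetical. As an idealized model, it is not expected to reproduce the measured value $-0.379(4)$ exactly.

```python
def correlator(p_pp, p_mm, p_pm, p_mp, eta=1.0):
    """<sigma sigma> after scaling the positive coincidences by eta
    and renormalizing over the surviving coincidence events."""
    positive = eta * (p_pp + p_mm)
    negative = p_pm + p_mp
    return (positive - negative) / (positive + negative)


def witness(c_xx, c_yy, c_zz):
    """Assumed witness value (1 + <xx> + <yy> + <zz>) / 4."""
    return (1 + c_xx + c_yy + c_zz) / 4


# Ideal joint outcome probabilities (++, --, +-, -+) for the separable
# state (|HV><HV| + |VH><VH|) / 2 in the three product bases.
SEPARABLE_STATE = {
    "xx": (0.25, 0.25, 0.25, 0.25),
    "yy": (0.25, 0.25, 0.25, 0.25),
    "zz": (0.0, 0.0, 0.5, 0.5),
}


def attacked_witness(eta):
    """Witness value seen by Alice and Bob under suppression factor eta."""
    return witness(*(correlator(*SEPARABLE_STATE[b], eta=eta)
                     for b in ("xx", "yy", "zz")))


honest = attacked_witness(1.0)     # no attack
partial = attacked_witness(0.109)  # suppression level quoted in the text
full = attacked_witness(0.0)       # all positive coincidences removed
```

In this model the honest value sits at the separable boundary, any `eta` below one pushes the result negative, and full suppression yields the value a maximally entangled singlet would give.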

6.2 The MDIEW scheme

Recently, Lo et al. [185] proposed a measurement-device-independent (MDI) QKD method, which is immune to all hacking strategies on detection. Due to the similarity between QKD and EW, one would also expect that there exist EW schemes without detection loopholes. Meanwhile, a nonlocal game was proposed to distinguish any entangled state from all separable states [3]. Inspired by this game, Branciard et al. [29] proposed an MDIEW method, where they proved that there always exists an MDIEW for any entangled state with untrusted measurement apparatuses.

As shown in Fig. 6.1(b), Alice and Bob want to identify whether a given bipartite state, prepared by an untrusted party Eve, is entangled or not without trusting measurement devices. To do so, Alice (Bob) prepares an ancillary state $\tau_{s}$ ( $\omega_{t}$ ) and sends it along with the to-be-witnessed bipartite state to a willing participant, who can be assumed to be Eve again in the worst case scenario. Eve performs two Bell-state measurements (BSMs) on the two ancillary states and the bipartite state. Then, she announces to Alice and Bob the BSMs results, based on which they will witness the entanglement of the bipartite state. In MDIEW, it is guaranteed that a separable state will never be wrongly identified as an entangled one, even if Eve maliciously makes wrong measurements and/or announces unfaithful information.

The measurement-device-independent entanglement witness (MDIEW) provides a means to witness the entanglement of a quantum state without trusting the measurement devices [29]. The idea of MDIEW is inspired by MDI quantum key distribution (MDIQKD) [185]. As proved in Ref. [29], there always exists an MDIEW for any entangled quantum state $\rho$ , as one can always construct an MDIEW based on the conventional witness $W$ , which exists for any entangled quantum state (we refer to [44] for details of the conventional entanglement witness). In the following, we design an MDIEW scheme and apply it to a type of bipartite quantum states of the form

[TABLE]

with $v\in[0,1]$ and $|\Psi^{-}\rangle=(|01\rangle-|10\rangle)/\sqrt{2}$ . The state is entangled if $v<1/2$ , which can be witnessed by a conventional EW,

[TABLE]

which gives $\mathrm{Tr}[W\rho^{v}_{AB}]=(2v-1)/2$ .

Practically, the conventional EW can be realized with only local measurements by decomposing $W$ into a linear combination of product Hermitian observables. In the bipartite scenario, Alice and Bob only need to perform local measurements to decide whether a quantum state is entangled. In contrast, MDIEW requires Alice (Bob) to prepare another ancillary state ${\tau_{s}}$ ( ${\omega_{t}}$ ) and to perform Bell-state measurements (BSMs) on the to-be-witnessed state and the ancillary state. Based on the choice of the ancillary states, labeled by $s$ and $t$ , and the measurement outcomes, labeled by $a$ and $b$ , the MDIEW is defined as

[TABLE]

That is, $\rho_{AB}$ is entangled if $J(\rho_{AB})<0$ , and for any separable state $\sigma_{AB}$ we have $J(\sigma_{AB})\geq 0$ . Here the probabilities $p(a,b|\tau_{s},\omega_{t})$ are obtained by performing two BSMs on the to-be-witnessed state $\rho_{AB}$ and the ancillary states ${\tau_{s}}$ and ${\omega_{t}}$ . That is,

[TABLE]

where $M_{a}$ and $M_{b}$ represent the BSMs performed by Alice and Bob with outcomes $a$ and $b$ , respectively. In Eq. (6.10), the coefficient $\beta^{a,b}_{s,t}$ is determined by the choice of ancillary states, the measurement outcomes, and the conventional witness $W$ . In the experiment, as only two outcomes, $|\Phi^{+}\rangle=(|00\rangle+|11\rangle)/{\sqrt{2}}$ and $|\Phi^{-}\rangle=(|00\rangle-|11\rangle)/{\sqrt{2}}$ , out of the four BSM outcomes are recorded, we take the outcomes $a$ and $b$ to be $+$ or $-$ , which refer to $|\Phi^{+}\rangle$ and $|\Phi^{-}\rangle$ , respectively. There are four kinds of $\beta^{a,b}_{s,t}$ , depending on the values of $a$ and $b$ . In the following, we design $\beta^{a,b}_{s,t}$ for our MDIEW.

The case of $a=+$ and $b=+$ is considered in Ref. [29]. Decompose a conventional EW as a linear combination of product Hermitian operators $\{\tau_{s}\otimes\omega_{t}\}$ ,

[TABLE]

where the superscript $T$ means matrix transpose. In the corresponding MDIEW, Alice and Bob prepare their ancillary states into $\{\tau_{s}\}$ and $\{\omega_{t}\}$ , respectively. According to Eq. (6.11), $p(+,+|\tau_{s},\omega_{t})$ is obtained by projecting the joint states $\mathrm{Tr}_{B}[\rho_{AB}]\otimes\tau_{s}$ and $\mathrm{Tr}_{A}[\rho_{AB}]\otimes\omega_{t}$ to the maximally entangled states $|\Phi^{+}_{AA}\rangle=(|00\rangle+|11\rangle)/{\sqrt{2}}$ and $|\Phi^{+}_{BB}\rangle=(|00\rangle+|11\rangle)/\sqrt{2}$ , respectively. Then it is easy to show that the relation between MDIEW and the conventional EW is

[TABLE]

which equals $({2v-1})/{8}$ using Eq. (6.8) and (6.9).
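The factor relating the BSM statistics to the conventional witness can be verified directly. The sketch below (an illustration, not the experimental analysis) computes $p(+,+|\tau_{s},\omega_{t})$ by explicitly projecting onto $|\Phi^{+}_{AA}\rangle|\Phi^{+}_{BB}\rangle$ and compares it with $\mathrm{Tr}[(\tau_{s}^{T}\otimes\omega_{t}^{T})\rho_{AB}]/4$ . The qubit ordering (A, B, Alice's ancilla, Bob's ancilla) and the function names are assumptions made for this example.

```python
import numpy as np

def bsm_plus_plus_probability(rho_ab, tau, omega):
    """Probability that both BSMs return |Phi+>, by explicit projection.

    Tensor ordering of the four qubits is (A, B, A', B'), where A' and B'
    carry the ancillary states tau and omega.
    """
    full = np.kron(np.kron(rho_ab, tau), omega)  # 16 x 16 joint state
    # |Phi+>_{AA'} |Phi+>_{BB'} has amplitude 1/2 when a == a' and b == b'
    phi = np.zeros((2, 2, 2, 2))
    for a in range(2):
        for b in range(2):
            phi[a, b, a, b] = 0.5
    phi = phi.reshape(16)
    return float(np.real(phi @ full @ phi))

def trace_formula(rho_ab, tau, omega):
    """Closed form Tr[(tau^T (x) omega^T) rho_AB] / 4 for two qubits."""
    return float(np.real(np.trace(np.kron(tau.T, omega.T) @ rho_ab))) / 4
```

Because the probabilities are linear in the transposed ancillary states, summing them with the coefficients of the decomposition of $W$ reproduces $\mathrm{Tr}[W\rho_{AB}]/4$ , which is the origin of the value $(2v-1)/8$ .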

MDIEW with two measurement outcomes

In our work, we also consider other BSM outcomes. For example, if Alice and Bob get outcomes $a=-$ and $b=-$ , then $\beta^{--}_{s,t}$ is calculated similarly as Eq. (6.12) by decomposing $W$ ,

[TABLE]

where $\langle j|\tilde{\tau}|i\rangle=(-1)^{i+j}\langle j|{\tau}|i\rangle$ and $\langle j|\tilde{\omega}|i\rangle=(-1)^{i+j}\langle j|{\omega}|i\rangle$ . By redefining the basis in which $W$ is decomposed as $\{\tilde{\tau}\otimes\tilde{\omega}\}$ , the ancillary states prepared by Alice and Bob are still $\{\tau_{s}\}$ and $\{\omega_{t}\}$ . In this case, $p(-,-|\tau_{s},\omega_{t})$ is obtained by projecting the joint states $\mathrm{Tr}_{B}[\rho_{AB}]\otimes\tau_{s}$ and $\mathrm{Tr}_{A}[\rho_{AB}]\otimes\omega_{t}$ to the maximally entangled states $|\Phi^{-}_{AA}\rangle=(|00\rangle-|11\rangle)/{\sqrt{2}}$ and $|\Phi^{-}_{BB}\rangle=(|00\rangle-|11\rangle)/\sqrt{2}$ , respectively.

In a similar manner, one can also decompose $W$ for the cases of $a=+$ and $b=-$ , and $a=-$ and $b=+$ . All four cases of $a$ and $b$ are summarized in Table 6.1.

Next, we need to calculate the coefficients $\beta_{s,t}^{\pm\pm}$ and the corresponding probabilities $p(\pm,\pm|\tau_{s},\omega_{t})$ for given ancillary quantum states $\{\tau_{s}\}$ and $\{\omega_{t}\}$ . Define $\sigma_{0}=I$ and $\sigma_{1},\sigma_{2},\sigma_{3}$ to be the Pauli matrices. Then let $\tau_{s}$ and $\omega_{s}$ both be the eigenstates of $\sigma_{s}$ with eigenvalue 1. That is, $\tau_{0}=\omega_{0}=I/2$ , $\tau_{s}=\omega_{s}=(I+\sigma_{s})/{2}$ for $s=1,2,3$ . By decomposing $W$ into $\{\tau_{s}^{T}\otimes\omega_{t}^{T}\}$ and $\{\widetilde{\tau}_{s}^{T}\otimes\widetilde{\omega}_{t}^{T}\}$ , we find that the coefficients $\beta^{ab}_{st}$ and the probabilities $p(a,b|\tau_{s},\omega_{t})$ of the two cases $++$ and $--$ are the same, and those of $+-$ and $-+$ are the same.
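The decomposition for the $++$ case can be obtained by solving a linear system. The following sketch assumes the witness $W=I/2-|\Psi^{-}\rangle\langle\Psi^{-}|$ (consistent with the value $(2v-1)/2$ quoted earlier) and uses the ancillary states defined above; `decompose` is a hypothetical helper, not part of the original analysis.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
# tau_0 = I/2, tau_s = (I + sigma_s)/2; omega_t is chosen identically
ANCILLAS = [I2 / 2] + [(I2 + s) / 2 for s in PAULIS]

psi_minus = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
# Assumed witness W = I/2 - |Psi-><Psi-|
W = np.eye(4) / 2 - np.outer(psi_minus, psi_minus.conj())

def decompose(witness, taus, omegas):
    """Solve witness = sum_{s,t} beta[s, t] * taus[s]^T (x) omegas[t]^T."""
    columns = [np.kron(t.T, o.T).reshape(-1) for t in taus for o in omegas]
    A = np.stack(columns, axis=1)  # 16 x 16, invertible for this basis
    beta = np.linalg.solve(A, witness.reshape(-1))
    return beta.real.reshape(len(taus), len(omegas))

beta = decompose(W, ANCILLAS, ANCILLAS)
nonzero_terms = int(np.sum(np.abs(beta) > 1e-9))
```

Under these assumptions the solver returns a coefficient matrix whose nonzero entries lie on the diagonal and in the first row and column.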

In the cases of $++$ and $--$ , the coefficients are given by

[TABLE]

with corresponding probabilities of

[TABLE]

There are ten nonzero terms in the coefficient matrix, so ten different ancillary inputs ( $\tau_{s},\omega_{t}$ ) are required. In practice, it is possible to reduce the number of inputs by introducing two other states $\tau_{4}=\frac{I+(\sigma_{x}+\sigma_{y}+\sigma_{z})/\sqrt{3}}{2}$ and $\omega_{4}=\frac{I+(\sigma_{x}+\sigma_{y}+\sigma_{z})/\sqrt{3}}{2}$ . In this case, we have another decomposition of $W$ with coefficients of

[TABLE]

In this setting, only six ancillary sets are required (compared to ten in the original construction). As a result, we derive the coefficients and probabilities in Eq. (6.10) for outcomes $++$ and $--$ , as shown in Table 6.2.
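One way to see that six input pairs can suffice is to fit $W$ with a least-squares solve restricted to six product operators and check that the residual vanishes. The sketch below assumes $W=I/2-|\Psi^{-}\rangle\langle\Psi^{-}|$ and a particular choice of six pairs $(s,t)$ ; both are assumptions for illustration, and the coefficients it finds are not claimed to coincide with the omitted table.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

# Ancillary states: indices 0..3 as before, index 4 is the extra state tau_4
STATES = [I2 / 2, (I2 + SX) / 2, (I2 + SY) / 2, (I2 + SZ) / 2,
          (I2 + (SX + SY + SZ) / np.sqrt(3)) / 2]

psi_minus = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
W = np.eye(4) / 2 - np.outer(psi_minus, psi_minus.conj())  # assumed witness

# Assumed choice of the six input pairs (s, t)
PAIRS = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 0), (0, 4)]

def fit_coefficients(witness, pairs):
    """Least-squares fit of witness = sum beta_st tau_s^T (x) omega_t^T."""
    A = np.stack([np.kron(STATES[s].T, STATES[t].T).reshape(-1)
                  for s, t in pairs], axis=1)
    target = witness.reshape(-1)
    beta = np.linalg.lstsq(A, target, rcond=None)[0].real
    residual = float(np.linalg.norm(A @ beta - target))
    return dict(zip(pairs, beta)), residual

coefficients, residual = fit_coefficients(W, PAIRS)
```

A vanishing `residual` confirms that these six operators reproduce $W$ exactly under the stated assumptions.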

Similarly, for the other two cases of outcomes $+-$ and $-+$ , the coefficients are

[TABLE]

with corresponding probabilities of

[TABLE]

when using the ancillary states $\tau_{0}=\omega_{0}=I/2$ , $\tau_{s}=\omega_{s}=(I+\sigma_{s})/{2}$ for $s=1,2,3$ . Similarly, we can define $\tau^{\prime}_{4}=\frac{I+(-\sigma_{x}-\sigma_{y}+\sigma_{z})/\sqrt{3}}{2},\,\omega^{\prime}_{4}=\frac{I+(-\sigma_{x}-\sigma_{y}+\sigma_{z})/\sqrt{3}}{2}$ so that another decomposition of $W$ is derived,

[TABLE]

Again, in this setting, only six measurements are required. The coefficients and probabilities of outcomes $+-$ and $-+$ are shown in Table 6.3.

Although each of the four cases above defines an MDIEW, we can combine all four of them into one to enhance the success probability of MDIEW,

[TABLE]

By doing this, we improve the efficiency of the experiment by a factor of four compared to the original proposal [29].

To witness entanglement for the bipartite states defined in Eq. (6.8) with MDIEW defined in Eq. (6.21), in total eight different ancillary state pairs should be prepared, and the results are summarized in Table 6.4.

6.3 Experimental realization

6.3.1 Experiment setup

Our experimental setup for MDIEW is shown in Fig. 6.3, where six-photon interferometry is used. The to-be-witnessed bipartite state $\rho^{v}_{34}$ , defined in Eq. (6.2), is encoded in the photon pair 3 and 4. Photon pairs 1, 2 and 5, 6 are used to prepare the ancillary input states $|\tau_{s}\rangle_{2}$ and $|\omega_{t}\rangle_{5}$ , respectively. In our work, various bipartite states $\{\rho^{v}_{34}\}$ , from maximally entangled to separable, are prepared and tested with the MDIEW. The bipartite state $\rho^{v}_{34}$ is first prepared in the Bell state $\left|{{\Phi^{-}}}\right\rangle_{34}=\left({{\left|{HH}\right\rangle}-\left|{VV}\right\rangle}\right)/\sqrt{2}$ via a Bell-state synthesizer [215]. As the coherence length of the photons is limited by the interference filtering, two 2-mm BBO crystals in each arm introduce a relative phase delay between the horizontal and vertical polarization components and cause polarization decoherence. Different values of $v$ can be selected by the “state selector” [216]. They satisfy the relation

[TABLE]

where $\theta$ is the angle of the fast axis of the selector HWP.

In the experiment, eight ancillary state pairs $\{\tau_{s},\omega_{t}\}$ are prepared. The states are encoded by tunable waveplates (one HWP sandwiched by two QWPs), which can realize an arbitrary single-qubit unitary transformation. Unlike the direct polarization measurement in the conventional EW, the analysis of MDIEW is completed by BSMs on $\rho_{3}^{v}\otimes|\tau_{s}\rangle\langle\tau_{s}|_{2}$ and $\rho_{4}^{v}\otimes|\omega_{t}\rangle\langle\omega_{t}|_{5}$ , with two outcomes, $|\Phi^{\pm}\rangle=(|HH\rangle\pm|VV\rangle)/{\sqrt{2}}$ , out of four being collected.

As defined in Eq. (6.21), we obtain the experimental results $J_{exp}^{v}$ , as shown in Fig. 6.4. For comparison, we also plot $J_{th}(\rho_{AB}^{v})$ for all values of $v$ . Recall that in the aforementioned time-shift attack demonstration, the conventional witness concludes that the state is entangled for $v=1$ , whereas our MDIEW result is $0.107\pm 0.019$ , which is positive and therefore does not indicate entanglement. One can see that our MDIEW is immune to this attack.

Furthermore, we perform tomography on the to-be-witnessed bipartite states $\{\rho_{34}^{v}\}$ . The resulting density matrices are shown in Fig. 6.5. The corresponding values of $v$ are set by the angle $\theta$ of the selector HWP given in Eq. (6.22), which is consistent with our fitted values presented in Section 6.3.2. We evaluate the MDIEW results, Eq. (6.21), from the results of the state tomography, $J_{tom}$ , as shown in Fig. 6.4. Meanwhile, to quantify the entanglement of the bipartite states $\{\rho_{34}^{v}\}$ , we adopt the measure of tangle [170], which can be directly calculated from the tomography results. When the tangle goes to zero, the bipartite state becomes a separable state. As shown in the inset of Fig. 6.4, no entanglement exists when $v$ grows beyond $1/2$ . This phenomenon is related to the “sudden death of entanglement” [217].

6.3.2 Experiment result

Tomography

In the experiment, we prepare the to-be-witnessed bipartite states $\rho_{AB}^{v}$ in the form of Eq. (6.8) with different values of $v$ . To verify whether the prepared states $\rho^{v}_{34}$ are close to the desired ones $\rho_{AB}^{v}$ , their density matrices are reconstructed via quantum tomography, with $v$ controlled by the angle $\theta$ of the selector HWP, as shown in Eq. (6.22). Then we obtain the value of $v$ by fitting the measured density matrices $\rho^{v}_{34}$ to the desired states $\rho_{AB}^{v}$ . As shown in Eq. (6.8), $\rho_{AB}^{v}$ contains only real numbers, so we can infer $v$ from the real part of $\rho^{v}_{34}$ , and the imaginary parts are expected to be near zero.

The parameter $v$ can be derived from the real part of the matrix $\rho^{v}_{34}$ . From each of the matrix elements $\rho_{11},\rho_{22},\rho_{33},\rho_{44}$ , and $\rho_{23}$ ( $\rho_{32}$ is identical to $\rho_{23}$ ), one can estimate $v$ , as shown in Table 6.5. Accordingly, the average value of $v$ and its error bar are evaluated. One can see that the experimental results agree well with the theoretical ones.
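The element-wise estimation can be sketched as follows. The relations used, $\rho_{11}=\rho_{44}=v/2$ , $\rho_{22}=\rho_{33}=(1-v)/2$ , and $\rho_{23}=-(1-v)/2$ , follow from the assumed form $\rho^{v}_{AB}=(1-v)|\Psi^{-}\rangle\langle\Psi^{-}|+\frac{v}{2}(|00\rangle\langle 00|+|11\rangle\langle 11|)$ . Taking the sample standard deviation as the error bar is also an assumption of this example, not necessarily the procedure behind Table 6.5.

```python
import numpy as np

def estimate_v(rho):
    """Return per-element estimates of v, their mean, and their spread."""
    r = np.real(rho)
    estimates = np.array([
        2 * r[0, 0],      # rho_11 = v/2
        1 - 2 * r[1, 1],  # rho_22 = (1 - v)/2
        1 - 2 * r[2, 2],  # rho_33 = (1 - v)/2
        2 * r[3, 3],      # rho_44 = v/2
        1 + 2 * r[1, 2],  # rho_23 = -(1 - v)/2
    ])
    return estimates, float(estimates.mean()), float(estimates.std(ddof=1))
```

For an ideal state all five estimates coincide; for a reconstructed density matrix their spread gives a rough indication of how well the prepared state matches the target family.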

Tangle

To quantify the entanglement of quantum states, we adopt the measure of tangle [170]. For a 2-qubit state, $\rho_{AB}$ , one can evaluate its tangle by the following steps.

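1. Define a non-Hermitian matrix

$$R=\rho_{AB}\,\Sigma\,\rho^{T}_{AB}\,\Sigma,$$

where $\rho^{T}_{AB}$ is the transpose of $\rho_{AB}$ , and the “spin flip matrix $\Sigma$ ” is defined as

$$\Sigma=\sigma_{y}\otimes\sigma_{y}=\begin{pmatrix}0&0&0&-1\\ 0&0&1&0\\ 0&1&0&0\\ -1&0&0&0\end{pmatrix}.$$

2. Calculate the eigenvalues of $R$ , and arrange them in decreasing order, $\lambda_{1}\geq\lambda_{2}\geq\lambda_{3}\geq\lambda_{4}$ .

3. The concurrence of $\rho_{AB}$ is defined as

$$C=\max\left\{0,\sqrt{\lambda_{1}}-\sqrt{\lambda_{2}}-\sqrt{\lambda_{3}}-\sqrt{\lambda_{4}}\right\}.$$

4. The tangle is defined as

$$\mathrm{tangle}=C^{2}.$$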

The tangle of a bipartite state is a measure of entanglement. If the tangle is zero, then the two-qubit state $\rho_{AB}$ must be a separable state. For states defined in Eq. (6.8), we can calculate the corresponding tangle. By following the aforementioned steps, we first calculate the four eigenvalues of $R$ , $0,(1-v)^{2},v^{2}/4,v^{2}/4$ . For $v>2/3$ , we have $v^{2}/4>(1-v)^{2}$ and hence $tangle=C^{2}=0$ . For $v\leq 2/3$ , we have $v^{2}/4\leq(1-v)^{2}$ and hence $\sqrt{(1-v)^{2}}-2\sqrt{v^{2}/4}=1-2v$ , which is positive only for $v<1/2$ . Therefore, $C=0$ for $v\geq 1/2$ and $C=1-2v$ for $v<1/2$ ,

[TABLE]

The fitting value of $v$ from state tomography and the tangles are shown in Table 6.6.
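The four steps above translate into a few lines of code. The sketch below takes $R=\rho_{AB}\Sigma\rho_{AB}^{T}\Sigma$ with $\Sigma=\sigma_{y}\otimes\sigma_{y}$ , which is the standard concurrence construction and is consistent with the eigenvalues quoted above; the function name is illustrative.

```python
import numpy as np

# Spin-flip matrix sigma_y (x) sigma_y in the basis |00>, |01>, |10>, |11>
SIGMA = np.array([[0, 0, 0, -1],
                  [0, 0, 1, 0],
                  [0, 1, 0, 0],
                  [-1, 0, 0, 0]], dtype=complex)

def tangle(rho):
    """Tangle (squared concurrence) of a two-qubit density matrix."""
    R = rho @ SIGMA @ rho.T @ SIGMA
    # Eigenvalues of R are real and non-negative in exact arithmetic;
    # clip tiny negative values caused by rounding before taking roots.
    lam = np.sort(np.clip(np.linalg.eigvals(R).real, 0, None))[::-1]
    roots = np.sqrt(lam)
    concurrence = max(0.0, roots[0] - roots[1] - roots[2] - roots[3])
    return float(concurrence ** 2)
```

Applied to a reconstructed density matrix from tomography, the same function would give the kind of tangle value reported alongside the fitted $v$ .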

Chapter 7 Reliable and robust entanglement witness

This chapter introduces the reliability and robustness problems of entanglement witnesses. The reliability problem can be overcome by the MDIEW scheme, and we show in this chapter how the robustness problem can also be resolved [31].

7.1 Reliable and robust problem in EW

In reality, EW implementation may suffer from two problems. The first one is reliability. That is, one might draw unreliable conclusions because of imperfect experimental devices. In this case, the validity of the EW result depends on how faithfully one can implement the measurements according to the witness $W$ . If the devices are not well calibrated, the practically implemented observable $W^{\prime}$ may deviate from the original theoretical design $W$ (see Fig. 7.1 for an example) and may not even be a witness. That is, there may exist some separable states $\sigma$ such that $\mathrm{Tr}[\sigma W^{\prime}]<0\leq\mathrm{Tr}[\sigma W]$ . In practice, an attack that exploits device imperfections has been experimentally implemented against an entanglement witness procedure [30]. In cryptographic applications, such a problem is regarded as a loophole, where one mistakes separable states for entangled ones. For instance, in QKD, this would mean that an adversary successfully convinces the users Alice and Bob to share keys that they think are secure but are in fact eavesdropped. This problem is solved by the measurement-device-independent QKD scheme [185], inspired by the time-reversed entanglement-based scheme [218, 219, 186]. Branciard et al. applied a similar idea to EW and proposed the measurement-device-independent entanglement witness (MDIEW) scheme [29], in which entanglement can be witnessed without trusting the measurement devices. The MDIEW scheme is based on the important discovery that any entangled state can be witnessed in a nonlocal game with quantum inputs [3]. In the MDIEW scheme, it is shown that an arbitrary conventional EW can be converted into an MDIEW, which has been experimentally tested [30].

The second problem lies in the robustness of EW implementation. Since each (linear) EW can only identify a certain regime of entangled states, a given EW is likely to be ineffective at detecting the entanglement of an unknown quantum state. While a failure to detect entanglement is theoretically acceptable, in practice such failures may make the experiment highly inefficient. In fact, a conventional EW can only be designed to be optimal when the quantum state has been well calibrated, which in turn generally requires quantum state tomography. Practically, when the prepared state can be well modeled, one can indeed choose the optimal EW to detect its entanglement. Since full tomography requires resources that grow exponentially with the number of parties, EW plays an important role in detecting well-modeled entanglement, but it would generally fail for an arbitrary unknown state. In a way, this problem becomes more serious in the MDIEW scenario, where the measurement devices are assumed to be uncharacterized and even untrusted. In this case, the implemented witness, although possibly designed to be optimal in the first place, can become a poor one that detects hardly any entanglement. However, the observed experimental data may still contain enough information for detecting entanglement. Therefore, the key problem we are facing here is the following: given a set of observed experimental data, what is the best entanglement detection capability one can achieve?

In detecting quantum nonlocality, a similar problem is to find the optimal Bell inequality for the observed correlation, which can be solved efficiently with linear programming [220]. For our problem, we essentially need to optimize over all entanglement witnesses to draw the best conclusion about entanglement from the same experimental data, as shown in Fig. 7.2(a). As the set of separable states is not a polytope, this problem cannot be solved by linear programming. Generally speaking, it is proved that the problem of accurately finding such an optimal witness is NP-hard [221]. However, if a certain failure probability is tolerable, we show in this work that this problem can be efficiently solved. That is, if we accept a probability less than $\epsilon$ of detecting a separable state to be entangled, the optimal entanglement witness can be efficiently found. As the optimization step can be conducted as post-processing, our scheme does not pose extra burdens on experiments compared to the original MDIEW scheme. Our result can therefore be directly applied in practice.

7.2 Reliable entanglement witness

The reliability problem can be overcome by the MDIEW scheme. For self-consistency, we briefly review the MDIEW scheme.

7.2.1 Nonlocal game

We first discuss nonlocal games with classical and quantum inputs, as shown in Fig. 7.3. In a classical nonlocal game, classical random inputs $x$ and $y$ are given to two space-like separated users Alice and Bob, who perform measurements on a pre-shared entangled state $\rho_{AB}$ and output $a$ and $b$ , respectively. According to the probability distribution $p(a,b|x,y)$ , a Bell inequality can be defined by

[TABLE]

where $I_{C}$ is a bound for all separable states $\sigma_{AB}$ . A violation of the inequality can be considered as a witness for entanglement. As the Bell test does not assume any measurement details, witnessing entanglement by a Bell test is device independent. However, because the conclusion is so strong that the implementation is self-testing, not all entangled states can be witnessed in such a way [222, 81]. Furthermore, the requirements of a faithful Bell test are very demanding, which makes such a witness impractical. For instance, the minimum efficiency required is $2/3$ for all Bell tests with binary inputs and outputs [72, 73]. On the other hand, if we can trust the measurement, a Bell test essentially becomes an EW. Although such a method is able to detect all entangled states and is easy to realize, it does not tolerate imperfections in the measurement devices.

In the seminal work [3], Buscemi introduces the concepts of nonlocal games with quantum inputs. Denote the inputs of Alice and Bob by $\omega_{x}$ and $\tau_{y}$ , then an inequality similar to Bell inequality can be defined by

[TABLE]

where $J_{C}$ is also the bound for all separable states $\rho_{AB}$ . As the quantum inputs can be indistinguishable, it is proved that every entangled state can violate a certain inequality of this form [3]. If we assume that the input states are faithfully prepared by Alice and Bob, then such a nonlocal game with quantum inputs can be considered as an MDIEW [29]. Moreover, as shown below, there is no detection efficiency limit for such a test.

7.2.2 MDIEW

The nonlocal game presented in Ref. [3] can be considered as a reliable entanglement witness method, which does not witness a separable state as entangled regardless of how the measurement is implemented. This nonlocal game is thus an MDIEW, i.e., $J\geq 0$ for all separable states and $J$ can be negative if Alice and Bob share an entangled state. Furthermore, the statement that $J\geq 0$ for all separable states is independent of the implementation of the measurement. In Ref. [29], the authors put this statement into a more concrete and practical framework. They show that, for an arbitrary conventional EW, there is a corresponding MDIEW. Below, we briefly show how to derive MDIEWs from conventional EWs.

Focus on the bipartite scenario with Hilbert space $\mathcal{H}_{A}\otimes\mathcal{H}_{B}$ , with dimensions $\mathrm{dim}\mathcal{H}_{A}=d_{A}$ and $\mathrm{dim}\mathcal{H}_{B}=d_{B}$ . For a bipartite entangled state $\rho_{AB}$ defined on $\mathcal{H}_{A}\otimes\mathcal{H}_{B}$ , we can always find a conventional entanglement witness $W$ such that $\mathrm{Tr}[W\rho_{AB}]<0$ and $\mathrm{Tr}[W\sigma_{AB}]\geq 0$ for any separable state $\sigma_{AB}$ . Suppose $\{\omega_{x}^{\mathrm{T}}\}$ and $\{\tau_{y}^{\mathrm{T}}\}$ to be two bases for Hermitian operators on $\mathcal{H}_{A}$ and $\mathcal{H}_{B}$ , respectively. Thus, we can decompose $W$ on the basis $\{\omega_{x}^{\mathrm{T}}\otimes\tau_{y}^{\mathrm{T}}\}$ by

$$W=\sum_{x,y}\beta^{x,y}\,\omega_{x}^{\mathrm{T}}\otimes\tau_{y}^{\mathrm{T}},\tag{7.3}$$

where $\beta^{x,y}$ are real coefficients and the transpose is for later convenience. Notice that, owing to the completeness of the set of density matrices, we further require $\{\omega_{x}\}$ and $\{\tau_{y}\}$ to be density matrices. In addition, the decomposition of Hermitian operators is not unique which varies with different $\{\omega_{x}\}$ and $\{\tau_{y}\}$ .

With a conventional EW decomposed in Eq. (7.3), an MDIEW can be obtained by

$$J=\sum_{x,y}\beta^{x,y}_{1,1}\,p(1,1|\omega_{x},\tau_{y}),\tag{7.4}$$

where $\beta^{x,y}_{1,1}=\beta^{x,y}$ and ${p}(1,1|\omega_{x},\tau_{y})$ is the probability of outputting $(a=1,b=1)$ with input states $(\omega_{x},\tau_{y})$ . In the MDIEW design, Alice (Bob) performs Bell state measurement on $\rho_{A}$ ( $\rho_{B}$ ) and $\omega_{x}$ ( $\tau_{y}$ ). The probability distribution ${p}(1,1|\omega_{x},\tau_{y})$ is thus obtained by the probability of projecting onto the maximally entangled state $\ket{\Phi_{AA}^{+}}=1/\sqrt{d_{A}}\sum_{i}\ket{ii}$ and $\ket{\Phi_{BB}^{+}}=1/\sqrt{d_{B}}\sum_{j}\ket{jj}$ .

As shown in Ref. [29], $J$ is linearly proportional to the conventional witness with ideal measurement,

[TABLE]

Thus, $J$ defined in Eq. (7.4) witnesses entanglement. Furthermore, it can be proved that such a witness is independent of the measurement devices. That is, even if the measurement devices are imperfect, $J$ is always non-negative for all separable states and hence no separable state will be mistakenly witnessed to be entangled. We refer to Ref. [29] for a rigorous proof.
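The measurement independence can be probed numerically. The sketch below is an illustration under stated assumptions, not a proof: it takes the two-qubit witness $W=I/2-|\Psi^{-}\rangle\langle\Psi^{-}|$ , the input set $\{I/2,(I+\sigma_{s})/2\}$ for both parties, random pure product states as the separable inputs, and arbitrary positive operators as the untrusted measurement elements, and it evaluates the smallest value of $J$ encountered. All helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
I2 = np.eye(2, dtype=complex)
PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]
INPUTS = [I2 / 2] + [(I2 + s) / 2 for s in PAULIS]  # assumed input set

psi_minus = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
W = np.eye(4) / 2 - np.outer(psi_minus, psi_minus.conj())  # assumed witness

# beta[x, y] such that W = sum beta[x, y] omega_x^T (x) tau_y^T
A = np.stack([np.kron(o.T, t.T).reshape(-1) for o in INPUTS for t in INPUTS], axis=1)
beta = np.linalg.solve(A, W.reshape(-1)).real.reshape(4, 4)

def random_pure_state(dim):
    vec = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    vec /= np.linalg.norm(vec)
    return np.outer(vec, vec.conj())

def random_povm_element(dim):
    """Arbitrary positive operator with eigenvalues in [0, 1]."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = g @ g.conj().T
    return m / np.linalg.eigvalsh(m).max()

def j_value(sigma_a, sigma_b, m_a, m_b):
    """J for the product state sigma_a (x) sigma_b and untrusted elements m_a, m_b."""
    p_a = [np.trace(m_a @ np.kron(sigma_a, o)).real for o in INPUTS]
    p_b = [np.trace(m_b @ np.kron(sigma_b, t)).real for t in INPUTS]
    return float(np.array(p_a) @ beta @ np.array(p_b))

worst = min(j_value(random_pure_state(2), random_pure_state(2),
                    random_povm_element(4), random_povm_element(4))
            for _ in range(2000))
```

Since mixtures of product states give convex combinations of these values, a non-negative `worst` is what the general argument of Ref. [29] predicts for every separable state.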

Theoretically, the MDIEW scheme prevents separable states from being identified as entangled. Such a reliable MDIEW has been experimentally demonstrated recently [30]. In practice, however, such a scheme can be inefficient, meaning that it witnesses very few entangled states even though the observed data could actually provide more information. This is because, in the MDIEW procedure, one first chooses a conventional EW and then realizes it in an MDI way. The conventional EW is chosen based on an empirical estimate of the to-be-witnessed state, so it may fail to witness the state if the estimate is poor. Furthermore, even if the conventional EW is optimal in the first place, measurement imperfections will make it sub-optimal in practice. In particular, even when the set of input states $\{\omega_{x}\otimes\tau_{y}\}$ is complete, a specific witness may not be able to detect entanglement. With complete information, a natural question is whether we can obtain maximal information about entanglement, i.e., get the optimal estimation of MDIEW.

7.3 Robust MDIEW

Now, we present a method to optimize the MDIEW given fixed observed experimental data $p(1,1|\omega_{x},\tau_{y})$ . Before digging into the details, we compare the problem to a similar one in nonlocality. In the nonlocality scenario, a Bell inequality is used as a witness for quantumness, see Eq. (9.1). In practice, the Bell inequality may not be optimal for the observed probability distribution $p(a,b|x,y)$ . As the probability distributions of classical correlations form a polytope, one can run a linear program to get an optimal Bell inequality for $p(a,b|x,y)$ . In our case, however, the set of probability distributions $p(1,1|\omega_{x},\tau_{y})$ arising from separable states is only a convex set and no longer a polytope. Thus, our problem cannot be solved directly with linear programming.

7.3.1 Problem formulation

Let us start with formulating the optimization problem. Informally, our problem can be described as follows,

Problem (informal): find an optimal witness for the observed probability distribution $p(1,1|\omega_{x},\tau_{y})$ .

According to Eq.(7.4), the witness value is defined by a linear combination of $p(1,1|\omega_{x},\tau_{y})$ with coefficient $\beta^{x,y}$ . To witness entanglement, the coefficient $\beta^{x,y}$ must lead to a witness as defined in Eq. (7.3). In addition, as we can always assign $2\beta^{x,y}$ to double a violation, we require a trace normalization of the witness $W$ by

[TABLE]

Under this normalization, the optimal entanglement witness $W$ [223] for a given state $\rho$ is defined by the solution to the minimization

[TABLE]

Generally speaking, the minimum value, i.e., maximum violation, of the entanglement witness makes the result more robust to experimental errors and statistical fluctuations. Furthermore, a larger violation of entanglement witness can also help for a larger estimation of entanglement measures [224].

Therefore, the problem can be expressed as

Problem (formal): For a given probability distribution ${p}(1,1|\omega_{x},\tau_{y})$ , minimize

[TABLE]

over all $\beta^{x,y}$ satisfying

[TABLE]

for any separable state $\sigma_{AB}$ and

[TABLE]

In contrast to the optimization of a Bell inequality, this problem is much more complex. When the measurements are implemented faithfully, it is easy to verify that $p(1,1|\omega_{x},\tau_{y})=\mathrm{Tr}[(\omega_{x}^{\mathrm{T}}\otimes\tau_{y}^{\mathrm{T}})\rho_{AB}]/(d_{A}d_{B})$ , where $\rho_{AB}$ is the state measured. Therefore, finding the optimal $\beta^{x,y}$ is equivalent to finding the optimal entanglement witness $W=\sum_{x,y}\beta^{x,y}\omega_{x}^{\mathrm{T}}\otimes\tau_{y}^{\mathrm{T}}$ for state $\rho_{AB}$ . A possible solution to this problem is to try all entanglement witnesses to find the optimal one, see Fig. 7.2. However, it is proved that the problem of accurately finding such an optimal witness is NP-hard [221]. Thus, our problem is also intractable in the most general case.

7.3.2 $\epsilon$ -level optimal EW

The key reason for the problem being intractable is that there is no efficient way to characterize an arbitrary entanglement witness. In the bipartite case, an operator is a witness if and only if

$$\mathrm{Tr}[W\sigma_{AB}]\geq 0$$

for any separable state $\sigma_{AB}$ . As $\sigma_{AB}$ can always be decomposed as a convex combination of pure product states $\ket{\psi}_{A}\ket{\phi}_{B}$ , the condition can be equivalently expressed as

$$\bra{\psi}_{A}\bra{\phi}_{B}W\ket{\psi}_{A}\ket{\phi}_{B}\geq 0$$

for any pure states $\ket{\psi}_{A}$ and $\ket{\phi}_{B}$ . The constraints for a witness $W$ are very difficult to describe in the most general case, which makes our problem hard.

However, this problem can be resolved if we allow a certain failure probability. A Hermitian operator $W_{\epsilon}$ is defined as an $\epsilon$ -level entanglement witness when

[TABLE]

where $S$ is the set of separable states. That is, the operator $W_{\epsilon}$ has a probability less than $\epsilon$ to detect a randomly selected separable quantum state to be entangled. Intuitively, $\epsilon$ can be regarded as a failure error probability. We refer to Ref. [225] for a rigorous definition. It is shown that the $\epsilon$ -level optimal EW can be found efficiently for any given entangled state $\rho$ . In particular, constrained on $\mathrm{Tr}[W_{\epsilon}]=1$ and $W_{\epsilon}$ to be an $\epsilon$ -level EW, one can run a semi-definite programming (SDP) to minimize $\mathrm{Tr}[W_{\epsilon}\rho]$ .
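To make the $\epsilon$ -level notion concrete, the following sketch estimates, by Monte Carlo sampling, the fraction of product states on which a candidate operator takes a negative expectation value. It assumes Haar-random pure product states of two qubits as the sampling measure, which is a simplification of the rigorous definition in Ref. [225], and uses $W=I/2-|\Psi^{-}\rangle\langle\Psi^{-}|$ as an example of an exact witness.

```python
import numpy as np

psi_minus = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
W = np.eye(4) / 2 - np.outer(psi_minus, psi_minus.conj())  # assumed exact witness

def random_ket(rng, dim=2):
    """Haar-random pure state from a normalized complex Gaussian vector."""
    vec = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return vec / np.linalg.norm(vec)

def violation_fraction(operator, samples=5000, seed=1):
    """Estimate the fraction of random product states with negative expectation."""
    rng = np.random.default_rng(seed)
    negatives = 0
    for _ in range(samples):
        product = np.kron(random_ket(rng), random_ket(rng))
        if np.vdot(product, operator @ product).real < -1e-12:
            negatives += 1
    return negatives / samples
```

An exact witness gives a fraction of zero, whereas an operator such as `W - 0.2 * np.eye(4)` is negative on a sizable fraction of product states and would only qualify as an $\epsilon$ -level witness for a large $\epsilon$ .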

7.3.3 Solution

Following the method proposed in Ref. [225], we can solve the minimization problem given in Eq. (7.8) by allowing a certain failure probability $\epsilon$ . First, we relax the constraint given in Eq. (7.9). Instead of requiring non-negativity for all separable states, we randomly generate $N$ separable states $\{\ket{\psi}_{A}^{i}\ket{\phi}_{B}^{i}\}$ and require that

[TABLE]

where $\langle\omega_{x}^{\mathrm{T}}\otimes\tau_{y}^{\mathrm{T}}\rangle^{i}=\bra{\psi}_{A}^{i}\bra{\phi}_{B}^{i}\omega_{x}^{\mathrm{T}}\otimes\tau_{y}^{\mathrm{T}}\ket{\psi}_{A}^{i}\ket{\phi}_{B}^{i}$ . Then the problem can be expressed as

Problem ( $\epsilon$ -level): given a probability distribution $p(1,1|\omega_{x},\tau_{y})$ , minimize

[TABLE]

over all $\beta^{x,y}$ satisfying

[TABLE]

for $N$ randomly generated separable states $\{\ket{\psi}_{A}^{i}\ket{\phi}_{B}^{i}\}$ and

[TABLE]

This problem can be converted into one solvable by SDP when we re-express the inequality of numbers in Eq. (7.16) as an inequality of matrices. To do so, we only need to notice that Eq. (7.12) is equivalent to requiring that

[TABLE]

where $W_{B}\geq 0$ indicates that $W_{B}$ has non-negative eigenvalues. Therefore, we only need to generate $N$ states $\ket{\psi}_{A}^{i}$ , for $i=1,2,\dots,N$ , and the problem is

Problem ( $\epsilon$ -level, SDP): given a probability distribution ${p}(1,1|\omega_{x},\tau_{y})$ , minimize

[TABLE]

over all $\beta^{x,y}$ satisfying

[TABLE]

for $N$ randomly generated states $\{\ket{\psi}_{A}^{i}\}$ and

[TABLE]
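The matrix form of the sampled constraint can be sketched as follows. For each sampled $\ket{\psi}_{A}^{i}$ , the operator $W_{B}^{i}=\bra{\psi}_{A}^{i}W\ket{\psi}_{A}^{i}$ acts on Bob's system only, and requiring it to be positive semidefinite covers all of Bob's states $\ket{\phi}_{B}$ at once. The helper names are illustrative, and the check shown here only tests a given $W$ ; it does not perform the SDP optimization itself.

```python
import numpy as np

def reduced_operator(witness, psi_a, dim_b=2):
    """W_B = <psi|_A W |psi>_A, an operator on system B."""
    dim_a = psi_a.size
    w4 = witness.reshape(dim_a, dim_b, dim_a, dim_b)  # indices (a, b, a', b')
    return np.einsum('a,abcd,c->bd', psi_a.conj(), w4, psi_a)

def satisfies_sampled_constraints(witness, kets_a, tol=1e-9):
    """True if W_B^i is positive semidefinite for every sampled |psi>_A^i."""
    return all(np.linalg.eigvalsh(reduced_operator(witness, k)).min() >= -tol
               for k in kets_a)
```

In an actual SDP, these constraints would be imposed on $W=\sum_{x,y}\beta^{x,y}\omega_{x}^{\mathrm{T}}\otimes\tau_{y}^{\mathrm{T}}$ with the coefficients $\beta^{x,y}$ as optimization variables, since each $W_{B}^{i}$ is linear in them.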

In practice, we can run an SDP to solve this problem. According to Refs. [225, 226], to get the $\epsilon$ -level witness with probability at least $1-\beta$ , the number of random states $N$ should be at least $r/(\epsilon\beta)-1$ . Here $r$ is the number of optimization variables, i.e., the number of coefficients $\beta^{x,y}$ , and $\beta$ (without superscripts) can be understood as the failure probability of the minimization program. It is worth remarking that the problem can be similarly solved in the multipartite case.
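The sample-size bound is simple to evaluate. The helper below is a small sketch of the arithmetic $N\geq r/(\epsilon\beta)-1$ ; the function name and the use of exact fractions to avoid floating-point rounding are choices made for this example.

```python
from fractions import Fraction
import math

def required_samples(num_variables, epsilon, failure_prob):
    """Smallest integer N with N >= r / (epsilon * beta) - 1."""
    eps = Fraction(str(epsilon))
    fail = Fraction(str(failure_prob))
    return max(0, math.ceil(Fraction(num_variables) / (eps * fail) - 1))
```

For instance, with $r=16$ coefficients and $\epsilon=\beta=0.05$ , the bound evaluates to 6399 random states, which illustrates how quickly $N$ grows as both tolerances shrink.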

7.3.4 Example

In this section, we show explicit examples about how the witness becomes non-optimal in the MDI scenario and how this problem can be resolved by running the optimizing program.

Suppose the to-be-witnessed state is a two-qubit Werner state [222]:

[TABLE]

where $\ket{\Psi^{-}}=1/\sqrt{2}(\ket{01}-\ket{10})$ and $I$ is the identity matrix. The designed entanglement witness for the Werner states is

[TABLE]

As $\mathrm{Tr}[W\rho^{v}_{AB}]=(1-3v)/4$ , $\rho^{v}_{AB}$ is entangled for $v>1/3$ and separable otherwise.
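The threshold $v>1/3$ can be cross-checked numerically. The sketch below assumes the standard forms $\rho^{v}_{AB}=v\ket{\Psi^{-}}\bra{\Psi^{-}}+(1-v)I/4$ and $W=I/2-\ket{\Psi^{-}}\bra{\Psi^{-}}$ , which reproduce the quoted value $(1-3v)/4$ , and compares the witness value with the smallest eigenvalue of the partial transpose (the positive-partial-transpose criterion, added here only as an independent check).

```python
import numpy as np

psi_minus = np.array([0, 1, -1, 0]) / np.sqrt(2)
P = np.outer(psi_minus, psi_minus)
W = np.eye(4) / 2 - P  # assumed witness

def werner(v):
    """Assumed two-qubit Werner state with singlet weight v."""
    return v * P + (1 - v) * np.eye(4) / 4

def witness_value(v):
    return float(np.trace(W @ werner(v)))

def min_partial_transpose_eigenvalue(v):
    """Smallest eigenvalue of the partial transpose on Bob's qubit."""
    rho_pt = werner(v).reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
    return float(np.linalg.eigvalsh(rho_pt).min())
```

For this family both quantities equal $(1-3v)/4$ , so the witness is negative exactly when the partial transpose has a negative eigenvalue.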

As shown in Ref. [29], we can choose the input set by

[TABLE]

where $\vec{n}=(1,1,1)/\sqrt{3}$ , $\vec{\sigma}=(\sigma_{1},\sigma_{2},\sigma_{3})$ is the vector of Pauli matrices, and $\sigma_{0}=I$ . According to Eq. (7.3), the witness can be decomposed on the basis of $\{\omega_{x}\otimes\tau_{y}\}$ with coefficients $\beta^{x,y}$ given by

[TABLE]

And the MDIEW value is given by

[TABLE]

In the ideal case, the probability distribution $p(1,1|\omega_{x},\tau_{y})$ is obtained by projecting onto maximally entangled states, that is,

[TABLE]

where $M_{A}=\ket{\Phi_{AA}^{+}}\bra{\Phi_{AA}^{+}}$ and $M_{B}=\ket{\Phi_{BB}^{+}}\bra{\Phi_{BB}^{+}}$ . In practice, however, the measurements may be imperfect. For instance, we consider the case where Alice’s measurement is perfect while Bob’s measurement is instead

[TABLE]

where $\ket{\Phi_{BB}^{-}}=1/\sqrt{2}(\ket{00}-\ket{11})$ . In the case of quantum key distribution, projecting onto $\ket{\Phi_{BB}^{-}}$ can be regarded as a phase error.

As shown in Fig. 7.4, we plot the MDIEW and the optimized MDIEW results. For the original MDIEW result, as Bob’s measurement is incorrect, no Werner state given in Eq. (7.22) can be witnessed to be entangled. However, by optimizing over all possible entanglement witnesses, we show that $\rho^{v}_{AB}$ is witnessed to be entangled as long as $v>1/3$ . In this case, the optimized MDIEW can detect all entangled Werner states. In our program, we set $N=1000$ , and we can see from Fig. 7.4 that no separable state is falsely identified as entangled.
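The failure of the unoptimized witness can be reproduced in a small simulation. The sketch below makes several assumptions that go beyond the text: Bob's faulty device is taken to project onto $\ket{\Phi_{BB}^{-}}$ in every run (the extreme phase-error case), the input set is $\{I/2,(I+\sigma_{s})/2\}$ rather than the set of Ref. [29], and the Werner state and witness take their standard forms. It does not perform the SDP optimization.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]
INPUTS = [I2 / 2] + [(I2 + s) / 2 for s in PAULIS]  # assumed input set

psi_minus = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
P = np.outer(psi_minus, psi_minus.conj())
W = np.eye(4) / 2 - P  # assumed designed witness
A = np.stack([np.kron(o.T, t.T).reshape(-1) for o in INPUTS for t in INPUTS], axis=1)
beta = np.linalg.solve(A, W.reshape(-1)).real.reshape(4, 4)

def werner(v):
    return v * P + (1 - v) * np.eye(4) / 4

def mdiew_value(rho_ab, bob_sign=+1):
    """J with Alice projecting onto Phi+ and Bob onto Phi+ (+1) or Phi- (-1).

    Qubit ordering is (A, B, A', B').
    """
    phi = np.zeros((2, 2, 2, 2))
    for a in range(2):
        for b in range(2):
            phi[a, b, a, b] = 0.5 * bob_sign ** b
    phi = phi.reshape(16)
    total = 0.0
    for x, omega in enumerate(INPUTS):
        for y, tau in enumerate(INPUTS):
            full = np.kron(np.kron(rho_ab, omega), tau)
            total += beta[x, y] * (phi @ full @ phi).real
    return total
```

With the ideal measurement the value is $(1-3v)/16$ , negative for $v>1/3$ , whereas under the assumed phase error it becomes $(1+v)/16$ , which is positive for every Werner state, so the fixed coefficients detect nothing even though the data still depend on $v$ .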

The optimization program finds the $\epsilon$ -level optimal EW $W_{\epsilon}$ , which, as its name indicates, has a probability less than $\epsilon$ of detecting a separable state to be entangled. To get a smaller $\epsilon$ , one can use a larger number $N$ of random states. In this case, $\epsilon$ can be regarded as a statistical fluctuation that is inversely related to the number of trials $N$ . On the one hand, to efficiently get the optimal witness $W_{\epsilon}$ , one has to introduce a nonzero failure error $\epsilon$ . On the other hand, we can always add an extra term to the EW to eliminate $\epsilon$ , i.e.,

[TABLE]

where $\alpha$ is chosen to be the minimum value such that $W$ is an entanglement witness. To efficiently find $\alpha$ , one can make use of a technique similar to that of Ref. [227], in which EWs can be systematically constructed.

Part IV Randomness and selftesting

Chapter 8 Randomness Requirement on CHSH Bell Test in the Multiple Run Scenario

This chapter investigates the randomness assumption in Bell test. Specifically, we discuss the randomness requirement such that quantum mechanics can have a violation of the Clauser-Horne-Shimony-Holt (CHSH) inequality [228].

8.1 Randomness Requirement

8.1.1 Randomness loophole

Historically, Bell tests [19] were proposed for distinguishing quantum theory from local hidden variable models (LHVMs) [28]. In a general picture, a Bell test involves multiple parties who randomly choose inputs and generate outputs with pre-shared physical resources. Based on the probability distributions of inputs and outputs, an inequality, called Bell’s inequality, is defined. A Bell test is meaningful when all LHVMs satisfy the Bell’s inequality, while in quantum mechanics the inequality can be violated with certain quantum settings. Experimentally observing a violation of any Bell’s inequality would show that LHVMs are not sufficient to describe the world and that other theories, such as quantum mechanics, are needed.

Here, we focus on the bipartite scenario and investigate one of the most well-known Bell tests, the CHSH inequality [1]. As shown in Fig. 9.1(a), two space-like separated parties, Alice and Bob, randomly choose input bit settings $x$ and $y$ and generate output bits $a$ and $b$ , respectively, based on their inputs and pre-shared quantum ( $\rho$ ) and classical ( $\lambda$ ) resources. The probability distribution $p(a,b|x,y)$ of obtaining outputs $a$ and $b$ conditioned on inputs $x$ and $y$ is determined by the specific strategies of Alice and Bob. Assuming that the input settings $x$ and $y$ are chosen fully randomly and with equal probability, the CHSH inequality is defined by a linear combination of the probabilities $p(a,b|x,y)$ according to

$$S=\sum_{a,b,x,y}(-1)^{a\oplus b+x\cdot y}\,p(a,b|x,y)\leq S_{C}=2,$$

where the plus operation $\oplus$ is modulo 2, $\cdot$ is numerical multiplication, and $S_{C}$ is the (classical) bound of the Bell value $S$ for all LHVMs. Similarly, there is an achievable bound $S_{Q}=2\sqrt{2}$ for quantum theory [67]. A violation of the classical bound $S_{C}$ therefore indicates the need for theories other than LHVMs, such as quantum theory. For general no-signalling (NS) theories [68], denote the corresponding upper bound as $S_{NS}=4$ . It is straightforward to see that $S_{NS}\geq S_{Q}\geq S_{C}$ .
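The ordering of the three bounds can be checked by direct enumeration. The following is a minimal sketch, assuming the standard CHSH form $S=\sum_{a,b,x,y}(-1)^{a\oplus b+x\cdot y}p(a,b|x,y)$ ; the helper names `chsh_value` and `deterministic_behavior` are illustrative and not part of the original text.

```python
import itertools
import math

def chsh_value(p):
    """S = sum_{a,b,x,y} (-1)^(a XOR b + x*y) p(a,b|x,y); p[(a, b, x, y)] holds p(a,b|x,y)."""
    return sum(
        (-1) ** ((a ^ b) + x * y) * p[(a, b, x, y)]
        for a, b, x, y in itertools.product((0, 1), repeat=4)
    )

def deterministic_behavior(fa, fb):
    """Behavior of a deterministic local strategy: a = fa[x], b = fb[y]."""
    return {
        (a, b, x, y): 1.0 if (a == fa[x] and b == fb[y]) else 0.0
        for a, b, x, y in itertools.product((0, 1), repeat=4)
    }

# Classical bound: maximum over the 16 deterministic local strategies
# (probabilistic LHVMs are convex mixtures of these).
s_classical = max(
    chsh_value(deterministic_behavior(fa, fb))
    for fa in itertools.product((0, 1), repeat=2)
    for fb in itertools.product((0, 1), repeat=2)
)

# A no-signalling behavior with a XOR b = x*y and uniform marginals.
pr_box = {
    (a, b, x, y): 0.5 if (a ^ b) == x * y else 0.0
    for a, b, x, y in itertools.product((0, 1), repeat=4)
}
s_ns = chsh_value(pr_box)

# Quantum bound quoted in the text.
s_quantum = 2 * math.sqrt(2)
```

With this normalization the enumeration gives the classical bound and the no-signalling value directly, and the quantum bound lies strictly between them.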

The violation of Bell’s inequality not only acts as a test of the fundamental laws of physics, but also has a variety of applications in modern quantum information tasks. For instance, observing violations of Bell’s inequalities can be applied in device independent tasks, such as quantum key distribution [55, 26, 56, 57], randomness amplification [58, 59, 60] and generation [61, 62, 63, 64], entanglement quantification [65], and dimension witness [66]. Security proofs of these tasks are generally independent of the realization devices or the correctness of quantum theory, but rely on violating a Bell’s inequality. For instance, consider the devices of Alice and Bob as black boxes. Assume, in the worst scenario, that an adversary Eve, instead of Alice and Bob, performs the measurements, as shown in Fig. 9.1(b). Because the two parties are space-like separated, the probability distribution generated in this way is always within the scope of LHVMs, that is, $p(a,b|x,y)=p(a|x,\lambda)p(b|y,\lambda)$ , where $\lambda$ is a hidden variable that is controlled by Eve. Therefore, Eve cannot fake a violation of any Bell test, which intuitively explains the security of the device independent tasks.

Since the first experiment in the early 1980s [229], many laboratory demonstrations of the CHSH inequality have been presented. These experimental results show explicit violations of the LHVM bound $S_{C}$ but suffer from a few technical and inherent loopholes, which might invalidate the conclusions. Two well-known technical obstacles are the locality loophole and the detection efficiency loophole, which can be closed with more carefully designed experiments and improved instruments in a variety of experimental systems, including optical systems [75, 76], superconducting systems [230], ionic systems [74], and atomic systems [70]. In contrast to the technical loopholes, there also exists an inherent loophole that cannot be closed completely in any Bell test: the input settings may not be chosen randomly. In the worst case, the inputs can all be predetermined, which makes it possible to violate Bell inequalities even with LHVMs. In this case, witnessing a violation of a Bell’s inequality does not imply the need for non-LHVM theories, and such a Bell test cannot be used for device independent tasks either. On the other hand, without quantum theory or a violation of Bell’s inequalities, one cannot get provable randomness. Therefore, the assumption of true input randomness is indispensable in Bell tests because one cannot prove or disprove its existence.

Practically, the case of not fully random input settings corresponds to the scenario where the input settings are partially controlled by an adversary Eve, who wants to convince Alice and Bob of a violation of Bell’s inequality using classical settings. In this case, Eve is able to simultaneously control the input settings and the measurement devices, as shown in Fig. 9.1(c). We model the imperfect randomness by assuming that the input settings $x$ and $y$ are chosen according to some probability distribution $q(x,y|\lambda)$ , conditioned on the same local variable $\lambda$ that is available to the adversary Eve. Now, the probability distribution $p(a,b|x,y)$ of LHVMs is defined by

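$$p(a,b|x,y)=\sum_{\lambda}\frac{p_{A}(a|x,\lambda)\,p_{B}(b|y,\lambda)\,q(x,y|\lambda)\,q(\lambda)}{q(x,y)},$$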

where $q(\lambda)$ is the prior probability distribution of $\lambda$ , and $q(x,y)=\sum_{\lambda}q(x,y|\lambda)q(\lambda)$ is the observed average probability of the input settings $x$ and $y$ . Notice that $q(\lambda)$ is normalized by restricting $\sum_{\lambda}q(\lambda)=1$ . Now, the CHSH $S$ value under the classical strategy given in Fig. 9.1(c) can be rewritten according to

[TABLE]

where we additionally require the observed probability of choosing $x$ and $y$ to be uniform, that is, $q(x,y)=1/4,\forall x,y$ .

Notice that, in the extreme (deterministic) case where $q(x,y|\lambda)=0$ or $1$ for all $x$ , $y$ , the local hidden variable $\lambda$ deterministically controls the input settings. Then Eve is able to violate Bell tests to an arbitrary value with LHVMs. On the other hand, if Eve has no control of the input settings where $q(x,y|\lambda)=1/4$ for all $x$ , $y$ , she cannot fake a violation at all. Therefore, a meaningful question to ask is how one can assure that a violation of the CHSH inequality is not caused by Eve’s attack on imperfect input randomness. That is, we want to know what the requirement of the input randomness is to guarantee that an observed violation truly stems from quantum effects.
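The two extreme cases can be made concrete with a small numerical sketch. It assumes that, for uniform observed inputs $q(x,y)=1/4$ , the LHVM Bell value takes the form $S=4\sum_{\lambda}q(\lambda)\sum_{a,b,x,y}(-1)^{a\oplus b+x\cdot y}p_{A}(a|x,\lambda)p_{B}(b|y,\lambda)q(x,y|\lambda)$ with deterministic outputs; this form and the function name `lhvm_chsh` are assumptions made for illustration.

```python
import itertools

def lhvm_chsh(models):
    """S = 4 * sum_lambda q(lambda) * sum_{x,y} q(x,y|lambda) * (-1)^(a XOR b + x*y).

    `models` is a list of (q_lambda, q_xy, fa, fb): the weight of lambda, a dict
    mapping (x, y) to q(x,y|lambda), and deterministic outputs a = fa[x], b = fb[y].
    """
    total = 0.0
    for q_lambda, q_xy, fa, fb in models:
        for (x, y), q in q_xy.items():
            total += q_lambda * q * (-1) ** ((fa[x] ^ fb[y]) + x * y)
    return 4 * total

inputs = list(itertools.product((0, 1), repeat=2))

# Deterministic case: each lambda fixes the inputs (x0, y0), and Eve
# programs the outputs a = 0, b = x0 * y0 so that a XOR b = x * y always holds.
deterministic_inputs = [
    (0.25, {(x0, y0): 1.0}, (0, 0), (x0 * y0, x0 * y0))
    for x0, y0 in inputs
]

# Fully random case: q(x,y|lambda) = 1/4 with the output strategy a = b = 0.
uniform_inputs = [(1.0, {xy: 0.25 for xy in inputs}, (0, 0), (0, 0))]

s_faked = lhvm_chsh(deterministic_inputs)
s_honest = lhvm_chsh(uniform_inputs)
```

In both examples the observed input distribution is uniform, so Alice and Bob cannot tell the two situations apart from $q(x,y)$ alone; only the conditional distribution $q(x,y|\lambda)$ differs.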

8.1.2 Randomness requirement in Bell test

Let us start with quantifying the input randomness. Here, we make use of the randomness parameter $P$ adopted in Ref. [77]; other tools, such as the Santha-Vazirani source [142], may work similarly. The parameter $P$ is defined to be the maximum probability of choosing the inputs conditioned on the hidden variable $\lambda$ ,

$$P=\max_{x,y,\lambda}q(x,y|\lambda).$$

With this definition, the larger $P$ is, the less random the inputs are, the more information about the inputs Eve has, and the easier it is for her to fake a quantum violation with LHVMs. In the CHSH test, $P$ takes values in the range $[1/4,1]$ . The case $P=1$ means that Eve has full information about Alice’s and Bob’s inputs, that is, Eve can always correctly infer the values of $x$ and $y$ by accessing the local hidden variable $\lambda$ . The case $P=1/4$ corresponds to complete randomness, where the adversary has no additional information on the inputs compared to a naive guess. Note that the definition of $P$ essentially follows the min-entropy, which is widely used in information theory to quantify the randomness of a random variable $X$ , $H_{min}=-\log\left[\max_{x}prob(X=x)\right]$ .
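As a small illustration of the link between $P$ and the min-entropy, the sketch below evaluates both for a conditional input distribution. The dictionary layout and the choice of base-2 logarithm are assumptions made for the example.

```python
import math

def randomness_parameter(q_given_lambda):
    """P = max over lambda, x, y of q(x,y|lambda).

    `q_given_lambda` maps each lambda to a dict {(x, y): probability}.
    """
    return max(max(q_xy.values()) for q_xy in q_given_lambda.values())

def min_entropy_bits(p_max):
    """H_min = -log2(P), in bits (base 2 is an assumption of this sketch)."""
    return -math.log2(p_max)

# Complete randomness: every input pair has probability 1/4 for every lambda.
uniform = {0: {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}}

# A partially biased source (hypothetical numbers).
biased = {
    0: {(0, 0): 0.40, (0, 1): 0.20, (1, 0): 0.20, (1, 1): 0.20},
    1: {(0, 0): 0.10, (0, 1): 0.30, (1, 0): 0.30, (1, 1): 0.30},
}

p_uniform = randomness_parameter(uniform)
p_biased = randomness_parameter(biased)
```

For the uniform source the two input bits carry two bits of min-entropy; any bias conditioned on $\lambda$ raises $P$ and lowers the min-entropy accordingly.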

Intuitively, given complete randomness where $P=1/4$ , the value $S$ with LHVMs is bounded by $S_{C}$ as shown in Eq. (9.1), while given the most dependent (on $\lambda$ ) randomness where $P=1$ , the value $S$ with LHVMs can reach the mathematical maximum, $S_{NS}$ , in the CHSH test. It is then interesting to check the maximal $S$ value with LHVMs for $P\in(1/4,1)$ . In this work, we are interested in when the adversary can fake a quantum violation given a certain randomness $P$ . We thus examine the lower bound $P_{Q}$ of $P$ such that the Bell test result can reach the quantum bound $S_{Q}$ with an optimal LHVM. This lower bound $P_{Q}$ sets a minimal randomness requirement in a Bell test experiment. Only if the freedom of choosing inputs satisfies $P<P_{Q}$ can one claim that the Bell test is free of the randomness loophole.

Recently, considerable effort has been devoted to investigating the randomness needed to guarantee the correctness of Bell tests [35, 144, 231, 77, 78, 232, 145]. These works carry out the analysis under different conditions. One condition is whether the input settings of the same party are dependent across different runs. We use single run to refer to the case where the input settings of Alice (Bob) are independent across different runs, and multiple run to refer to the case where they may be correlated. The other condition is whether the random inputs of Alice and Bob are correlated with each other. Under these different assumptions on the input randomness, the lower bound $P_{Q}$ that allows LHVMs to saturate the quantum bound $S_{Q}$ in the CHSH Bell test is summarized in Table 8.1.

In the single run scenario, the optimal strategies for Eve reach $S=24P-4$ and $S=8P$ in the case that Alice’s and Bob’s input settings are correlated and uncorrelated, respectively [35, 77]. To achieve the maximum quantum violation $S_{Q}=2\sqrt{2}$ , the critical randomness requirement is shown in Table 8.1. It is worth mentioning that if one has randomness $P\geq P_{NS}=1/3$ and $P\geq P_{NS}=1/2$ for the case of correlated and uncorrelated inputs, respectively, Eve is able to recover arbitrary NS correlations.
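The critical values follow from inverting the two linear relations quoted above. The sketch below does this arithmetic; the function names are illustrative.

```python
import math

S_C = 2.0
S_Q = 2 * math.sqrt(2)
S_NS = 4.0

def critical_p_correlated(s):
    """Invert S = 24 P - 4 (single run, Alice's and Bob's inputs correlated)."""
    return (s + 4) / 24

def critical_p_uncorrelated(s):
    """Invert S = 8 P (single run, Alice's and Bob's inputs uncorrelated)."""
    return s / 8

for label, s in (("S_C", S_C), ("S_Q", S_Q), ("S_NS", S_NS)):
    print(label, round(critical_p_correlated(s), 4), round(critical_p_uncorrelated(s), 4))
```

Both relations return $P=1/4$ at the classical bound $S_{C}=2$ , and they reproduce $P_{NS}=1/3$ and $P_{NS}=1/2$ at $S_{NS}=4$ .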

In a more realistic scenario, the multiple run case, the input settings of Alice (Bob) are dependent across different runs. Suppose the inputs may be correlated within each block of $N$ consecutive runs, where $N=1$ stands for the single run case and $N>1$ for the multiple run case. For each unit of $N$ runs, denote by $x_{j}$ ( $y_{j}$ ) and $a_{j}$ ( $b_{j}$ ) the input and output of Alice (Bob) for the $j$ th run, where $j=1,2,\dots,N$ . In the multiple run scenario, correlations of the inputs within each $N$ runs can be represented by

[TABLE]

Therefore, similar to the definition of Eq. (9.5), the $S$ value with LHVMs in the multiple run case can be defined by

[TABLE]

where the index $j$ denotes the $j$ th run, and $\mathbf{x}=(x_{1},x_{2},\dots,x_{N})$ , $\mathbf{y}=(y_{1},y_{2},\dots,y_{N})$ . Notice that we only consider correlations of the inputs within a unit of $N$ runs, and $N$ is not the total number of runs in an experiment. To get an accurate estimate of the $S$ value defined in Eq. (8.6), one also needs to repeat the $N$ runs multiple times, as in the single run case.

In the multiple run scenario, as an extension of Eq. (9.2), the input randomness parameter is defined according to

$$P=\left(\max_{\mathbf{x},\mathbf{y},\lambda}q(\mathbf{x},\mathbf{y}|\lambda)\right)^{1/N}.$$

It is straightforward to see that faking a violation of a Bell test with LHVMs becomes easier for the adversary as the number $N$ of correlated runs increases. This is because the adversary can take advantage of the additional dependence of the inputs across different runs. It has been shown that with randomness $P\geq P_{Q}\approx 0.258$ , Eve is able to fake the maximum quantum violation $S_{Q}$ [78] as the number of correlated runs $N$ goes to infinity. This result [78] lower bounds $P_{Q}$ for all finite $N$ , and thus puts a very strict requirement on the input randomness to guarantee a faithful CHSH test.

A meaningful remaining question is thus to consider the multiple run but uncorrelated scenario. As all Bell experiments must run many times to sample the probability distribution, it is reasonable and also practical to consider a joint attack by Eve. On the other hand, the uncorrelated assumption is also reasonable when the inputs of Alice and Bob are independent even conditioned on $\lambda$ , that is, $q_{A}(x|\lambda,y)=q_{A}(x|\lambda)$ and $q_{B}(y|\lambda,x)=q_{B}(y|\lambda)$ . Equivalently, the probability of the inputs is required to be factorizable,

$$q(x,y|\lambda)=q_{A}(x|\lambda)\,q_{B}(y|\lambda).$$

This factorizable (uncorrelated) condition constrains the power of Eve in controlling or inferring the inputs of Alice and Bob. A general distribution $q(x,y|\lambda)$ requires Eve to jointly control the instruments that Alice and Bob use to generate random inputs. When the experimental instruments of Alice and Bob are manufactured independently, or the inputs are determined by sources causally disconnected from each other, such as cosmic photons [79], the inputs $x$ and $y$ can be assumed to be independent of each other conditioned on the hidden variable $\lambda$ . That is, Eve can only control each of the input settings independently according to Eq. (9.7).

In the multiple run and uncorrelated scenario, the $S$ value with LHVMs is defined by

[TABLE]

Our purpose is to investigate the optimal attack of CHSH test with restricted randomness input $P$ . Therefore we want to maximize Eq. (8.9) with the constraint of Eq. (8.7). In particular, we are interested to see when this maximal value can reach $S_{Q}=2\sqrt{2}$ .

8.2 Single run case

We first review the optimal strategy in the single run scenario [77] to build intuition for the optimal attack of the adversary. Hereafter, we mainly focus on the scenario in which Alice’s and Bob’s inputs are uncorrelated as defined in Eq. (9.7). Thus, what we want is to maximize the $S$ value,

[TABLE]

where

[TABLE]

with restricted randomness $P$ , given in Eq. (9.2).

Since any probabilistic LHVM, that is, $p_{A}(a|x,\lambda)p_{B}(b|y,\lambda)$ , can be realized by a convex combination of deterministic ones [233], it is sufficient to consider only deterministic LHVMs. Due to the symmetric definition of the CHSH inequality, we only need to consider the specific strategy $p_{A}(0|x,\lambda)=p_{B}(0|y,\lambda)=1$ and $p_{A}(1|x,\lambda)=p_{B}(1|y,\lambda)=0$ for some given $\lambda$ ; all the other ones work similarly. By substituting this strategy into Eq. (8.11), we get

[TABLE]

Suppose $P_{A}=\max_{x,\lambda}\{q_{A}(x|\lambda)\}$ , $P_{B}=\max_{y,\lambda}\{q_{B}(y|\lambda)\}$ , and hence $P=P_{A}P_{B}$ . Then $S_{\lambda}$ can be maximized to

[TABLE]

Given $P$ , $S_{\lambda}$ is upper bounded by

[TABLE]

where the equality holds when $P_{B}=1/2$ and $P_{A}=2P$ . Thus, the optimal strategy with LHVMs is $S=8P$ . Note that, when the input settings are fully random, $P=1/4$ , the optimal strategy of LHVMs is $S=2$ , which recovers the original LHVMs bound $S_{C}$ . It is easy to see that, to saturate the quantum bound $S_{Q}=2\sqrt{2}$ , the randomness should be at least $P_{Q}=S_{Q}/8=\sqrt{2}/4\approx 0.354$ , as shown in Table 8.1.
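The bound $S=8P$ can be checked by brute force. The sketch below assumes that, for the deterministic strategy $a=b=0$ and factorizable inputs, $S_{\lambda}=4\sum_{x,y}(-1)^{x\cdot y}q_{A}(x|\lambda)q_{B}(y|\lambda)$ , and searches a grid of input distributions subject to $\max_{x}q_{A}\cdot\max_{y}q_{B}\leq P$ . The function names and the grid resolution are illustrative choices.

```python
def s_lambda(qa0, qb0):
    """S_lambda = 4 * sum_{x,y} (-1)^(x*y) q_A(x) q_B(y) for the strategy a = b = 0."""
    qa = (qa0, 1.0 - qa0)
    qb = (qb0, 1.0 - qb0)
    return 4 * sum((-1) ** (x * y) * qa[x] * qb[y] for x in (0, 1) for y in (0, 1))

def best_single_run(p, steps=400):
    """Grid search over q_A(0), q_B(0) subject to max(q_A) * max(q_B) <= p.

    Returns (best S, q_A(0), q_B(0)).
    """
    best = (float("-inf"), None, None)
    for i in range(steps + 1):
        qa0 = i / steps
        for j in range(steps + 1):
            qb0 = j / steps
            # Small tolerance so that grid points exactly on the boundary are kept.
            if max(qa0, 1 - qa0) * max(qb0, 1 - qb0) <= p + 1e-12:
                best = max(best, (s_lambda(qa0, qb0), qa0, qb0))
    return best

s_opt, qa0_opt, qb0_opt = best_single_run(0.3)
```

For $P=0.3$ the search returns a value equal to $8P$ up to rounding, attained with one party uniform and the other biased to $2P$ , in line with the equality condition stated above.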

In the single run case, we only need to consider one specific deterministic strategy of $p(a,b|x,y)$ due to the symmetric definition of the CHSH inequality. We also take advantage of this property in the derivation of the multiple run case. In addition, we can see that the optimal strategy of LHVMs is to choose $x$ or $y$ fully randomly and the other one as biased as possible. This biased optimal strategy is counter-intuitive, since the adversary does not need to control the inputs of both parties, but only those of one party. We show below that this counter-intuitive feature does not hold for the optimal strategy in the multiple run case.

8.3 Multiple run case

Now we consider the multiple run scenario with uncorrelated input randomness. That is, optimizing Eve’s LHVM strategy Eq. (8.9) with constraints defined in Eq. (8.7). Similar to the single run case, from the symmetric argument, we can also solely consider one specific deterministic strategy, that is, $p_{A}(0|x,\lambda)=p_{B}(0|y,\lambda)=1$ , and $p_{A}(1|x,\lambda)=p_{B}(1|y,\lambda)=0$ . Given the probabilities of Alice’s and Bob’s inputs, $q_{A}(\mathbf{x}|\lambda)$ , $q_{B}(\mathbf{y}|\lambda)$ , the $S$ value, defined in Eq. (8.9), for this specific strategy labeled with $\lambda$ is given by

[TABLE]

where $\cdot$ is vector inner product. Our attempt is therefore to maximize Eq. (8.15) with constraints

[TABLE]

for all $q_{A}(\mathbf{x}|\lambda)$ and $q_{B}(\mathbf{y}|\lambda)$ .

Since in the single run scenario, the optimal strategy requires only one party with biased conditional probability, we first analyze the case with only Alice’s inputs biased and Bob’s inputs uniformly distributed. Then we investigate the case where the inputs of both parties are biased. We can see that the one party biased strategy is not optimal in the multiple run case, even when $N=2$ .

8.3.1 One party Biased

In the case when Eve only (partially) controls one of the inputs, say Alice’s, the probability of Alice’s input string $q_{A}(\mathbf{x}|\lambda)$ is biased and Bob’s input string is uniformly distributed, that is,

[TABLE]

The randomness is characterized by Eq. (8.7), after substituting Eq. (8.17),

[TABLE]

where $P_{A}$ is defined by $P_{A}=\max_{\lambda,\mathbf{x}}q_{A}(\mathbf{x}|\lambda)^{1/N}$ . Then, the $S$ value, defined in Eq. (8.15), becomes

[TABLE]

Denote the number of 1 bits in an $N$ -bit string $\mathbf{a}$ as $L_{1}(\mathbf{a})$ . Given the number of 1 bits in $\mathbf{x}$ , $k_{A}=L_{1}(\mathbf{x})$ , we can sum over $\mathbf{y}$ ,

[TABLE]

and group the summation of $\mathbf{x}$ according to $k_{A}$ ,

[TABLE]

One only needs to consider the LHVMs for which the probabilities $q_{A}(\mathbf{x}|\lambda)$ with the same $k_{A}$ are equal. Otherwise, we can always take an average of $q_{A}(\mathbf{x}|\lambda)$ with the same $k_{A}$ without increasing the randomness parameter $P$ . Thus we can rewrite $S_{\lambda}$ as

[TABLE]

with normalization requirement

[TABLE]

and constraints defined in Eq. (8.16).

The optimization of Eq. (8.22) can be solved efficiently via linear programming. Intuitively, to maximize $S_{\lambda}$ with given $P$ defined in Eq. (8.18), we can simply set $q_{k_{A}}(\mathbf{x}|\lambda)$ to 0 for large $k_{A}$ and to $(2P)^{N}$ for small $k_{A}$ . Suppose there exists an integer $l$ such that $P$ can be written as

[TABLE]

then, Eq. (8.22) can be rewritten as

[TABLE]

For a general case where an integer $l$ cannot be found satisfying Eq. (8.24), we can first find an integer $l$ such that,

[TABLE]

Then we can assign $q_{k_{A}}(\mathbf{x}|\lambda)$ to be

[TABLE]

For finite $N$ , one can numerically solve the problem according to Eq. (8.27). As shown in Fig. 8.2, the optimal strategy for $N=1,10,100$ are calculated. With increasing $N$ , the optimal value $S$ increases and hence a valid Bell test requires a smaller $P$ (more randomness).
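The finite- $N$ calculation can be sketched as a greedy assignment. The code assumes that, with Bob's inputs uniform and the strategy $a=b=0$ , the Bell value reduces to $S_{\lambda}=4\sum_{k_{A}}\binom{N}{k_{A}}q_{k_{A}}(1-k_{A}/N)$ with $q_{k_{A}}\leq(2P)^{N}$ ; this reduced form is a reconstruction from the surrounding definitions, and the function name is illustrative.

```python
import math

def one_party_biased_s(p, n):
    """Greedy optimum of S_lambda = 4 * sum_k C(n,k) q_k (1 - k/n) with q_k <= (2p)^n.

    Assumes 1/4 <= p <= 1/2 so that the capped probabilities can sum to one.
    """
    cap = (2 * p) ** n          # largest allowed probability of a single string x
    remaining = 1.0             # probability mass still to be assigned
    s = 0.0
    for k in range(n + 1):      # strings with fewer 1s contribute more to S
        mass = min(remaining, math.comb(n, k) * cap)
        s += 4 * mass * (1 - k / n)
        remaining -= mass
        if remaining <= 0:
            break
    return s

# S for P = 0.3 and the block sizes N = 1, 10, 100.
values = [one_party_biased_s(0.3, n) for n in (1, 10, 100)]
```

For $N=1$ this reproduces the single run value $8P$ , and for $P=1/4$ it returns the classical bound for any $N$ . Because the objective is linear in $q_{k_{A}}$ with a single cap per string, filling strings in order of increasing $k_{A}$ is the linear programming optimum.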

In the case of $N\rightarrow\infty$ , we can derive an analytic bound for all finite $N$ strategies. By following the technique used in Ref. [78], we first estimate $P$ defined in Eq. (8.26) with the limit of $N\rightarrow\infty$ by,

[TABLE]

where $\bar{l}=l/N$ , and similarly $S$ by,

[TABLE]

Then we can substitute Eq. (8.29) into Eq. (8.28), and get a relation between optimized $S$ value and the corresponding randomness parameter $P$ ,

[TABLE]

By substituting the quantum bound $S_{Q}=2\sqrt{2}$ into Eq. (8.30), we obtain the critical randomness requirement $P_{Q}\approx 0.273$ . Note that, although Eve controls only Alice’s input settings, she can still fake a quantum violation at a value of $P$ that is lower than in the single run case, even when Alice’s and Bob’s inputs are correlated there. Thus, whether the inputs are correlated across runs (multiple versus single run) places a more demanding requirement on the randomness than whether Alice’s and Bob’s inputs are correlated with each other.
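The value $P_{Q}\approx 0.273$ can be reproduced with a short calculation. The sketch assumes the asymptotic relations $(2P)^{N}2^{Nh(\bar{l})}\approx 1$ and $S\approx 4(1-\bar{l})$ , where $h$ is the binary entropy; these are reconstructions consistent with the quoted number rather than the displayed equations themselves, and the function names are illustrative.

```python
import math

def binary_entropy(t):
    """Binary entropy h(t) in bits."""
    if t in (0.0, 1.0):
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def asymptotic_p_one_party(s):
    """Randomness parameter P needed to reach S as N -> infinity (only Alice biased)."""
    l_bar = 1 - s / 4                            # cut-off fraction of 1s, from S = 4 (1 - l_bar)
    return 0.5 * 2 ** (-binary_entropy(l_bar))   # from (2P)^N * 2^(N h(l_bar)) = 1

p_q_one_party = asymptotic_p_one_party(2 * math.sqrt(2))
print(round(p_q_one_party, 3))
```

Under these assumptions the function returns $1/4$ at the classical bound $S=2$ and a value that rounds to the quoted $0.273$ at $S_{Q}$ .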

8.3.2 Both parties biased

Now we consider a general attack, where Eve controls both inputs of Alice and Bob. In this case, we need to optimize Eq. (8.15) with constraints defined in Eq. (8.16). Similarly, we group the summation of $\mathbf{x}$ and $\mathbf{y}$ according to the corresponded number of bit $1$ , $k_{A}=L_{1}(\mathbf{x})$ and $k_{B}=L_{1}(\mathbf{y})$ ,

[TABLE]

Now, if we assume that $q_{A}(\mathbf{x}|\lambda)$ ( $q_{B}(\mathbf{y}|\lambda)$ ) has the same value for equal $k_{A}$ ( $k_{B}$ ), we can sum over $\mathbf{x}$ and $\mathbf{y}$ for given $k_{A}$ and $k_{B}$ ,

[TABLE]

We can then get the $S$ value,

[TABLE]

with the constraints of $q_{A}(\mathbf{x}|\lambda)$ and $q_{B}(\mathbf{y}|\lambda)$ ,

[TABLE]

It is worth mentioning that the assumption that $q_{A}(\mathbf{x}|\lambda)$ ( $q_{B}(\mathbf{y}|\lambda)$ ) takes the same value for equal $k_{A}$ ( $k_{B}$ ) is not obviously equivalent to the original optimization problem defined in Eq. (8.31). We thus take this step as an additional assumption, and conjecture it to be true for certain cases.

The problem defined in Eq. (8.33) with the constraints of Eq. (8.3.2) cannot be solved by linear programming directly because of the nonlinear terms $q_{k_{A}}(\mathbf{x}|\lambda)q_{k_{B}}(\mathbf{y}|\lambda)$ . However, we can still optimize it with methods similar to those used in the previous section. Define the maximum randomness on each side

[TABLE]

To maximize $S_{\lambda}$ , we can first optimize Alice’s side, $q_{k_{A}}$ , and then Bob’s side $q_{k_{B}}$ . By doing so, it is not hard to see that $S_{\lambda}$ is maximized by assigning $q_{k_{A}}$ that has small number of $k_{A}$ to be $P_{A}$ and that has large number of $k_{A}$ to be 0, and similarly for $q_{k_{B}}$ . Thus we need to first find $l_{A}$ and $l_{B}$ for Alice and Bob, such that,

[TABLE]

Then we can assign $q_{k_{A}}(\mathbf{x}|\lambda)$ and $q_{k_{B}}(\mathbf{y}|\lambda)$ to be

[TABLE]

to optimize $S_{\lambda}$ defined in Eq. (8.33).

For finite $N$ , we can also numerically solve the optimization problem defined in Eq. (8.33), as shown in Fig. 8.3. The value $S$ increases with the number of runs $N$ ; thus the strategy with infinite rounds bounds the strategy with finite rounds.

In the case of $N\rightarrow\infty$ , we can also find an analytical relation between the optimized $S$ and the corresponding $P$ . Similarly, we first estimate $P_{A}$ and $P_{B}$ defined in Eq. (8.3.2) in the limit $N\rightarrow\infty$ by

[TABLE]

where $\bar{l}_{A}=l_{A}/N$ and $\bar{l}_{B}=l_{B}/N$ , and $S$ according to

[TABLE]

As we still have to optimize over all possible $P_{A}$ and $P_{B}$ that satisfy $P_{A}P_{B}=P$ , we cannot get a direct analytic formula as in Eq. (8.30), but we can still solve the problem numerically and plot the result in Fig. 8.3. To reach the maximum quantum violation $S_{Q}=2\sqrt{2}$ with an LHVM, the randomness parameter must satisfy $P\geq P_{Q}\approx 0.264$ , a stricter requirement on the inputs than the $P_{Q}\approx 0.273$ obtained when Eve controls only Alice’s input.
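The numerical optimization over $P_{A}$ and $P_{B}$ can be sketched as a one-dimensional search. The code assumes the asymptotic relations $P_{A}\approx 2^{-h(\bar{l}_{A})}$ , $P_{B}\approx 2^{-h(\bar{l}_{B})}$ and $S\approx 4(1-2\bar{l}_{A}\bar{l}_{B})$ with $\bar{l}_{A},\bar{l}_{B}\in(0,1/2]$ , where $h$ is the binary entropy. These forms are reconstructions chosen to be consistent with the one party case ( $\bar{l}_{B}=1/2$ ) and with the quoted $P_{Q}$ ; the function names and grid size are illustrative.

```python
import math

def binary_entropy(t):
    """Binary entropy h(t) in bits."""
    if t in (0.0, 1.0):
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def asymptotic_p_both_parties(s, steps=20000):
    """Minimise P = P_A * P_B over the cut-offs l_A, l_B that reach the value s."""
    target = (1 - s / 4) / 2               # required product l_A * l_B
    best = 1.0
    for i in range(1, steps + 1):
        l_a = 0.5 * i / steps
        l_b = target / l_a
        if l_b > 0.5:
            continue                       # cut-offs above 1/2 are outside the assumed range
        best = min(best, 2 ** (-(binary_entropy(l_a) + binary_entropy(l_b))))
    return best

p_q_both = asymptotic_p_both_parties(2 * math.sqrt(2))
print(round(p_q_both, 3))
```

Under these assumptions the minimum is attained at $\bar{l}_{A}=\bar{l}_{B}$ , that is, with both parties equally biased, in contrast to the single run case where the optimum leaves one party uniform.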

8.3.3 Discussion

We take an additional assumption in the derivation of the both parties biased case, so the obtained bound $P_{Q}\approx 0.264$ is still only an upper bound on the $P_{Q}$ of a general optimal attack as $N$ goes to infinity. As we already know, the randomness requirement for the worst case, that is, multiple run with Alice’s and Bob’s inputs correlated, is bounded by $P_{Q}\approx 0.258$ [78]. Thus, the tight $P_{Q}$ for the case of multiple run with Alice and Bob uncorrelated should lie in the interval $[0.258,0.264]$ .

To gain intuition for why we take the additional assumption, first notice that what we want is to minimize the average contribution of $\mathbf{x}\cdot\mathbf{y}$ in Eq. (8.31). In our case, where $P$ is near 1/4, $q_{A}(\mathbf{x}|\lambda)$ and $q_{B}(\mathbf{y}|\lambda)$ can be regarded as approximately flat distributions. On average, an $\mathbf{x}$ ( $\mathbf{y}$ ) that contains fewer 1s contributes more to $S$ , which means we should assign it a larger probability $q_{A}(\mathbf{x}|\lambda)$ ( $q_{B}(\mathbf{y}|\lambda)$ ) in order to maximize $S$ . As $q_{A}(\mathbf{x}|\lambda)$ ( $q_{B}(\mathbf{y}|\lambda)$ ) is upper bounded by $P_{A}$ ( $P_{B}$ ), an intuitive optimal strategy is to let $q_{A}(\mathbf{x}|\lambda)$ ( $q_{B}(\mathbf{y}|\lambda)$ ) be $P_{A}$ ( $P_{B}$ ) for the $\mathbf{x}$ ( $\mathbf{y}$ ) that contain fewer 1s, and 0 for the ones that contain more 1s. As $q_{A}(\mathbf{x}|\lambda)$ ( $q_{B}(\mathbf{y}|\lambda)$ ) should also satisfy the normalization condition (Eq. (8.3.2)), we can simply follow the strategy defined in Eq. (8.40) to realize this intuition, which at the same time satisfies the assumption we take. Following the above intuition, we conjecture the assumption to be true for certain values of $N$ . That is, for finite $N$ , we conjecture it to be true when the equalities in Eq. (8.3.2) hold for both $P_{A}$ and $P_{B}$ .

On the other hand, we want to emphasize that for finite $N$ , the assumption will not generally hold in the optimal strategy if the equalities in Eq. (8.3.2) are not fulfilled. For example, if the probability of $l_{A}+1$ and $l_{B}+1$ in Eq. (8.40) is not 0 but very small, we should not take all $q_{A}(\mathbf{x}|\lambda)$ and $q_{B}(\mathbf{y}|\lambda)$ equally as $q_{k_{A}}$ and $q_{k_{B}}$ , especially for the case of $L_{1}(\mathbf{x})=l_{A}+1$ and $L_{1}(\mathbf{y})=l_{B}+1$ , respectively. In fact, there does exist a cleverer assignment of $q_{A}(\mathbf{x}|\lambda)$ and $q_{B}(\mathbf{y}|\lambda)$ in which only those $\mathbf{x}$ and $\mathbf{y}$ that give a small $\mathbf{x}\cdot\mathbf{y}$ receive probability, instead of all $\mathbf{x}$ and $\mathbf{y}$ with $L_{1}(\mathbf{x})=l_{A}+1$ and $L_{1}(\mathbf{y})=l_{B}+1$ . However, with an increasing number of runs $N$ , this kind of clever attack stops working, because the equalities can be satisfied more closely with larger $N$ . Therefore, we also conjecture the assumption to be true for all possible $P$ as $N$ goes to infinity.

As we can see, our obtained $P_{Q}\approx 0.264$ is already very close to the worst-case value of $0.258$ . We can therefore conclude that the multiple run correlation is already a strong resource for the adversary, no matter whether the inputs of Alice and Bob are correlated or not. In addition, since the bound $P_{Q}$ for the loosest case, that is, single run with Alice and Bob uncorrelated, is $0.354$ [77], we also suggest that the key loophole of the input randomness is the correlation between multiple runs rather than the correlation between Alice and Bob.

Considering that Eve controls only Alice’s input but leaves Bob’s input uniformly distributed, we found that the least randomness Eve needs to control to fake a quantum violation is $P_{Q}\approx 0.273$ . The least randomness required when controlling both Alice and Bob is $P_{Q}\leq 0.264$ . By comparing these results to the ones listed in Table 8.1, we conclude that the key randomness loophole is due to the correlation between multiple runs. As the randomness requirement that accounts for multiple run attacks is not easy to fulfill in real experiments, we suggest that experiments rule out correlations of the input settings between different runs. To guarantee the security of device independent tasks, we also suggest that one should check whether there are correlations between random inputs from different runs.

For further research, we are interested to know whether there exist Bell inequalities that suffer less from the randomness loophole. Under different assumptions, the randomness requirement behaves differently. For example, it is interesting to investigate the scenario where the input settings are uncorrelated with the measurement devices by assuming that the manufacturers are different. That is, there are two uncorrelated hidden variables in Fig. 9.1(c), controlling the input settings and the measurement devices independently. Moreover, by considering a nonzero lower bound for the input probability $p(x,y|\lambda)$ , $\mathrm{P\ddot{u}tz}$ et al. recently showed a Bell inequality that suffers very little from the randomness loophole [145]. That is, no adversary can fake a quantum violation as long as the lower bound of $p(x,y|\lambda)$ is nonzero, regardless of its upper bound $P$ defined in Eq. (9.2). Therefore, it is interesting to investigate the multiple run randomness requirement of the CHSH inequality with additional assumptions.

Chapter 9 Clauser-Horne Bell test with imperfect random inputs

This chapter investigates the general randomness requirement for the Clauser-Horne (CH) inequality [34]. We consider the requirement under different conditions. In addition, our method applies to general Bell inequalities.

9.1 General randomness requirement

In the bipartite scenario, a general Bell test involves two remotely separated parties, Alice and Bob, who receive random inputs $x$ and $y$ and produce outputs $a$ and $b$ , respectively. Based on the probability distribution $\tilde{p}_{AB}(a,b|x,y)$ of the outputs conditioned on the inputs, Bell’s inequality can be defined by a linear combination of $\tilde{p}_{AB}(a,b|x,y)$ according to

[TABLE]

where $J_{C}$ is a bound for all local hidden variable models (LHVMs), meaning that no LHVM can violate the Bell’s inequality. Now, we consider the case where the inputs are not fully random. That is, the inputs $x$ and $y$ depend on some local hidden variable, denoted by $\lambda$ , as shown in Fig. 9.1.

The input randomness can be quantified by the dependence of the inputs on $\lambda$ . Suppose the inputs $x$ and $y$ are chosen according to an a priori probability $p(x,y|\lambda)$ . The input randomness can then be measured by the upper and lower bounds of this probability,

[TABLE]

As an example, for the CH test, where the inputs are binary, the upper and lower bounds are in the range of $[1/4,1]$ and $[0,1/4]$ , respectively. Focusing on the upper bound $P$ , when it equals $1$ , it represents the case where the local hidden variable $\lambda$ deterministically decides at least one input. When $P=1/4$ , the inputs are fully random. Similarly, we can see how the lower bound $Q$ characterizes the input randomness. In many previous works [35, 231, 77, 78, 232, 228], only the upper bound $P$ is considered. It was recently noted in Ref. [145] that the lower bound $Q$ also plays an important role in the analysis. We thus consider both the upper and lower bounds as quantifications of the input randomness.

With binary inputs, we can consider a symmetric case where $P=1/4+\delta$ and $Q=1/4-\delta$ . In other words, we can quantify the input randomness by its deviation $\delta$ from a uniform distribution,

[TABLE]

Note that all our following results apply for asymmetric cases (with arbitrary $P$ and $Q$ ) as well.
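As an illustration of these quantities, the sketch below extracts the bounds from a conditional input distribution, taking $P$ and $Q$ to be the maximum and minimum of $p(x,y|\lambda)$ over all inputs and all $\lambda$ . The data layout, the function name, and the example numbers are assumptions made for the example.

```python
def input_randomness_bounds(p_given_lambda):
    """Return (P, Q): the largest and smallest value of p(x,y|lambda).

    `p_given_lambda` maps each lambda to a dict {(x, y): probability}.
    """
    values = [p for p_xy in p_given_lambda.values() for p in p_xy.values()]
    return max(values), min(values)

# A hypothetical symmetric source with deviation delta = 0.05 from uniform.
symmetric_source = {
    0: {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.20, (1, 1): 0.30},
    1: {(0, 0): 0.20, (0, 1): 0.30, (1, 0): 0.30, (1, 1): 0.20},
}

P, Q = input_randomness_bounds(symmetric_source)
delta_upper = P - 0.25   # deviation of the upper bound from uniform
delta_lower = 0.25 - Q   # deviation of the lower bound from uniform
```

In the symmetric case the two deviations coincide and a single parameter $\delta$ describes the source; for an asymmetric source they differ and both $P$ and $Q$ must be tracked.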

When the input settings are determined by $p(x,y|\lambda)$ , the observed probability $\tilde{p}_{AB}(a,b|x,y)$ of outputs conditioned on inputs is given by

[TABLE]

where $q(\lambda)$ is the prior probability of $\lambda$ , $p(x,y)=\sum_{\lambda}p(x,y|\lambda)q(\lambda)$ is the averaged probability of choosing $x$ and $y$ , and $\tilde{p}_{AB}(a,b|x,y,\lambda)$ is the strategy of Alice and Bob conditioned on $\lambda$ . Then, the Bell’s inequality defined in Eq. (9.1) should be rephrased as

[TABLE]

In this work, we are interested in how LHVMs can fake a violation of Bell’s inequality with imperfect input randomness. Thus, we can also set the strategy $\tilde{p}_{AB}(a,b|x,y,\lambda)$ of deciding the outputs based on the inputs by $\tilde{p}_{A}(a|x,\lambda)\tilde{p}_{B}(b|y,\lambda)$ , and the Bell value with a LHVM is given by

[TABLE]

Here, for simplicity, we assume that $p(x,y)$ is independent of $x$ and $y$ . What we are interested in is maximizing $J^{\mathrm{LHVM}}(P,Q)$ with LHVMs. From another point of view, we want to establish the Bell’s inequality when imperfectly random inputs are considered. Any breach of these bounds (using quantum settings) would rule out LHVMs in favor of quantum mechanics. Suppose the quantum bound to Eq. (9.5) is denoted by $J_{Q}$ ; then we are especially interested in the condition on $P$ and $Q$ such that $J^{\mathrm{LHVM}}(P,Q)<J_{Q}$ . In an experiment, this is a necessary condition for a valid Bell test. For a specific observed violation $J_{\mathrm{obs}}$ and input randomness characteristics $P$ and $Q$ , the test witnesses a non-local feature only if the Bell value satisfies $J^{\mathrm{LHVM}}(P,Q)<J_{\mathrm{obs}}$ .

In a variety of previous works [35, 144, 231, 77, 78, 232, 228], randomness requirements for the CHSH inequality are analyzed. In this work, we focus on another inequality, the CH inequality, and consider it in general scenarios. For instance, many previous works [35, 231, 77, 78, 228] assume that the underlying probability distribution $\tilde{p}_{AB}(a,b|x,y)$ satisfies the no-signaling (NS) [68] condition. However, in a real experiment, the probability distribution $\tilde{p}_{AB}(a,b|x,y)$ may be signaling due to statistical fluctuation, device imperfections, or other possible interventions by the adversary Eve. We thus also consider the general case where $\tilde{p}_{AB}(a,b|x,y)$ can be signaling. (It is worth remarking that even if $\tilde{p}_{AB}(a,b|x,y)$ can be signaling, we still assume that the locality loophole is closed. The possibility of signaling comes from the fact that partial knowledge of the inputs is available. In practice, Alice and Bob cannot transmit signals with such a signaling probability distribution.) In addition, we consider the case where the random inputs of Alice and Bob are factorizable. In this case, the input randomness can be written as

[TABLE]

This factorizable assumption is reasonable in some practical scenarios, where the experimental devices that determine the input settings come from independent manufacturers or the randomness generation events are spacelike separated. For example, if the inputs are determined by cosmic photons that are causally disconnected from each other [79], the input randomness can be reasonably assumed to be factorizable.

9.2 CH inequality

In this section, we will investigate the randomness requirement of the CH inequality under different conditions, including whether $\tilde{p}_{AB}(a,b|x,y)$ is signaling or NS, and whether the factorizable condition is satisfied or not.

9.2.1 CH inequality with LHVMs

The CH inequality is defined in the bipartite scenario, where the input settings $x$ and $y$ and the outputs $a$ and $b$ are all bits. Based on the probability distribution that obtains a specific measurement outcome, for instance $00$ , the CH inequality is defined according to

[TABLE]

where we omit the outputs $a$ and $b$ and define $\tilde{p}_{A}(x)$ ( $\tilde{p}_{B}(y)$ ) to be the probability of detecting $0$ with input setting $x$ ( $y$ ) by Alice (Bob), and $\tilde{p}_{AB}(x,y)$ to be the probability of the coincidence detection $00$ on both sides with input settings $x$ and $y$ for Alice and Bob, respectively. To satisfy the general definition of a Bell inequality as shown in Eq. (9.1), the single-party probabilities $\tilde{p}_{A}(0)$ and $\tilde{p}_{B}(0)$ need to be properly defined in terms of coincidence detection probabilities. For instance, we can define $\tilde{p}_{A}(0)$ by the detection probabilities with input $(x=0,y=0)$ , or $(x=0,y=1)$ , or a convex mixture of the two. This arbitrariness in the definition vanishes when the NS condition is satisfied.

In an experimental realization, one has to run the CH test multiple times, say $N$ times, to determine the probabilities in Eq. (9.8). Denoting the coincidence counts by $C_{AB}$ and the single counts by $S_{A(B)}$ , we can then write

[TABLE]

Here, $N_{AB}(x,y)$ denotes the total number of trials with input setting $x$ and $y$ , and $N_{A(B)}$ the number of trials with input setting $x$ ( $y$ ) of Alice (Bob).

When the input settings are chosen truly randomly, the CH Bell value $J^{\mathrm{LHVM}}_{\mathrm{CH}}$ with LHVMs is always non-positive, whereas quantum theory can violate this bound up to a maximum of $J_{Q}=(\sqrt{2}-1)/2\approx 0.207$ . If the measurement settings $x$ , $y$ are additionally determined by some hidden variable $\lambda$ through a probability distribution $p(x,y|\lambda)$ , we show in the following that the CH inequality can be violated even with LHVMs.
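The quantum value quoted above can be checked numerically. The following is a minimal sketch, assuming the standard form of the CH expression, $\tilde{p}_{AB}(0,0)+\tilde{p}_{AB}(0,1)+\tilde{p}_{AB}(1,0)-\tilde{p}_{AB}(1,1)-\tilde{p}_{A}(0)-\tilde{p}_{B}(0)$ , a maximally entangled two-qubit state, and real projective measurements parametrized by an angle. The state, the angles, and the function names are illustrative assumptions and are not taken from the text.

```python
import math

def coincidence_prob(theta_a, theta_b):
    """Probability that both sides obtain outcome 0 on (|00> + |11>)/sqrt(2),
    where outcome 0 projects onto cos(theta)|0> + sin(theta)|1>."""
    return 0.5 * math.cos(theta_a - theta_b) ** 2

def ch_value(a_angles, b_angles):
    """CH expression p(0,0) + p(0,1) + p(1,0) - p(1,1) - pA(0) - pB(0)."""
    def p(x, y):
        return coincidence_prob(a_angles[x], b_angles[y])
    single = 0.5  # marginals of a maximally entangled state are uniform
    return p(0, 0) + p(0, 1) + p(1, 0) - p(1, 1) - single - single

# Assumed measurement angles (one common optimal choice).
alice = (0.0, math.pi / 4)
bob = (math.pi / 8, -math.pi / 8)

j_quantum = ch_value(alice, bob)
print(j_quantum, (math.sqrt(2) - 1) / 2)
```

With identical settings on both sides, e.g. `ch_value((0, 0), (0, 0))`, the same function returns zero, which is consistent with the non-positive LHVM bound for truly random inputs.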

With a general LHVM strategy defined in Eq. (9.6), each term in the CH value in Eq. (9.9) can be described by

[TABLE]

Here, we adopt a specific realization of the single counts by taking an average of the observed value. For instance, the single detection probability $p_{A}(0)$ is defined to be a mean of the single detection probabilities with input $(x=0,y=0)$ and $(x=0,y=1)$ .

Besides, in order to convince Alice and Bob that the input settings $x$ and $y$ are chosen freely, Eve has to impose that the averaged probability distributions of the input settings are uniformly random. Then, we can assume $p(x,y)$ to be $1/4$ ,

[TABLE]

In real experiments, the input probability can be arbitrary, and our result still applies with certain modifications to the normalization. With the normalization condition Eq. (9.11), the CH value with LHVM strategies is given by

[TABLE]

with $J_{\lambda}$ defined by

[TABLE]

With the randomness parameter defined in Eq. (9.2), our target is to maximize $J^{\mathrm{LHVM}}_{\mathrm{CH}}$ defined in Eq. (9.12) for given randomness input $P$ and $Q$ under constraints in Eq. (9.11).

9.2.2 General strategy (attack)

In this part, we consider a general strategy (attack) where no additional assumption is imposed. It is worth mentioning that with the following method, we can essentially convert the optimization problem over all LHVMs into a clearly defined mathematical problem. In the CH example, we show an explicit solution to this mathematical problem. A general solution to this type of mathematical problems will provide a general solution to the problem of imperfect randomness in Bell test.

Note that the optimization of Eq. (9.12) requires optimizing over the strategy of Alice and Bob, $\tilde{p}_{A}(x,\lambda)$ and $\tilde{p}_{B}(y,\lambda)$ , and also over the strategy of deciding the inputs, $p(x,y|\lambda)$ , which must satisfy the constraints defined in Eq. (9.11). Here, we first analyze how to optimize the strategy of Alice and Bob.

Because all probabilistic LHVM strategies can be realized as convex combinations of deterministic strategies, it is sufficient to consider only deterministic strategies, i.e., $\tilde{p}_{A}(x),\tilde{p}_{B}(y)\in\{0,1\}$ , in the optimization. Conditioned on the different values of $\tilde{p}_{A}(x)$ and $\tilde{p}_{B}(y)$ , the 16 possible values of $J_{\lambda}$ are listed in Table 9.1, where we omit $\lambda$ hereafter to simplify the notation.

Note that, for a given $p(x,y|\lambda)$ , we should choose the strategy of $\tilde{p}_{A}(x)$ and $\tilde{p}_{B}(y)$ that maximizes $J_{\lambda}$ . We thus consider only the possible optimal strategies, as listed in Table 9.2. We refer to Appendix C.1 for a rigorous proof of why it suffices to consider only these strategies.

As the strategies $(\tilde{p}_{A}(0),\tilde{p}_{A}(1),\tilde{p}_{B}(0),\tilde{p}_{B}(1))=(0,1,1,0)$ and $(\tilde{p}_{A}(0),\tilde{p}_{A}(1),\tilde{p}_{B}(0),\tilde{p}_{B}(1))=(1,0,0,1)$ are always better than the strategies $(\tilde{p}_{A}(0),\tilde{p}_{A}(1),\tilde{p}_{B}(0),\tilde{p}_{B}(1))=(0,1,1,1)$ and $(\tilde{p}_{A}(0),\tilde{p}_{A}(1),\tilde{p}_{B}(0),\tilde{p}_{B}(1))=(1,1,0,1)$ , respectively, we can always replace the latter strategies with the former ones without affecting $p(x,y)$ while achieving a larger $J_{\lambda}$ . To simplify the notation, we denote $p(i,j)$ by $p_{2i+j}$ hereafter; the possible deterministic strategies for $J_{\lambda}$ are thus in the following set

[TABLE]
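The enumeration over deterministic strategies can be sketched numerically. Since the explicit form of $J_{\lambda}$ is not reproduced here, the code below assumes the standard CH expression with each single-party term averaged over the other party's input, scaled so that the strategy $(0,1,1,0)$ gives the value $(p_{2}-p_{0})/2$ quoted in the text. This form, the overall normalization, and the function names are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import product

def j_lambda(p, a0, a1, b0, b1):
    """Assumed per-lambda CH value for a deterministic strategy.

    p = (p0, p1, p2, p3) with p_{2i+j} = p(x=i, y=j | lambda);
    a0, a1, b0, b1 in {0, 1} are the detection probabilities.
    """
    p0, p1, p2, p3 = p
    coincidences = p0 * a0 * b0 + p1 * a0 * b1 + p2 * a1 * b0 - p3 * a1 * b1
    # Single counts averaged over the two inputs of the other party.
    singles = (p0 + p1) * a0 / 2 + (p0 + p2) * b0 / 2
    return coincidences - singles

def best_strategy(p):
    """Deterministic strategy (a0, a1, b0, b1) that maximizes j_lambda."""
    return max(product((0, 1), repeat=4), key=lambda s: j_lambda(p, *s))

# A hypothetical biased input distribution for one value of lambda.
p_biased = (Fraction(1, 10), Fraction(2, 10), Fraction(4, 10), Fraction(3, 10))
print(best_strategy(p_biased), j_lambda(p_biased, *best_strategy(p_biased)))
```

Under this assumed form, uniform inputs give a maximum of zero over all 16 strategies, while a biased $p(x,y|\lambda)$ allows a positive $J_{\lambda}$ , in line with the discussion above.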

Because there are only five possible strategies of Alice and Bob, we can also consider that there are only five different strategies of choosing the input settings. The intuition is that, for the input settings that use the same strategy of Alice and Bob, for instance, $J_{\lambda}=(p_{2}-p_{0})/2$ , we can always take an average of the different strategies $p(x,y|\lambda)$ without decreasing $J_{\lambda}$ . We refer to Appendix C.1 for a rigorous proof. Therefore, we label $\lambda_{j}$ to be the $j$ th strategy of choosing the input settings, and $J^{\mathrm{LHVM}}_{\mathrm{CH}}$ can be rewritten in the following way,

[TABLE]

The constraints of $q(\lambda)$ and $p(\lambda)$ are given by

[TABLE]

Furthermore, we can denote the coefficient of $q(\lambda_{j})p_{i}(\lambda_{j})$ by $\beta_{ij}$ as shown in Table 9.3. Then $J^{\mathrm{LHVM}}_{\mathrm{CH}}$ can be expressed by

[TABLE]

The solution to this optimization problem is shown in Appendix C.2.1. Based on the value of $P$ and $Q$ , we give the optimal CH value $J^{\mathrm{LHVM}}_{\mathrm{CH}}$ with LHVMs by

[TABLE]

and plot it in Fig. 9.2. Note that when $P$ is greater than ${3}/{8}$ , the value of $J^{\mathrm{LHVM}}_{\mathrm{CH}}$ is independent of $P$ . Hence, we only plot the situation where $P$ is less than 3/8.

In addition, we can also investigate the optimal CH value $J^{\mathrm{LHVM}}_{\mathrm{CH}}$ with input randomness quantified as in Eq. (9.3). It is easy to check that $2P+Q\geq 3/4$ , and the optimal CH value $J^{\mathrm{LHVM}}_{\mathrm{CH}}$ is thus

[TABLE]

9.2.3 Result

Factorizable condition

Here, we consider the optimal LHVMs strategy in the case where the probability of the input settings are factorizable, as defined in Eq. (9.7).

Following a similar derivation, we show in Appendix C.2.2 that the optimal CH value $J^{\mathrm{LHVM,Fac}}_{\mathrm{CH}}$ with LHVMs under the factorizable condition is

[TABLE]

We show the optimal value of $J^{\mathrm{LHVM,Fac}}_{\mathrm{CH}}$ in Fig. 9.3.

When we quantify $P$ and $Q$ by $P=1/4+\delta$ and $Q=1/4-\delta$ , the formula can be rewritten as

[TABLE]

It is interesting to note that the factorizable condition does not affect the optimal CH value $J^{\mathrm{LHVM}}_{\mathrm{CH}}(\delta)$ when the input randomness is quantified as in Eq. (9.3). Since the quantum bound is $J_{Q}=(\sqrt{2}-1)/2$ , $\delta$ must be less than $0.051$ in every experimental realization of the CH test.

NS condition

In addition, we consider the scenario where the probability distribution $\tilde{p}_{AB}(a,b|x,y)$ defined in Eq. (9.4) satisfies the NS condition, which adds a constraint on $\tilde{p}_{AB}(a,b|x,y)$ . That is, the probability of output $a$ ( $b$ ) only relies on the input $x$ ( $y$ ) independently of the input from the other party. To be more specific, NS requires $\tilde{p}_{AB}(a,b|x,y)$ to satisfy

[TABLE]

We can follow the above derivation by imposing an additional NS constraint, which makes the problem even more complex.

Instead, we note that the CHSH inequality and the CH inequality are equivalent under NS, that is,

[TABLE]

a rigorous proof of which is given in Appendix C.3.1. Because the CHSH inequality is defined with strong symmetry, we solve the optimization problem with the CHSH inequality. Note that the NS condition is essentially taken into account when deriving the equivalence between the CH and CHSH inequalities.

Based on the general definition of Bell’s inequalities in Eq. (9.1), the coefficients of the CHSH inequality are defined by

[TABLE]

that is,

[TABLE]

When considering LHVM strategies with imperfect input randomness, the CHSH value can be written as

[TABLE]

where

[TABLE]

Following a method similar to the one described above, we first consider deterministic strategies, i.e., $\tilde{p}_{A}(a|x),\tilde{p}_{B}(b|y)\in\{0,1\}$ , because any probabilistic LHVM can be realized as a convex combination of deterministic ones. Denoting $p(i,j)$ by $p_{2i+j}$ , it is easy to show that the possible optimal deterministic strategies for $J_{\lambda}$ are

[TABLE]

and the constraints can also be described by Eq. (9.16). Following a similar argument, we only need to consider four different types of strategies of choosing the input settings. Thus $J^{\mathrm{LHVM}}_{\mathrm{CHSH}}$ can be given by

[TABLE]

With a symbolic notation in Eq. (9.17), we can also present the coefficient $\beta_{ij}$ in a matrix, as shown in Table 9.4.

We solve the optimization problem in Appendix C.3.2. Based on the value of $P$ and $Q$ , we give the optimal CHSH value $J^{\mathrm{LHVM}}_{\mathrm{CHSH}}$ with LHVMs by

[TABLE]

Then the optimal CH value $J^{\mathrm{LHVM,NS}}_{\mathrm{CH}}$ with LHVMs under NS is

[TABLE]

We show the optimal value of $J^{\mathrm{LHVM,\mathrm{NS}}}_{\mathrm{CH}}$ in Fig. 9.4.

If we quantify the input randomness by its deviation from uniform distribution as defined in Eq. (9.3), the optimal CH value $J^{\mathrm{LHVM,NS}}_{\mathrm{CH}}(\delta)$ is given by

[TABLE]

The quantum bound for the CH inequality $J_{Q}$ is $(\sqrt{2}-1)/2$ , thus $\delta$ should be less than $(\sqrt{2}-1)/4\approx 0.104$ for all experiment realizations.

NS condition and factorizable

Finally, we consider the case where the probability distribution $\tilde{p}_{AB}(a,b|x,y)$ is NS and the input randomness $p(x,y|\lambda)$ is factorizable. The optimization of the CHSH inequality is solved in Appendix C.3.3, and the result is

[TABLE]

Then the optimal CH value $J^{\mathrm{LHVM,NS,Fac}}_{\mathrm{CH}}$ with LHVMs under NS and factorizable condition is

[TABLE]

We show the optimal value of $J^{\mathrm{LHVM,NS,Fac}}_{\mathrm{CH}}$ in Fig. 9.5.

When the input randomness is quantified as in Eq. (9.3), where $P=1/4+\delta$ and $Q=1/4-\delta$ , we have

[TABLE]

Again, it is interesting to note that the factorizable condition does not affect the optimal CH value $J^{\mathrm{LHVM,NS}}_{\mathrm{CH}}(\delta)$ when the input randomness is quantified as in Eq. (9.3).

Comparison

Let us compare the results for the CH values $J^{\mathrm{LHVM}}_{\mathrm{CH}}$ under different conditions. For the maximal quantum violation $J_{Q}=(\sqrt{2}-1)/2$ , we calculate the critical values of $Q$ and $P$ such that $J^{\mathrm{LHVM}}_{\mathrm{CH}}(P,Q)=J_{Q}$ and plot them in Fig. 9.6. When $Q$ is small, the optimal CH value $J^{\mathrm{LHVM}}_{\mathrm{CH}}(P,Q)$ depends only on $P$ . In this case, the critical values of $P$ for the signaling, signaling+fac, NS, and NS+fac cases are 0.207, 0.302, 0.285, and 0.354, respectively. Thus, we can see that the factorizable condition puts a stronger requirement on $P$ than the NS condition does. On the other hand, when $Q$ is large, the optimal CH value $J^{\mathrm{LHVM}}_{\mathrm{CH}}(P,Q)$ depends only on $Q$ instead. In this case, the critical values of $Q$ for the signaling and NS conditions are 0.198 and 0.146, respectively. It is interesting to note that when both $P$ and $Q$ are large, the optimal CH value $J^{\mathrm{LHVM}}_{\mathrm{CH}}(P,Q)$ is independent of the factorizable condition.

Besides, if we make use of the quantification method defined in Eq. (9.3), we have already noticed that the optimal CH value $J^{\mathrm{LHVM}}_{\mathrm{CH}}(\delta)$ is independent of the factorizable condition. Here, we compare $J^{\mathrm{LHVM}}_{\mathrm{CH}}(\delta)$ between the signaling and NS conditions, as shown in Fig. 9.7. For the maximal quantum violation $J_{Q}=(\sqrt{2}-1)/2$ , we calculate the critical values of $\delta$ for the signaling and NS conditions to be 0.052 and 0.104, respectively.

When we quantify the input randomness as in Eq. (9.3), we find that the optimal CH value is independent of the factorizable condition, Eq. (9.7). Thus, with the quantification method in Eq. (9.3), one may not need to consider the factorizable condition.

For future work, it would be interesting to consider joint LHVM strategies for the CH inequality, where the inputs of different runs are correlated. It has already been shown that joint attacks on the CHSH inequality put a very high requirement on the input randomness, whether or not the factorizable condition is satisfied [228]. In addition, we can investigate the case where the input randomness is restricted by both lower and upper bounds, $P$ and $Q$ . Due to the asymmetric definition of the CH inequality, the analysis may become extremely complicated as the number of correlated runs increases. Similar difficulties may also arise for other asymmetric Bell inequalities.

Furthermore, it is interesting to see whether there exist Bell’s inequalities such that the randomness requirement is very low. To do so, we have to solve the problem of optimizing the Bell value with all LHVM strategies. We expect that our derivation method could provide a general way to solve this problem.

Chapter 10 Source-independent quantum random number generation

In general, a physical generator contains two parts: a randomness source and its readout. The source is essential to the quality of the resulting random numbers; hence, it needs to be carefully calibrated and modeled to achieve information-theoretically provable randomness. In this chapter, we discuss quantum random number generation without trusting its randomness source [115], hence semi-selftesting. We show in this chapter that randomness can be obtained even without characterizing its source.

10.1 The prepare and measure model

A typical QRNG can be described by the prepare and measure model that can be decomposed into two modules: a randomness source (quantum state preparation) and its readout (measurement), as shown in Fig. 10.1. In general, the source emits quantum states that are superpositions of the measurement basis. The output (raw) random numbers are the measurement results. In many QRNGs, a short random seed is required to assist state preparation or measurement.

As an example, consider a simple QRNG that projects the quantum state $\ket{+}=(\ket{H}+\ket{V})/{\sqrt{2}}$ emitted from a single photon source on the horizontal and vertical polarization basis $\{\ket{H},\ket{V}\}$ . This QRNG can be divided into two modules, as shown in Fig. 10.1(a). Randomness is guaranteed by the intrinsically probabilistic nature of quantum physics. Hereafter, we denote $\ket{H}$ , $\ket{V}$ ( $\ket{+}$ , $\ket{-}$ ) as the $Z$ -basis ( $X$ -basis) eigenstates.

Existing practical QRNGs suffer from security loopholes if the devices are not perfect. In the source-readout model, the measurement device can normally be trusted due to its simple structure. For instance, in the previous example, the measurement is a simple demolition measurement on the polarization basis. In contrast, the randomness contained in a source, such as a laser or an atomic ensemble, is normally difficult to characterize completely. If the source malfunctions and emits classical signals instead of quantum ones, the outputs may not be truly random. Consider the worst-case scenario in which the devices are designed or controlled by an adversary Eve. Eve can employ a pseudo-RNG to output a fixed (from Eve’s viewpoint) string that still appears random to Alice. More concretely, in the example of the previous paragraph, when a dishonest source emits $Z$ -basis instead of $X$ -basis eigenstates for the $Z$ -basis measurement, the output will just be a fixed string, as shown in Fig. 10.1(b). From this perspective, with given measurement devices, verifying the randomness in a source is crucial to generating randomness.

Most existing QRNGs use complicated physical models [234, 110] to quantify their sources. For example, the dimension of the source is sometimes assumed to be a fixed known number [235]. The underlying models implicitly assume the existence of randomness in the first place, but this assumption cannot be verified experimentally. Therefore, to achieve truly reliable randomness, there is a strong motivation to avoid the use of such models. Note that removing the dimension assumption is the key challenge to the analysis for device-independent scenarios.

Thus, a QRNG without trusting the source (source-independent) is both theoretically and practically meaningful and greatly in need. A device-independent QRNG [64] can generate randomness without having to trust the devices. This type of QRNG requires a short seed for device testing, which is the reason why they are also called randomness expansions [236, 133, 135]. By observing the violation of a certain Bell’s inequality, such as the Clauser–Horne–Shimony–Holt inequality [1], one can guarantee the presence of randomness without any assumptions about the source or the measurement device. The main drawback of device-independent QRNGs is that they are not loss-tolerant, which typically imposes very severe requirements on experimental devices. Furthermore, this type of QRNG generates random numbers at rates that are very low for practical applications. The highest speed of this type of QRNG has, so far, been reported to be 0.4 bps [75].

In this chapter, we will introduce a source-independent QRNG (SIQRNG) scheme based on the uncertainty relation that is loss-tolerant and hence highly practical. In particular, our scheme allows the source to have arbitrary and unknown dimensions. The loss-tolerance property enables potential high-loss implementations of our scheme, such as in integrated optic chips or with inefficient but cheap single photon detectors. We analyze the randomness of the scheme based on complementary uncertainty relations. Our analysis takes account of several practical issues, including finite-key-size effects, multi-photon components in the source, initial seed length, and losses. The analysis combines several ingredients from the security proof of quantum key distribution (QKD), a rich subject that has developed over the last two decades. These ingredients include phase error correction [237], random sampling [238], and the squashing model [146]. Since the squashing model shows the equivalence between threshold detectors and qubit detectors [146], our scheme allows the source to have an unfixed finite dimension as well as an infinite dimension. For simplicity, in the rest of this chapter, we assume a two-level (bit) output system. All our techniques can be directly applied to cases with more outputs.

10.2 The Protocol

10.2.1 Theoretical protocol

A schematic of our SIQRNG protocol is shown in Fig. 10.2(a). Here, we take an optical implementation as the example, as shown in Fig. 10.2(b). All our results apply similarly to other implementation systems. Quantum signals from the source first go through a modulator that actively chooses between the $X$ and $Z$ bases. Then, a polarizing beam splitter and two threshold detectors perform a projective measurement. Since two detectors are used, there are four possible outcomes: no clicks (losses), two single clicks, and double clicks. This implementation is equivalent to the schematic setup of squashing model as discussed in Section 10.2.2. The details of the protocol are presented in Fig. 10.3.

10.2.2 Analysis

In this part, we analyze the randomness output of the SIQRNG protocol. Strictly speaking, like device-independent QRNGs, our scheme is a randomness expansion scheme, in which a random seed is used to generate extra independent randomness. The parameter estimation procedure is analogous to the phase error rate estimation in QKD postprocessing [240]. Randomness extraction is mathematically equivalent to privacy amplification in QKD. The difference between the biased measurement used here and the biased-basis choice QKD protocol [241] is that the number of $X$ -basis measurements is a constant in our case, whereas in QKD, this number must go to infinity when the data size is infinitely large.

Squashing model

In the SIQRNG scheme, we assume that measurement devices are trusted and well characterized. The key assumption here is that the measurement setup is compatible with the squashing model. That is, a measurement can be treated in two steps. First, the (unknown arbitrary-dimensional) signal state emitted from the source is projected to a qubit or vacuum. The projection is called a squasher, as shown in Fig. 10.2(a). Then, the squashed qubits are post-selected by discarding the vacua and measuring them in the $X$ or $Z$ basis. This assumption can be satisfied when threshold detectors are used with random bit assignments for double clicks [146]. For the protocol described in Section 10.2, the $X$ -basis measurement results are used for parameter estimation and are then discarded in postprocessing. Thus, the random assignment can be replaced by adding half of the double-click ratio to the $X$ -basis error rate.

In practice, it is a challenge to verify whether a measurement setup is compatible with the squashing model. Much effort has been put into this question [242]. The key point here is to make the two detectors respond equally to (four) different qubits, and hence make the measurement device basis-independent [243]. This can be done by adding a series of filters (including spectrum and temporal filters) before the threshold detectors, to ensure that the input states stay within a proper set of optical modes [244], in which the detectors have the same efficiencies [146, 243]. One can further assume that Alice uses a trusted source to calibrate the measurement devices beforehand; that is, Alice performs a quantum measurement tomography. A similar measurement calibration procedure should be done in most current QKD and QRNG realizations. Here, we emphasize that the verification of the squashing model does not affect the source-independent property of our scheme. Thus, we leave detailed investigation on validating the measurement setting for future works.

Similar to the QKD case [146], we can assume that the squashing operator is held by Eve in the randomness analysis. By this, we mean that Eve can choose a valid operator, so long as the output is a qubit or a vacuum. In the following discussions, we focus on the squashed qubits. We need to determine the min-entropy associated with these qubits in the $Z$ -basis measurement.

Complementary uncertainty relation

First, we show intuitively why the protocol works. According to quantum mechanics, the outcome of projecting the state $\ket{+}$ on the $Z$ basis is random. Of course, in reality, due to device imperfections, Alice would never obtain a perfect state of $\ket{+}$ . Now, the key question for Alice becomes how to verify that the source faithfully emits the state $\ket{+}$ . This can be done by borrowing a technique from the security analysis of QKD [245, 237, 246] and considering an equivalent virtual protocol depicted in Fig. 10.4, where we replace steps $5$ and $6$ by $5^{\prime}$ and $6^{\prime}$ . In steps $3$ and $4$ of the protocol, Alice occasionally performs the $X$ -basis measurement and defines the phase error rate to be the ratio of detecting $\ket{-}$ . In the virtual protocol, once Alice knows the phase error rate from random sampling tests, she can perform a phase error correction (step $5^{\prime}$ ) before the final $Z$ -basis measurement (step $6^{\prime}$ ). By a smart design of the phase error correction procedure [237], Alice can make it commute with the $Z$ -basis measurement. Thus, she can perform the $Z$ -basis measurement (step $5$ ) first and then apply randomness extraction (step $6$ ). At this stage, all the states have already collapsed to classical results, and the phase error correction procedure becomes randomness extraction (or privacy amplification in QKD) [245, 237, 246]. Besides QKD, the argument here is similar to the one used in Ref. [21], where one can consider the error correction process $5^{\prime}$ as distilling coherence or as randomness extraction.

It has been proved that the phase error correction (randomness extraction) can be efficiently done with Toeplitz-matrix hashing [239]. Suppose the number of qubits measured in the $Z$ basis is $n_{z}$ and the phase error rate is $e_{pz}$ . Then the number of bits sacrificed in the phase error correction is given by

$$n_{z}H(e_{pz})+t_{e},\tag{10.1}$$

and the probability that the phase error correction fails is $2^{-t_{e}}$ [240]. Here, $H(e)=-e\log e-(1-e)\log(1-e)$ is the binary Shannon entropy function, all logarithms are base 2 throughout this chapter, and $t_{e}$ is a parameter that Alice picks by balancing the failure probability against the final output length. Then, the number of final random bits is given by

$$n_{z}-n_{z}H(e_{pz})-t_{e}.\tag{10.2}$$

In practice, Alice needs to prepare a Toeplitz matrix of size $n_{z}\times[n_{z}-n_{z}H(e_{pz})-t_{e}]$ for randomness extraction.
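The output length implied by the Toeplitz matrix size above, $n_{z}-n_{z}H(e_{pz})-t_{e}$ , can be evaluated with a short sketch. The function names, the rounding down to an integer, and the choice to return zero once $e_{pz}$ reaches one half are assumptions made for illustration.

```python
import math

def binary_entropy(e):
    """Binary Shannon entropy H(e) in bits, with H(0) = H(1) = 0."""
    if e <= 0.0 or e >= 1.0:
        return 0.0
    return -e * math.log2(e) - (1.0 - e) * math.log2(1.0 - e)

def final_random_bits(n_z, e_pz, t_e):
    """Number of extractable bits, n_z - n_z * H(e_pz) - t_e, floored at zero.

    Returning 0 for e_pz >= 1/2 is an assumption: at that point the
    phase error rate carries no guarantee of randomness.
    """
    if e_pz >= 0.5:
        return 0
    length = n_z - n_z * binary_entropy(e_pz) - t_e
    return max(0, math.floor(length))

# Hypothetical numbers: 10^6 Z-basis qubits, 1% phase error rate, t_e = 100.
print(final_random_bits(10**6, 0.01, 100))
```

Increasing $t_{e}$ lowers the failure probability $2^{-t_{e}}$ at the cost of exactly $t_{e}$ output bits, which is the trade-off described above.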

We note that the failure probability $2^{-t_{e}}$ quantifies fidelity between the state that results from the phase error correction and the ideal state $\ket{+}^{\otimes n_{z}}$ . In the composable security definition [157, 158], a trace-distance measure security parameter $\varepsilon_{t}$ should be employed. Its relation to the fidelity measure $\varepsilon_{f}$ is given by [238]

[TABLE]

In the following, we shall use the fidelity measure for the failure probability, which, in the end, can be conveniently converted to the trace-distance measure security parameter.

To construct the Toeplitz matrix of size $n_{z}\times[n_{z}-n_{z}H(e_{pz})-t_{e}]$ , Alice needs to use $n_{z}+n_{z}-n_{z}H(e_{pz})-t_{e}-1$ random bits. Thanks to the Leftover Hash Lemma [247], the Toeplitz hashing extractor can be proven to be a strong extractor. That is, the output random bits are independent of the random bits used in the construction of the Toeplitz matrix [248]. Thus, the Toeplitz matrix can be reused.
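As an illustration of Toeplitz hashing, the sketch below builds the matrix implicitly from $n+m-1$ seed bits and multiplies it with the raw string over GF(2). The function names and the indexing convention (entry $(i,j)$ taken from seed position $i-j+n-1$ ) are assumptions; the matrix is written here as $m\times n$ acting on a length- $n$ raw string, which is the transpose of the orientation quoted above.

```python
import secrets

def toeplitz_extract(raw_bits, seed_bits, m):
    """Hash n raw bits down to m bits with a Toeplitz matrix over GF(2).

    The matrix entry T[i][j] depends only on i - j, so n + m - 1 seed
    bits define the whole matrix.
    """
    n = len(raw_bits)
    if m < 1 or len(seed_bits) != n + m - 1:
        raise ValueError("seed must contain n + m - 1 bits")
    out = []
    for i in range(m):
        bit = 0
        for j in range(n):
            # T[i][j] = seed_bits[i - j + n - 1]; AND is multiplication, XOR is addition mod 2.
            bit ^= seed_bits[i - j + n - 1] & raw_bits[j]
        out.append(bit)
    return out

def random_bits(k):
    """Helper for the demonstration only: k bits from the OS randomness source."""
    return [secrets.randbits(1) for _ in range(k)]

# Hypothetical sizes: 16 raw bits hashed to 8 output bits.
raw = random_bits(16)
seed = random_bits(16 + 8 - 1)
print(toeplitz_extract(raw, seed, 8))
```

This direct double loop costs $O(nm)$ bit operations; it is meant only to make the construction explicit, not to be an efficient implementation.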

Our result can also be derived via a different but elegant approach by employing a newly developed seminal uncertainty relation [249] and extending the Leftover Hash Lemma [247] to the quantum scenario [250]. Interestingly, the result from that approach yields a security parameter (in trace distance measure) that is of the order of $2^{-t_{e}/2}$ , which is consistent with ours. Such techniques have been successfully applied in some applications, including QRNGs [235].

Finite key analysis

In practice, the QRNG only runs for a finite time; consequently, the sampling tests for the $X$ -basis measurements will suffer from statistical fluctuations. In the parameter estimation step, the key parameter $e_{pz}$ in Eq. (10.2) should be estimated (bounded) from the finite data size effect.

In the random sampling test, Alice measures the squashed qubits in the $X$ basis and obtains the error rate, $e_{bx}$ . Remember that, as required in the squashing model, this error rate includes half of the double-click ratio. Henceforth, we simply call this error rate the $X$ -basis error rate. Recall that the phase error rate $e_{pz}$ is defined as the error rate that would be obtained if the quantum signals measured in the $Z$ basis were measured in the $X$ basis instead. When the sampling size is large enough, $e_{pz}$ can be well approximated by $e_{bx}$ .

Before presenting the details of the random sampling analysis, we establish some notation. Suppose Alice receives $n$ squashed qubits and randomly chooses $n_{x}$ of them to be measured in the $X$ basis, leaving the remaining $n_{z}=n-n_{x}$ qubits to be measured in the $Z$ basis. Let the ratio of $X$ -basis measurements be $q_{x}=n_{x}/n$ , the number of errors Alice finds in the $X$ basis be $k$ , and the total number of errors be $m$ if Alice had measured all qubits in the $X$ basis. Then, the number of errors in the qubits measured in the $Z$ basis is $m-k$ , which is the key parameter we need to determine through random sampling. The quantity $m-k=n_{z}e_{pz}$ determines the randomness extraction rate. Define the upper bound of $e_{pz}$ by

$$e_{pz}\leq e_{bx}+\theta,\tag{10.4}$$

where $\theta$ is the deviation due to statistical fluctuations. Following the random sampling results of Fung et al. [238], we can bound the probability that Eq. (10.4) fails,

[TABLE]

where $\xi(\theta)=H(e_{bx}+\theta-q_{x}\theta)-q_{x}H(e_{bx})-(1-q_{x})H(e_{bx}+\theta)$ . Note that in the unlikely event that $e_{bx}=0$ , the failure probability is unbounded, and one should rederive the failure probability or simply replace $e_{bx}$ with a small value, say, $1/n_{x}$ .
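The exponent function $\xi(\theta)$ defined above can be evaluated directly. The sketch below computes only $\xi(\theta)$ ; the full bound of Eq. (10.5), including its prefactor, is not reproduced here. The function names and the sample values of $e_{bx}$ and $q_{x}$ are assumptions.

```python
import math

def binary_entropy(e):
    """Binary Shannon entropy H(e) in bits, with H(0) = H(1) = 0."""
    if e <= 0.0 or e >= 1.0:
        return 0.0
    return -e * math.log2(e) - (1.0 - e) * math.log2(1.0 - e)

def xi(theta, e_bx, q_x):
    """xi(theta) = H(e_bx + theta - q_x*theta) - q_x*H(e_bx) - (1 - q_x)*H(e_bx + theta)."""
    return (binary_entropy(e_bx + theta - q_x * theta)
            - q_x * binary_entropy(e_bx)
            - (1.0 - q_x) * binary_entropy(e_bx + theta))

# Hypothetical parameters: 2% X-basis error rate, 10% of qubits sampled.
for theta in (0.0, 0.005, 0.01, 0.02):
    print(theta, xi(theta, e_bx=0.02, q_x=0.1))
```

Because $e_{bx}+\theta-q_{x}\theta$ is the convex combination $q_{x}e_{bx}+(1-q_{x})(e_{bx}+\theta)$ , the concavity of $H$ guarantees $\xi(\theta)\geq 0$ , with equality at $\theta=0$ .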

In practice, the failure probability $\varepsilon_{\theta}$ is normally chosen to be a small number that depends on the application. In later data postprocessing, we pick $\varepsilon_{\theta}=2^{-100}$ . Once $\varepsilon_{\theta}$ is fixed, there is a trade-off between $q_{x}$ and $\theta$ for the ratio of the final random bit length over the raw data size. Thus, the number of samples for the $X$ -basis measurement should be optimized for the randomness extraction rate.

One key property for the random sampling is that the $n_{x}$ locations of the $X$ -basis measurements are randomly chosen from the total $n$ locations, i.e., the $\binom{n}{n_{x}}$ cases occur equally likely. Then, Alice needs a random seed with a length of

[TABLE]

The effect of loss on the seed length will be discussed in Section 10.2.2. In Appendix B.1, we show that $n_{x}$ can remain a constant, given the failure probability, when $n$ is large. Then, in the large data size limit, the seed length is exponentially small compared to the length of the output random bit. Therefore, we reach an exponential randomness expansion.
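The scaling of the seed length can be illustrated with a short sketch. It assumes that the seed only needs to index one of the $\binom{n}{n_{x}}$ equally likely position sets, i.e., about $\log\binom{n}{n_{x}}$ bits; the exact expression of the seed length is not reproduced here, so this count, the function names, and the sample sizes are assumptions.

```python
import math
import random

def seed_length_bits(n, n_x):
    """Bits needed to index one of C(n, n_x) position sets: ceil(log2 C(n, n_x))."""
    return (math.comb(n, n_x) - 1).bit_length()

def choose_x_positions(n, n_x, rng):
    """Pick n_x distinct positions out of n uniformly at random (illustration only)."""
    return sorted(rng.sample(range(n), n_x))

# With n_x fixed, the seed grows only logarithmically in n.
for n in (2**10, 2**15, 2**20):
    print(n, seed_length_bits(n, 100))

print(choose_x_positions(50, 5, random.Random()))
```

Since $\binom{n}{n_{x}}\leq n^{n_{x}}$ , the seed is at most $n_{x}\log n$ bits for constant $n_{x}$ , while the output length grows linearly in $n$ .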

Practical issues

Multi photons: In our protocol, the source is allowed to emit multiple photons, since its dimension is assumed to be uncharacterized. In other words, these components do not affect the randomness of the final output. In practice, multi-photon components may introduce double clicks when threshold detectors are used [146]; these double clicks contribute directly to the error rate term $e_{bx}$ . Thus, when the multi-photon ratio is very high, the double-click ratio increases to a point at which the upper bound on the information leakage, $e_{pz}$ , reaches one half; at that point, no random bits can be extracted according to Eq. (10.2), and Alice simply aborts the protocol.

Loss: The loss tolerance of our protocol is guaranteed by the squashing model in which the measurement is assumed to be basis independent [146]. This assumption can be guaranteed by the fact that the basis is chosen after losses. Alice does not anticipate the positions of losses, so she effectively decides the (random) positions for $X$ -basis measurements before losses. The effect of loss only decreases the number of effective $X$ measurements, but the positions of effective $X$ measurements are still uniformly random in squashed qubits; this fulfills the requirement of random sampling. The detailed proof is shown in Appendix B.2.

Basis-dependent detector efficiency: Our protocol assumes that the efficiencies of the detectors are the same. In practice, efficiency mismatches would cause the measurement to be different for the two bases (basis dependent). A viable way to deal with this imperfection is to recalculate the rate as a function of the ratio between the efficiencies of the two bases, employing the technique used in QKD [243]. As indicated by the result in QKD [243], the random number generation rate will slightly decrease when there is a small mismatch in detector efficiencies. More precisely, denote the ratio between the minimum and maximum efficiencies of the two detectors as $r\leq 1$ , then the key size becomes $rn_{z}(1-H[(e_{bx}+\theta)/r])-t_{e}$ bits. We leave detailed analysis of this imperfection for future work.
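The mismatch-corrected output length quoted above can be evaluated with a short sketch. The function names are hypothetical, and clamping the argument of $H$ at one half (so that the output is zero once the bound on the leakage saturates) is an assumption made here for illustration.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary Shannon entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def extractable_bits(n_z: int, e_bx: float, theta: float,
                     r: float = 1.0, t_e: float = 0.0) -> float:
    """Evaluate r * n_z * (1 - H[(e_bx + theta) / r]) - t_e, floored at zero.

    r is the ratio between minimum and maximum detector efficiency (r <= 1).
    """
    phase_error = min((e_bx + theta) / r, 0.5)  # assumption: clamp at 1/2
    return max(0.0, r * n_z * (1.0 - binary_entropy(phase_error)) - t_e)


print(extractable_bits(10**6, 0.02, 0.01, r=1.0))
print(extractable_bits(10**6, 0.02, 0.01, r=0.9))
```

For $r=1$ the expression reduces to $n_{z}(1-H(e_{bx}+\theta))-t_{e}$ , and a small mismatch ( $r$ slightly below one) lowers the output only moderately.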

Double clicks: Our analysis takes account of the effect of double clicks by adding half of the double-click ratio to the $X$ -basis error rate, as required in the squashing model. This is also essentially why multi-photon states can be used on the source side without affecting final randomness. Note that double clicks should not be discarded freely in the measurement. Otherwise a security loophole will appear, namely, a strong pulse attack [251]. In a strong pulse attack, Eve always sends strong signals (with many photons) in the $Z$ basis. Suppose she sends a strong state in $\ket{H}$ ; if Alice chooses the $Z$ -basis measurement, a valid raw random bit will be obtained, but if she chooses the $X$ basis, a double click is likely to happen. In our protocol, when Alice chooses the $X$ -basis measurement, she should get an error (resulting in $\ket{-}$ ) with a probability of one half. If Alice simply discards all double clicks, Eve’s attack will not be noticed. This attack cannot be explained by a qubit measurement. This is intuitively why the squashing model requires random assignments for double clicks.
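The double-click rule can be sketched in a few lines. The function names are hypothetical; the sketch assumes the source is expected to send $\ket{+}$ , so that $\ket{-}$ outcomes are errors, and it uses a pseudo-random generator purely to illustrate the random assignment.

```python
import random


def x_basis_error_rate(n_plus: int, n_minus: int, n_double: int) -> float:
    """X-basis error rate where each double click counts as half an error."""
    total = n_plus + n_minus + n_double
    if total == 0:
        raise ValueError("no X-basis detections")
    return (n_minus + 0.5 * n_double) / total


def assign_double_click(rng: random.Random) -> int:
    """Random bit assignment for a double click (0 -> |+>, 1 -> |->)."""
    return rng.getrandbits(1)


# Strong pulse attack: every X-basis measurement yields a double click.
print(x_basis_error_rate(n_plus=0, n_minus=0, n_double=1000))
```

Under the strong pulse attack every $X$ -basis event is a double click, so the estimated error rate is one half and the attack is detected rather than silently discarded.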

Basis choice: When choosing $X$ - or $Z$ -basis measurements, an input random string of length $N$ (as a seed) is needed to choose the basis. Suppose the number of $X$ -basis measurements to be performed is $N_{x}$ , then Alice chooses $N_{x}$ positions out of $N$ with equal probability, i.e., with probability $\binom{N}{N_{x}}^{-1}$ . Then, she needs a seed length of $\log\binom{N}{N_{x}}$ . This is similar to Eq. (10.6) with the difference that before the measurement, Alice does not know the positions of losses. More details on how to dilute a short random seed to a longer (partially random) one are provided in Appendix B.3.

Intensity optimization: The intensity of the source should be optimized to maximize the randomness generation rate. With increasing intensity, the detection rate increases along with the double-click rate (and hence $e_{pz}$ increases). There exists a trade-off between $n_{z}$ and $e_{pz}$ , as shown in Eq. (10.2).

10.3 Experimental demonstration

In this section, we perform a proof-of-principle experimental demonstration to show the practicality of the SIQRNG scheme. Our experimental setup consists of two parts: the source, owned by an untrusted party Eve, and the measurement device, owned by the user Alice. The schematic diagram is shown in Fig. 10.5.

On Eve’s side, a laser, labeled as $S$ , with a wavelength of 850 nm and a repetition rate of 1 MHz is used as a photon source. The power of the laser is adjusted to a mean of one photon per pulse. Rather than being a qubit system, each pulse that the laser sends is a coherent state of infinite dimensions. The pulse of the laser is then modulated to $\ket{+}$ polarization by a linear polarizer (LP) and a fiber polarization controller (FPC1). Between the source and the measurement device, we put a fiber attenuator (FA) to simulate different losses in the system.

On Alice’s side, a series of filters first needs to be applied to ensure that the measured optical mode is pure before entering the threshold detectors, as required by the squashing model. For demonstration purposes, we use a single-mode fiber to play the role of a filter. Ideally, frequency and temporal filters should also be added to further purify the optical mode in order to make the photons indistinguishable. A biased beam splitter (BS1) with a ratio of $1:49$ is used to passively choose the $X$ or $Z$ basis. Finally, Alice records when the photon detector (PD) clicks. The detector is time-division-multiplexed by adding four time delays TD1 to TD4 (60 ns each) in the optical paths, so that it can simulate four detectors which detect the outcomes of both bases and each bit value. The gate width and the dead time of the detector are 10 ns and 50 ns, respectively.

The phase error rate, as calculated in Eq. (10.4), is plotted in Fig. 10.6. The related experimental parameters are as follows. The raw key size is $N=10^{6}$ ; the dark count is $10^{-5}$ ; the detector efficiency (without FC adaptor) is $45\%$ ; the misalignment error of the source is 2%; and the failure probability is $\varepsilon_{\theta}=2^{-100}$ . The figure shows that the error rate increases as the loss becomes large. This is because the effect of dark counts becomes dominant when the loss is high. Due to statistical fluctuations, the phase error rate also increases when the data size shrinks. Note in particular that the phase error rate can go beyond $20\%$ under high losses, a regime that does not yield any key in most QKD protocols. Nevertheless, random numbers can still be generated in our SIQRNG scheme.

The relation between the randomness generation rate and the loss is plotted in Fig. 10.7. It can be seen that the randomness generation rate becomes lower with a larger loss, which is consistent with Fig. 10.6. With a practical detector efficiency, the randomness generation rate still reaches a relatively high value of $5\times 10^{3}$ bit/s. Note that the intensity of the source is fixed in our experimental demonstration. In practice, the intensity of the source can be increased to compensate for the loss, and the maximum randomness generation rate in our scheme is then mainly limited by the dead time of the detector. For our detector with a dead time of 50 ns, the maximum randomness generation rate is $1$ bit $/50$ ns = 20 Mbps, which requires the source to be a single photon source with a repetition rate of 20 MHz. For practical implementations with coherent-state sources, the randomness generation rate can reach the order of 2 Mbps after taking account of various errors and finite data size effects.

After obtaining the random bits, we apply Toeplitz-matrix hashing [239] on the raw data to obtain the final random numbers. To test the randomness, we further perform two statistical tests on the output of our SIQRNG: an autocorrelation test and the NIST test suite (see http://csrc.nist.gov/groups/ST/toolkit/rng). The autocorrelation is defined as

$$R(j)=\frac{\mathbb{E}\left[(X_{i}-\mu)(X_{i+j}-\mu)\right]}{\sigma^{2}},$$

where $j$ is the lag between the samples, $X_{i}$ is the $i$ -th sample bit, $\mu$ and $\sigma^{2}$ are the mean and the variance of the sample, and $\mathbb{E}$ stands for expectation. The results of the autocorrelation test on the raw data and the final data are shown in Fig. 10.8. It can be seen that the autocorrelation is substantially reduced in the final data.
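The autocorrelation test can be sketched as follows. This is a minimal illustration that assumes the usual normalized sample estimator (covariance at lag $j$ divided by the sample variance); the function name is hypothetical, and this is not the analysis code used for Fig. 10.8.

```python
def autocorrelation(bits, lag):
    """Normalized sample autocorrelation of a bit sequence at the given lag."""
    n = len(bits)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(bits)")
    mu = sum(bits) / n
    var = sum((b - mu) ** 2 for b in bits) / n
    if var == 0:
        raise ValueError("constant sequence has undefined autocorrelation")
    # Average the product of deviations over all pairs separated by `lag`.
    cov = sum((bits[i] - mu) * (bits[i + lag] - mu)
              for i in range(n - lag)) / (n - lag)
    return cov / var


# A perfectly alternating sequence is maximally anti-correlated at lag 1.
print(autocorrelation([0, 1] * 50, 1))
```

For an ideal random bit string the value should fluctuate around zero with a magnitude of order $1/\sqrt{n}$ at every nonzero lag.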

The result of NIST tests on the final data is shown in Fig. 10.9. We can see that all tests are passed.
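For readers unfamiliar with the Toeplitz-matrix hashing used in the postprocessing step, the following is a minimal sketch. It assumes the standard construction in which an $m\times n$ binary Toeplitz matrix is specified by $n+m-1$ seed bits and multiplied with the raw string modulo 2; the function name and indexing convention are choices made here, not taken from Ref. [239].

```python
def toeplitz_hash(raw_bits, seed_bits, m):
    """Compress n raw bits to m bits with a binary Toeplitz matrix (mod 2)."""
    n = len(raw_bits)
    if len(seed_bits) != n + m - 1:
        raise ValueError("seed must contain n + m - 1 bits")
    out = []
    for i in range(m):
        acc = 0
        for j in range(n):
            # T[i][j] = seed_bits[i - j + n - 1] is constant along diagonals.
            acc ^= seed_bits[i - j + n - 1] & raw_bits[j]
        out.append(acc)
    return out


print(toeplitz_hash([1, 0, 1, 1], [1, 0, 0, 1, 1], 2))
```

This direct implementation costs $O(nm)$ bit operations; practical implementations typically use FFT-based multiplication for large blocks.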

We have proposed a source-independent and loss-tolerant QRNG scheme and demonstrated it experimentally with a passive basis choice. From an experimental point of view, the beam splitter itself, as a part of the measurement device, may also be uncharacterized. Thus, it would also be interesting to demonstrate our scheme with an active basis choice in the future. In fact, when the source operates properly, the speed of our protocol is comparable to that of a trusted polarization-based QRNG whose frequency is limited only by single photon detectors, approximately 100 Mbps [252].

Some current realizations of QRNG experiments could be converted to our SIQRNG protocol. For example, an LED could be used as the source, as in regular QRNGs [123]. Since the polarization of LED light is random, it would be convenient to add a polarizer along the $\ket{+}$ direction to polarize the source light. Since the detector can work in a gated mode, it does not matter whether the light source is continuous or pulsed. This shows why the repetition rate is limited only by single photon detectors. Viewed from another angle, such a setup could also be used to test quantum features of macroscopic sources.

Recently, a continuous-variable version of the source-independent QRNG was experimentally demonstrated [116], achieving a randomness generation rate over 1 Gbps. Moreover, with state-of-the-art devices, it can potentially reach speeds on the order of tens of Gbps, which is similar to trusted-device QRNGs. Hence, semi-self-testing QRNG is approaching the practical regime.

Apart from the protocol based on the uncertainty relation, we can also make use of the coherence in the measurement basis to quantify the output randomness. Note that SI-QRNG protocols based on the uncertainty relation do not maximally exploit the randomness in the source. For instance, suppose the source emits the state $(\ket{0}+i\ket{1})/\sqrt{2}$ ; then we have $H(X)=1$ and hence only the trivial bound $H(Z|E)\geq 0$ . In this case, although the measurement outcome in the $Z$ basis is genuinely random, this cannot be revealed by the information in the complementary basis $X$ . Instead, the genuine randomness can be extracted if the $Y=\{\ket{+i}=(\ket{0}+i\ket{1})/\sqrt{2},\ket{-i}=(\ket{0}-i\ket{1})/\sqrt{2}\}$ basis is measured. Indeed, if the $Y$ basis is also measured, we can directly calculate the coherence of the state, and the randomness output will be maximized.
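The example state can be checked numerically. The sketch below assumes that the uncertainty-relation bound takes the form $H(Z|E)\geq 1-H(X)$ , which is consistent with the statement above; all names are illustrative.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary Shannon entropy in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def outcome_probability(outcome, state) -> float:
    """Born probability |<outcome|state>|^2 for two-component vectors."""
    amplitude = sum(complex(o).conjugate() * complex(s)
                    for o, s in zip(outcome, state))
    return abs(amplitude) ** 2


A = 1.0 / math.sqrt(2.0)
state = (A, 1j * A)    # (|0> + i|1>)/sqrt(2), i.e. |+i>
zero = (1.0, 0.0)      # Z-basis vector |0>
plus = (A, A)          # X-basis vector |+>
plus_i = (A, 1j * A)   # Y-basis vector |+i>

h_z = binary_entropy(outcome_probability(zero, state))
h_x = binary_entropy(outcome_probability(plus, state))
h_y = binary_entropy(outcome_probability(plus_i, state))

print("H(Z) =", h_z, " H(X) =", h_x, " H(Y) =", h_y)
print("bound from X:", 1.0 - h_x, " bound from Y:", 1.0 - h_y)
```

The $X$ -basis statistics are uniform and certify nothing, whereas the $Y$ -basis outcome is deterministic and certifies the full bit of $Z$ -basis randomness.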

Part V Other works

Chapter 11 Open timelike curves

In general relativity, closed timelike curves (CTCs) can break causality with remarkable and unsettling consequences. At the classical level, they induce causal paradoxes disturbing enough to motivate conjectures that explicitly prevent their existence. At the quantum level, such problems can be resolved through the Deutschian formalism; however, this induces radical benefits, from cloning unknown quantum states to solving problems intractable to quantum computers. Instinctively, one expects these benefits to vanish if causality is respected. This chapter shows that in harnessing entanglement, we can efficiently solve NP-complete problems and clone arbitrary quantum states, even when all time-travelling systems are completely isolated from the past [253]. Thus, the many defining benefits of Deutschian closed timelike curves can still be harnessed, even when causality is preserved.

11.1 Open timelike curves

11.1.1 Causality and CTC

Causality aligns with our natural sense of reality. We expect there to be a natural chronology to our reality - two events should not be simultaneous causes for each other. The breaking of causality defies classical logic, resulting in causal paradoxes with no simple solution - the iconic example being the case where a man travels back in time to kill his own grandfather. Thus, physical predictions that break causality face intense scrutiny - often considered to be theoretical artifacts that are likely suppressed once we gain a more complete understanding of reality - motivating various chronology protection conjectures [254].

Nevertheless, causality breaking theories are consistent with current scientific knowledge. Closed timelike curves (CTCs) are valid solutions of Einstein’s equations in general relativity [255, 256, 257]. Meanwhile, Deutsch demonstrated that in the quantum regime, the resulting causal paradoxes always have self-consistent solutions [258]. This resolution, however, has radical operational consequences. Many foundational constraints of quantum theory break. Non-orthogonal quantum states can be perfectly distinguished, the uncertainty principle can be violated, and arbitrary unknown quantum states can be cloned to any fixed fidelity [259, 260]. In harnessing these effects, many problems thought to be intractable to standard quantum computers now yield efficient solutions [261, 262, 263, 264]. Though radical, these effects seem somewhat rationalized in the context of requiring broken causality - the sentiment being that they are curiosities that will vanish once causality is imposed.

What happens, however, if causality is not strictly broken? In this context, Pienaar et al. introduced open timelike curves (OTCs) [265]. Consider a particle that travels back in time with respect to a chronology respecting observer, but is completely isolated from anything that can affect its own causal past during the time-traveling process (see Fig. 11.1). While the time-traveling particle has the potential to break causality, its complete isolation ensures that causality never breaks. Nevertheless, such OTCs can violate uncertainty principles between position and momentum. This opens a remarkable possibility - could the many other radical effects of CTCs stand independent from the breaking of causality?

Here, we demonstrate that OTCs are remarkably powerful, and can replicate many defining operational benefits of CTCs. In sending a particle back in time - even when it interacts with nothing in the past - we can clone arbitrary quantum states to any fixed accuracy, and thus violate any uncertainty principle. Meanwhile, they also grant quantum processors additional computational power, allowing efficient solution of NP-complete problems. Our results hint that the remarkable power of Deutschian CTCs may survive the censorship of chronology protection. This drastically improves the potential of harnessing such power via alternative effects - such as certain models of gravitational time dilation [265]. Thus, we open the possibility of testing the many radical protocols that harness CTCs in significantly less controversial settings.

11.1.2 OTC and CTC

In general relativity, causality can be violated due to the presence of spacetime wormholes that facilitate closed timelike curves (see Fig. 11.1). This allows a physical system $A$ to travel into its own causal past, and interact with its past self via some unitary $U$ . The Deutschian model resolves potential paradoxes by enforcing temporal self-consistency [258, 266], i.e.,

[TABLE]

where $\rho^{\mathrm{(in)}}$ denotes the initial state of the system, $\rho_{\mathrm{CTC}}$ is the state it evolves to at the point of wormhole traversal and $\mathrm{Tr}_{\neq A}$ represents tracing over all systems other than $A$ . Given a solution for $\rho_{\mathrm{CTC}}$ , the final output of the process is given by

[TABLE]

The many radical effects of CTCs rely on using specific self-interactions $U$ to break causality in different ways [259, 260, 261, 262, 263].

Note that while the above analysis does not assume $\rho^{\mathrm{(in)}}$ is pure, it only applies to mixed inputs if $\rho^{\mathrm{(in)}}$ represents one partition of a larger composite system that is pure. In the scenario where an input $\ket{\phi_{k}}$ is prepared with probability $p_{k}$ , the dynamics of the CTC on each $\ket{\phi_{k}}$ must be analyzed separately [258]. This is due to non-linearity, which implies different unravellings of the density operator yield differing outputs.

In OTCs, causality is preserved. The unitary $U$ is the identity - such that the time-travelling system does not interact with its causal past. Any observer in the frame of reference of $A$ can assign a valid chronology to all the events they witness. Meanwhile, to any outside observer, all events involving interactions with $A$ will respect causality. From an operational standpoint, there is no breaking of causality. If all information were classical, this entire procedure would only have the effect of desynchronizing $A$ ’s clock with that of an observer $B$ .

Non-trivial effects, however, emerge when we consider quantum ancilla. Suppose we have access to a bipartite system $AB$ in state $\rho_{AB}$ , where only one bipartition is sent through the OTC (see Fig. 11.2b). The self-consistency relations imply

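$$\rho_{AB}\rightarrow\rho_{A}\otimes\rho_{B},\qquad\rho_{A}=\mathrm{Tr}_{B}[\rho_{AB}],\quad\rho_{B}=\mathrm{Tr}_{A}[\rho_{AB}].$$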

Thus, the OTC acts as a universal decorrelator on $A$ - in sending a system $A$ through an OTC, we erase all quantum correlations between $A$ and the rest of the universe (and in particular, $B$ ). The resulting state, $\rho_{A}\otimes\rho_{B}$ , yields identical local statistics with respect to the input $\rho_{AB}$ , but none of its bipartite correlations. While this operation appears similar to trivial decoherence, it is non-linear, and has been shown to be impossible to synthesize with standard quantum dynamics [267].

This effect is associated with the monogamy of entanglement [266] - a particle and its past self cannot be simultaneously entangled with the same external ancilla. While OTCs produce non-trivial dynamics when the input appears completely classical (e.g., when $\rho^{\mathrm{(in)}}_{AB}=(\ket{00}\bra{00}+\ket{11}\bra{11})/2$ ), this holds for mixed inputs only if the mixedness is due to entanglement with some other system $C$ . If we input $\ket{00}$ and $\ket{11}$ with equal probability, then the dynamics of each input must be analyzed separately, and the OTC will have no effect.
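The decorrelating action can be illustrated for two qubits. The sketch below is a toy model under stated assumptions: it treats $\rho_{AB}$ as an improper mixture (its mixedness comes from entanglement with an external system), represents it as a $4\times 4$ nested list with basis order $\ket{00},\ket{01},\ket{10},\ket{11}$ , and uses hypothetical helper names.

```python
def partial_trace(rho, keep):
    """Reduced 2x2 state of qubit `keep` (0 = A, 1 = B) of a two-qubit state."""
    red = [[0j, 0j], [0j, 0j]]
    for a in range(2):
        for b in range(2):
            for k in range(2):
                if keep == 0:
                    red[a][b] += rho[2 * a + k][2 * b + k]   # trace out B
                else:
                    red[a][b] += rho[2 * k + a][2 * k + b]   # trace out A
    return red


def kron(x, y):
    """Tensor product of two 2x2 matrices as a 4x4 nested list."""
    return [[x[a][c] * y[b][d] for c in range(2) for d in range(2)]
            for a in range(2) for b in range(2)]


def otc_decorrelate(rho_ab):
    """Map rho_AB to rho_A (x) rho_B, the action described in the text."""
    return kron(partial_trace(rho_ab, 0), partial_trace(rho_ab, 1))


# Classically correlated input (|00><00| + |11><11|)/2.
rho_in = [[0.5, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0.5]]
rho_out = otc_decorrelate(rho_in)
print([rho_out[i][i].real for i in range(4)])
```

The output is the maximally mixed state $I/4$ : each qubit keeps its local statistics, but the perfect correlation between $A$ and $B$ is gone. The map is quadratic in $\rho_{AB}$ , which makes its non-linearity explicit.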

11.2 Quantum information with OTC

11.2.1 OTC enhanced measurement

We first introduce OTC enhanced measurement, a procedure that harnesses OTCs to measure an arbitrary observable $\hat{O}$ to any fixed precision. Specifically, given an unknown qudit ( $d$ dimensional quantum system) in state $\rho$ , we can determine $\langle\hat{O}\rangle=\mathrm{Tr}[\hat{O}\rho]$ to any desired accuracy $\delta>0$ with negligible failure probability. This protocol functions as a building block for more sophisticated applications of OTCs, such as the efficient solution of NP-complete problems and cloning of unknown quantum states.

The protocol is illustrated in Fig. 11.3. Let $\ket{j}:j=0,1,\dots,d-1$ denote a basis that diagonalizes $\hat{O}$ . On this basis, we introduce the two qudit controlled addition operator, $C_{+}\ket{i}\ket{j}=\ket{i}\ket{j+i}$ , where addition is done modulo $d$ . We then

1. Prepare $N$ identical ancillary states in an eigenstate of $\hat{O}$ , say $\ket{0}$ .

2. Apply the $C_{+}$ operations $N$ times, each controlled on $\rho$ and targeting a fresh ancilla state. This correlates $\rho$ with each of the $N$ ancillaries.

3. Pass each of the ancillaries through an OTC to destroy all correlations in this $N+1$ -partite system.

This results in $N+1$ uncorrelated qudits, each in state $\rho_{\mathrm{diag}}=\sum_{i=0}^{d-1}\rho_{ii}\ket{i}\bra{i}$ , where $\rho_{ii}$ are the diagonal elements of $\rho$ in the $\hat{O}$ basis. Thus, each qudit exhibits identical statistics to $\rho$ when measured in the $\hat{O}$ basis. In taking the mean of these measurements, we obtain an estimate for $\langle\hat{O}\rangle$ . By the central limit theorem, the error of our estimate scales linearly with $1/\sqrt{N}$ . In particular, provided the eigenvalues of $\hat{O}$ are bounded, Hoeffding’s bound implies we can estimate $\langle\hat{O}\rangle$ to any desired accuracy $\delta$ and error rate $\epsilon$ using $O[1/\delta^{2}\log(1/\epsilon)]$ OTCs (see methods for details).

Scaling Analysis

Execution of the OTC enhanced measurement with $N$ ancillaries (and therefore $N$ uses of the OTC) to estimate $\langle\hat{O}\rangle$ gives an output $O_{\mathrm{est}}=\sum_{k}O_{k}/(N+1)$ . We define the measurement as being successful if the estimate achieves a desired accuracy of $\delta$ (i.e., $|O_{\mathrm{est}}-\langle\hat{O}\rangle|<\delta$ ). Application of Hoeffding’s inequality [268] gives a failure probability $p_{f}$ that obeys

[TABLE]

Here, $O_{\mathrm{max}}$ and $O_{\mathrm{min}}$ are the respective maximum and minimum eigenvalues of $\hat{O}$ . Therefore

[TABLE]

OTC applications ensure a failure probability of no more than $\epsilon$ . Provided $\hat{O}$ is bounded, this scales as $O[1/\delta^{2}\log(1/\epsilon)]$ .
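As a rough numerical guide, the required number of OTC uses can be computed from the standard two-sided Hoeffding inequality, $p_{f}\leq 2\exp[-2M\delta^{2}/(O_{\mathrm{max}}-O_{\mathrm{min}})^{2}]$ for $M=N+1$ independent samples. The constants in this sketch come from that textbook form and may differ from the expression used in the derivation above; the function names are hypothetical.

```python
import math


def hoeffding_failure_bound(samples: int, delta: float,
                            o_min: float, o_max: float) -> float:
    """Two-sided Hoeffding bound on P(|mean - expectation| >= delta)."""
    span = o_max - o_min
    return 2.0 * math.exp(-2.0 * samples * delta ** 2 / span ** 2)


def otc_uses_needed(delta: float, epsilon: float,
                    o_min: float, o_max: float) -> int:
    """Smallest N such that N + 1 samples push the bound below epsilon."""
    span = o_max - o_min
    samples = math.ceil(span ** 2 * math.log(2.0 / epsilon)
                        / (2.0 * delta ** 2))
    return max(samples - 1, 0)  # N ancillas yield N + 1 measured qudits


# Pauli-type observable with eigenvalues -1 and +1.
print(otc_uses_needed(delta=0.1, epsilon=0.01, o_min=-1.0, o_max=1.0))
```

Halving $\delta$ quadruples the count, while tightening $\epsilon$ costs only logarithmically, in line with the $O[1/\delta^{2}\log(1/\epsilon)]$ scaling.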

In OTC assisted cloning, we need to make $d^{2}$ informationally complete measurements, each to a desired accuracy $\delta>0$ with negligible failure probability $\epsilon>0$ . Recall this is achieved via a $1\rightarrow d^{2}$ universal cloner, whose imperfect copies are to be decorrelated via the use of OTCs (Fig. 11.5). For each of the $d^{2}$ copies, we apply an OTC enhanced measurement. To ensure this measurement is within accuracy $\delta$ , an extra $O(d^{2})$ overhead is required to compensate for the noise within the imperfect copies. The total number of OTCs required is then of order

[TABLE]

where $O_{\mathrm{max}}-O_{\mathrm{min}}$ is equal to 1 for members of the informationally complete basis.

11.2.2 Solving NP-complete problems

We take inspiration from Bacon [262], who devised an efficient algorithm to solve the Boolean satisfiability problem - a known NP-complete problem - using CTCs.

Bacon’s protocol

Here we outline explicitly how a non-linear map that takes $\rho(n_{z})$ to $\rho(n_{z}^{2})$ allows the efficient solution of NP-complete problems. Specifically, we study the satisfiability problem: Given a Boolean function $f:\{0,1\}^{n}\rightarrow\{0,1\}$ , specified in conjunctive normal form, does there exist a satisfying assignment $(\exists b|f(b)=1)$ ? This problem is known to be NP-complete.

Bacon [262] showed that this problem can be efficiently solved if, given an input qubit $\rho=(I+\vec{n}\cdot\vec{\sigma})/2$ , we can synthesize a quantum gate $S$ such that $S(\rho)=\frac{1}{2}\left(I+n_{z}^{2}\sigma_{z}\right)$ . Here $\vec{n}$ denotes the Bloch sphere vector, and $\vec{\sigma}$ is a 3 component vector of Pauli matrices. Fig. 11.4 shows how this gate can be synthesized using OTCs. With this established, the satisfiability problem is efficiently solved as follows:

1. Prepare $n$ ancillary qubits in the state $1/\sqrt{2^{n}}\sum_{i=0}^{2^{n}-1}\ket{i}$ and a target qubit in state $\ket{0}$ .

2. Apply the unitary

[TABLE]

on this system (with the last qubit representing the target). Tracing out the ancillary qubits leaves the target in

[TABLE]

where $s$ is the number of $x$ satisfying $f(x)=1$ .

3. Apply $S$ to the target via the use of OTCs (see Fig. 11.4). Repeat this step $p$ times to get

[TABLE]

Notice that we can easily check the case of $s=2^{n}$ . Thus we only need to distinguish between $s=0$ and $0<s<2^{n}$ . In the limit of $p\rightarrow\infty$ , the two output states corresponding to the cases of $s=0$ and $0<s<2^{n}$ are $\rho_{p}=\ket{0}\bra{0}$ and $\rho_{p}\rightarrow I/2$ , respectively. By performing a measurement in the $\sigma_{z}$ basis, one can distinguish the two types of output states $\ket{0}\bra{0}$ and $I/2$ , that is, the cases of $s=0$ and $0<s<2^{n}$ , with a failure probability of $1/2$ . By repeating these steps more times, say, $q$ , the failure probability decays exponentially. For finite $p$ and $q$ that are polynomial in $n$ , the probability of failure is given by [262]

[TABLE]
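The effect of iterating $S$ can be sketched numerically. The sketch assumes that step 2 flips the target for every satisfying assignment, so that the target starts with $n_{z}=1-2s/2^{n}$ ; this reading is an assumption made here for illustration, as are the function names.

```python
def bloch_z_after_s(n: int, s: int, p: int) -> float:
    """n_z of the target after p applications of the map n_z -> n_z**2."""
    n_z = 1.0 - 2.0 * s / 2 ** n   # assumed state after step 2
    for _ in range(p):
        n_z = n_z * n_z            # one application of S
    return n_z


def prob_outcome_one(n: int, s: int, p: int) -> float:
    """Probability of reading 1 in a sigma_z measurement of the target."""
    return (1.0 - bloch_z_after_s(n, s, p)) / 2.0


n = 10
for s in (0, 1, 2 ** n):
    print(s, prob_outcome_one(n, s, p=n + 3))
```

With a single satisfying assignment, $p=n+3$ iterations already bring the probability of outcome 1 close to one half, whereas for $s=0$ it stays exactly zero. The case $s=2^{n}$ gives $n_{z}=-1$ , which squares back to $+1$ and therefore has to be checked separately, as noted above.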

Solving NP-complete problems with OTC

We modify this algorithm to preserve causality - without losing efficiency. In the causality breaking algorithm, the key role of CTCs is to implement the non-linear map $S$ that maps an input qubit in state $\rho(n_{z})$ to an output state $\rho(n_{z}^{2})$ , where $\rho(n_{z})=\frac{1}{2}\left(I+n_{z}\sigma_{z}\right)$ and $\sigma_{z}=\ket{0}\bra{0}-\ket{1}\bra{1}$ denotes the Pauli $Z$ matrix (see methods for details).

This non-linear map can be replicated without breaking causality (See Fig. 11.4). Consider a special case of OTC enhanced measurement, with $\sigma_{z}$ as the observable of interest and a single ancilla. For the input qubit $\rho$ with matrix elements $\rho_{ij}$ , application of the enhanced measurement protocol outputs two uncorrelated qubits, each in state $\rho_{\mathrm{diag}}=\rho_{00}\ket{0}\bra{0}+\rho_{11}\ket{1}\bra{1}$ . Instead of measuring each in $\sigma_{z}$ directly, we apply a further $C_{+}$ gate controlled on the ancilla. After discarding the ancilla, the input qubit is now transformed to $S(\rho)$ as required.
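As a worked check of this construction (a sketch assuming that $C_{+}$ acts on qubits as a CNOT and that the input and the ancilla are both in the uncorrelated diagonal state $\rho_{\mathrm{diag}}$ ), write $n_{z}=\rho_{00}-\rho_{11}$ . After the gate, the input bit is the XOR of two independent bits with the same distribution, so its outcome probabilities are

$$P(0)=\rho_{00}^{2}+\rho_{11}^{2},\qquad P(1)=2\rho_{00}\rho_{11},$$

and therefore

$$n_{z}^{\prime}=P(0)-P(1)=(\rho_{00}-\rho_{11})^{2}=n_{z}^{2}.$$

Because both qubits are diagonal and the gate only permutes computational basis states, the reduced output stays diagonal and equals $\frac{1}{2}\left(I+n_{z}^{2}\sigma_{z}\right)$ .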

In generating $S(\rho)$ using only OTCs, we can translate Bacon’s algorithm into one that does not break causality. We note that as each call of $S(\rho)$ only takes one OTC, the translation from CTCs to OTCs incurs no overhead on the number of times a particle needs to be sent through a spacetime wormhole. Thus, for the purpose of solving NP-complete problems, an OTC, together with one bit of entanglement, is at least as powerful as a CTC.

11.2.3 Cloning with OTCs

Given an unknown input $\rho$ , OTCs allow us to generate an unlimited number of clones to arbitrary fidelity. Our approach harnesses OTC enhanced measurements as a subroutine, which allows us to accurately determine $\mathrm{Tr}[M_{i}\rho]$ , for any observable $M_{i}$ . First, observe that this remains possible even if we are supplied with

$$\rho^{\prime}=s\rho+(1-s)\frac{I}{d},$$

a very noisy version of $\rho$ . Here $I$ is the $d$ -dimensional identity matrix, and $s$ is some fixed parameter such that $0<s<1$ .

This observation, together with imperfect quantum cloners, forms the basis of OTC enhanced cloning (Fig. 11.5). In conventional quantum theory, an unknown quantum state $\rho$ can be cloned if we are given sufficiently many copies to perform accurate tomography [269]. One way to do this is to use a set of $O(d^{2})$ informationally complete measurements $\{M_{i}\}$ , whose expectation values $\mathrm{Tr}\left[M_{i}\rho\right]$ have a one-to-one correspondence with the classical matrix description of $\rho$ . Given only a single copy of $\rho$ , this option is no longer valid. Recently, Brun et al. demonstrated that closed timelike curves circumvent this restriction and allow the estimation of each $\langle M_{i}\rangle$ to any desired accuracy [260].

OTC enhanced measurements can replicate this effect while preserving causality. We use standard methods to construct $O(d^{2})$ imperfect clones in the form of Eq. 11.11, where $s$ scales as $1/d$ for an optimal cloner [270]. Each clone is passed through an OTC to remove all entanglement between clones. An OTC enhanced measurement is then performed on each clone with respect to a different $M_{i}$ . The outcomes of these measurements determine the density matrix of $\rho$ . In methods, we show that by using $O(d^{4}/\delta_{c}^{2}\log(1/\epsilon_{c}))$ OTCs, we can ensure that each $\langle M_{i}\rangle$ is obtained to an accuracy of $\delta_{c}$ with failure probability $\epsilon_{c}$ .

A Simple Example

We illustrate these ideas by cloning a qubit. Here, the Pauli operators $\sigma_{k},k=x,y,z$ are informationally complete - any $\rho$ is uniquely defined by the expectation values $n_{k}=\mathrm{Tr}[\sigma_{k}\rho]$ . To determine each $n_{k}$ , we first apply a universal 1-to-3 quantum cloner to obtain three imperfect clones of $\rho$ , each in state $\rho^{\prime}=(I+s\vec{n}\cdot\vec{\sigma})/2$ with $s=5/9$ [271]. These copies can be made independent via OTCs.

An OTC enhanced measurement of $\sigma_{z}$ is then performed on one such imperfect clone. We initialize $N$ ancilla qubits in state $\ket{0}$ , and apply a CNOT gate between each ancilla and the clone (with the clone as the control qubit). In erasing the resulting correlations by sending each ancilla through an OTC, we obtain $N+1$ qubits, each in the state $(I+sn_{z}\sigma_{z})/2$ . Provided $N$ is sufficiently large, measurement of these qubits allows $n_{z}$ to be determined to any desired accuracy with negligible error. Repetition of this process with $\sigma_{x}$ and $\sigma_{y}$ on the two remaining imperfect clones then yields complete information about $\rho$ .
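The final classical step of this example, turning the three measured expectation values back into a density matrix, can be sketched as follows. The function name is hypothetical, and the inputs are assumed to be the estimated expectation values $m_{k}\approx sn_{k}$ obtained from the three imperfect clones.

```python
S_CLONER = 5.0 / 9.0   # shrinking factor of the 1-to-3 universal cloner


def reconstruct_qubit(m_x: float, m_y: float, m_z: float,
                      s: float = S_CLONER):
    """Rebuild rho = (I + n . sigma)/2 from shrunk expectation values m_k = s * n_k."""
    n_x, n_y, n_z = m_x / s, m_y / s, m_z / s   # undo the shrinking
    return [[(1.0 + n_z) / 2.0, (n_x - 1j * n_y) / 2.0],
            [(n_x + 1j * n_y) / 2.0, (1.0 - n_z) / 2.0]]


# Ideal statistics of imperfect clones of |0>: only sigma_z is non-zero.
print(reconstruct_qubit(0.0, 0.0, 5.0 / 9.0))
```

Dividing by $s$ amplifies statistical errors by a factor $1/s$ , which is one way to see why noisier clones require more OTC uses for the same final accuracy.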

Our result highlights the intricate interplay between quantum theory and general relativity. If all physical systems have a well defined local reality, open timelike curves would have no operational effect. Our protocol thus admits no classical explanation. In the quantum regime, however, entanglement exists. While the local properties of a physical system are unaffected by open timelike curves, their correlations with other chronology respecting systems are completely erased. In each of our protocols, this effect played a central role, allowing us to replicate the many defining benefits of closed timelike curves. This remarkable success prompts us to ask whether entanglement assisted open timelike curves are operationally equivalent to their causality breaking counterparts. Could one, for example, derive a map that takes any quantum circuit with CTCs, and engineer it in a way that does not break causality?

Preserving causality has significant benefits. Breaking causality is likely to be non-trivial, and opportunities to do so are negligible in the foreseeable future. This makes it unlikely for us to directly test the predictions of Deutschian CTCs. The preservation of causality in OTCs, however, suggests that their non-linear effects may be synthesized using alternative means. For instance, from the perspective of a chronology respecting observer, a particle sent through an OTC exhibits nothing more than a time delay. Thus, in order to reconcile quantum field theory with non-hyperbolic space-times, gravitational time-dilation has been conjectured to share similar operational effects as OTCs [272]. If true, our protocols suggest the exotic benefits of quantum processing in the general relativistic regime can be tested much sooner than previously expected.

Chapter 12 Quantum theory from axioms

Quantum Information provided a new angle on the foundations of Quantum Mechanics, where the emphasis is placed on operational tasks pertaining to information processing and computation. In this spirit, several authors have proposed that the mathematical structure of Quantum Theory could (and should) be rebuilt from purely information-theoretic principles. This chapter reviews the particular route proposed by D’Ariano, Perinotti, and Chiribella [273, 274, 275] and probes its application in nonlocality and contextuality [276].

12.1 Axiomatization of quantum theory

Quantum mechanics has been one of the greatest scientific breakthroughs of the 20th century, and at the beginning of the 21st it still provides the most accurate predictions about the microscopic world and a key to new exciting discoveries. A young branch of quantum mechanics is quantum information science. Over the past three decades, this new field brought to light a wealth of operational consequences of the mathematical structure of quantum theory: no-cloning [277], quantum teleportation [7], dense coding [54], quantum key distribution [6, 8], and quantum algorithms [10, 9] are just a few examples showing that quantum theory entails a powerful and totally new model of information processing.

Which are the conceptual ingredients of this power? What makes the quantum model special with respect to the classical one or to other more exotic models that we can conceive? Stimulating new questions are the best route to answering older, long-standing questions. Inspired by the surprising features of quantum information, many researchers—notably Fuchs [278] and Brassard [279]—suggested that the whole structure of quantum theory, with its dowry of Hilbert spaces and operator algebras, could be reconstructed from a few simple principles of information-theoretic nature. Reconstructing quantum theory means rigorously proving that, once we define a sufficiently general class of possible theories, quantum theory is the only one compatible with the principles. Many proposals of reconstructions of quantum theory have been made, in different frameworks and with emphasis on different features [274, 280, 281, 282, 283, 284, 285, 286], bringing a breeze of fresh air on the long standing problem of a conceptual axiomatization of quantum mechanics. Here we will focus on the reconstruction of Ref. [274], which presented a new set of information-theoretic principles directly inspired by quantum information.

The structure of Ref. [274] is devised to highlight one particular feature as the ingredient that generates the surprising features of quantum theory: the purification principle [274, 273], stating that every physical process can be simulated using only pure states and reversible interactions with an environment. The principle is directly inspired by quantum information, where the applications of purification are countless. In addition to purification, Ref. [274] contains five principles that are satisfied by both classical and quantum theory. Combining these five seemingly innocuous requirements with the purification principle, Ref. [274] singled out quantum theory uniquely. This result has recently been presented in non-technical terms in Ref. [287]. In the following we will introduce the reader to the principles of Ref. [274], providing their translation in the ordinary mathematical language of quantum theory. The principles will first be stated in the non-technical terms of Ref. [287], and then illustrated in the special example of quantum theory.

12.1.1 Operational vs Hilbert space framework

Most works on the reconstruction of quantum theory adopt an operational framework describing the experiments that a physicist can perform in a laboratory. Each device has an input system and an output system, a set of possible outcomes $X$ that the experimenter can read out, and a set of random processes $\{\mathcal{C}_{i}\}_{i\in X}$ occurring in conjunction with the outcomes. As a special case, a device can have trivial input (i.e. no input at all), in which case its action consists in preparing a system in a particular set of states. Likewise, the device can have trivial output, in which case its action consists in a demolition measurement that absorbs the system, producing an outcome with some probability (think for example of the absorption of a photon on a photographic plate). Quantum theory can be cast in the operational framework as a special example. In the mathematical language of quantum theory, systems are described by Hilbert spaces, and the outcomes of a device correspond to outcomes of an (indirect) measurement. A device that prepares states of a system with Hilbert space $\mathcal{H}$ corresponds to an ensemble $\{\rho_{i},p_{i}\}_{i\in X}$ , where each $\rho_{i}$ is a density matrix (non-negative operator with unit trace) and $\{p_{i}\}_{i\in X}$ are probabilities. Here $p_{i}$ is the probability that the device outputs the system in the state $\rho_{i}$ . A device that performs a demolition measurement will be described by a positive operator-valued measure (POVM), namely a collection $\{P_{i}\}_{i\in X}$ of non-negative operators satisfying the condition $\sum_{i\in X}P_{i}=I$ , where $I$ is the identity operator on the Hilbert space of the system. When a POVM measurement $\{P_{i}\}_{i\in X}$ is performed on a system prepared in the quantum state $\rho$ , one obtains the outcome $i$ with probability $p_{i}=\operatorname{Tr}[P_{i}\rho]$ . 
The textbook example of POVM measurement is the measurement on an orthonormal basis $\{|i\rangle\}_{i=1}^{d}$ ( $d$ being the dimension of the Hilbert space), corresponding to the POVM $\{P_{i}\}_{i=1}^{d}$ with $P_{i}=|i\rangle\langle i|$ .
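As a minimal sketch of the Born rule for POVMs, the following plain-Python snippet evaluates $p_{i}=\operatorname{Tr}[P_{i}\rho]$ for a measurement on an orthonormal basis. The helper names (`matmul`, `trace`, `born_probabilities`, `is_resolution_of_identity`) and the particular qubit state are illustrative assumptions, not part of the original text.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

def born_probabilities(povm, rho):
    """Return p_i = Tr[P_i rho] for each POVM element P_i."""
    return [trace(matmul(P, rho)).real for P in povm]

def is_resolution_of_identity(povm, tol=1e-12):
    """Check the normalization sum_i P_i = I entry by entry."""
    d = len(povm[0])
    return all(abs(sum(P[r][c] for P in povm) - (1 if r == c else 0)) <= tol
               for r in range(d) for c in range(d))

# Hypothetical qubit density matrix (unit trace, positive determinant).
rho = [[0.75, 0.25], [0.25, 0.25]]
# Measurement on the orthonormal basis {|0>, |1>}: P_i = |i><i|.
basis_povm = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]

probs = born_probabilities(basis_povm, rho)
```

For this basis measurement the probabilities are simply the diagonal entries of $\rho$.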

General quantum processes (with both non-trivial input and non-trivial output) are described by the theory of open quantum systems [288, 289]. When a system $S$ , initially prepared in state $\rho$ , interacts with an environment $E$ , initially prepared in state $\sigma$ , through a joint unitary evolution $U_{SE}$ , the joint state changes to $U_{SE}(\rho\otimes\sigma)U^{\dagger}_{SE}$ . If we perform a measurement on the environment, with POVM $\{P_{i}\}_{i\in X}$ , the outcome $i$ will occur with probability $p_{i}=\operatorname{Tr}[U_{SE}(\rho\otimes\sigma)U^{\dagger}_{SE}(I\otimes P_{i})]$ and, for $p_{i}\neq 0$ , the state of the system conditional on outcome $i$ will be given by $\rho_{i}^{\prime}=\mathcal{C}_{i}(\rho)/\operatorname{Tr}[\mathcal{C}_{i}(\rho)]$ . Here, $\mathcal{C}_{i}$ is the linear map

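$$\mathcal{C}_{i}(\rho)=\operatorname{Tr}_{\mathcal{H}_{E}}\left[U_{SE}(\rho\otimes\sigma)U^{\dagger}_{SE}(I\otimes P_{i})\right],\tag{12.1}$$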

$\operatorname{Tr}_{\mathcal{H}_{E}}$ denoting the partial trace over the environment. Any such map is completely positive (it transforms positive operators into positive operators even if we apply it only on one part of a composite system) and trace-decreasing ( $\operatorname{Tr}[\mathcal{C}_{i}(\rho)]\leq\operatorname{Tr}[\rho]$ for every density matrix $\rho$ ). Completely positive trace-decreasing maps are known as quantum operations [290]. A collection of quantum operations $\{\mathcal{C}_{i}\}_{i\in X}$ as in Eq. (12.1) is known as a quantum instrument [291]. A quantum instrument describes the action of a device that couples the system with an environment and performs a measurement on the latter. A quantum operation can always be written in the Kraus form [290]

$$\mathcal{C}_{i}(\rho)=\sum_{k=1}^{K}C_{i,k}\,\rho\,C_{i,k}^{\dagger},\tag{12.2}$$

where $\{C_{i,k}\}_{k=1}^{K}$ is a set of operators on the system’s Hilbert space. Here we can think of each operator $C_{i,k}$ as representing a “quantum jump” that randomly changes the state of the system to $\rho^{\prime}_{i,k}=C_{i,k}\rho C_{i,k}^{\dagger}/\operatorname{Tr}[C_{i,k}\rho C_{i,k}^{\dagger}]$ .

When there is a single operator in Eq. (12.2), we know exactly which quantum jump has occurred for outcome $i$ . In this case, we say that the process $\mathcal{C}_{i}$ is fine-grained, because knowing the outcome $i$ gives full information about the jump undergone by the system. On the contrary, a quantum operation $\mathcal{C}_{i}$ with more than one Kraus operator represents a coarse-grained process: in principle, one can devise a finer measurement on the environment with outcomes $(i,k)$ , inducing the quantum operations $\mathcal{C}_{i,k}(\rho)=C_{i,k}\rho C_{i,k}^{\dagger}$ . For example, if the environment is a particle with spin, measuring only the position of the particle would result in a coarse-grained process, because the information coming from the spin degree of freedom has been ignored.

Coarse-graining over all possible outcomes $i$ , the evolution of the system’s state under the random process $\{\mathcal{C}_{i}\}_{i\in X}$ is given by $\rho\mapsto\rho^{\prime}=\sum_{i\in X}\mathcal{C}_{i}(\rho)$ . The linear map $\mathcal{C}:\rho\mapsto\mathcal{C}(\rho)=\sum_{i\in X}\mathcal{C}_{i}(\rho)$ is called a quantum channel. Due to the normalization of the probabilities, the map is trace-preserving, namely

$$\operatorname{Tr}[\mathcal{C}(\rho)]=\operatorname{Tr}[\rho]\qquad\text{for every }\rho,\tag{12.3}$$

as can easily be checked by taking the sum in Eq. (12.1) and using the normalization $\sum_{i\in X}P_{i}=I$ .
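The relation between a quantum instrument and the channel obtained by coarse-graining over its outcomes can be sketched numerically. The snippet below is only an illustration under stated assumptions: the amplitude-damping Kraus operators `K0`, `K1`, the damping parameter `g`, and all helper names are hypothetical choices, not taken from Ref. [274]. Each outcome has a single Kraus operator, so both processes are fine-grained.

```python
import math

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose of a matrix."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

def apply_operation(kraus_ops, rho):
    """Kraus form: rho -> sum_k C_k rho C_k^dagger (unnormalized output)."""
    d = len(rho)
    out = [[0.0] * d for _ in range(d)]
    for K in kraus_ops:
        term = matmul(matmul(K, rho), dagger(K))
        out = [[out[r][c] + term[r][c] for c in range(d)] for r in range(d)]
    return out

g = 0.36  # assumed damping probability
K0 = [[1.0, 0.0], [0.0, math.sqrt(1 - g)]]
K1 = [[0.0, math.sqrt(g)], [0.0, 0.0]]
instrument = {0: [K0], 1: [K1]}      # one Kraus operator per outcome
rho = [[0.5, 0.5], [0.5, 0.5]]       # the pure state |+><+|

# Outcome probabilities p_i = Tr[C_i(rho)]; each is at most Tr[rho] = 1.
outcome_probs = {i: trace(apply_operation(ops, rho)).real
                 for i, ops in instrument.items()}
# Coarse-graining over all outcomes gives the trace-preserving channel.
channel_output = apply_operation([K0, K1], rho)
```

Individually the two operations decrease the trace, while their sum preserves it, as required of a quantum channel.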

12.1.2 Informational principles and their translation in the Hilbert space language

At the general operational level, devices can be connected with one another, if the output system of one device coincides with the input system of the next. Note that the notion of input and output of a device singles out a privileged direction when we compose systems. The first principle in Refs. [274, 287] states that information can only flow from input to output.

Causality: The probability of an outcome at a certain step does not depend on the choice of experiments performed at later steps.

Let us illustrate the meaning of the principle in the Hilbert space framework. Suppose that a quantum system, initially prepared in the state $\rho$ , is sent first to the quantum instrument $\{\mathcal{C}_{i}\}_{i\in X}$ and then to another quantum instrument $\{\mathcal{D}_{j}\}_{j\in Y}$ (for example, the two instruments could be two successive Stern-Gerlach devices). The joint probability of observing the sequence of outcomes $(i,j)$ will be $p(i,j)=\operatorname{Tr}[\mathcal{D}_{j}\mathcal{C}_{i}(\rho)]$ . In the general operational framework, causality states that the probability of observing outcome $i$ , given by $p(i)=\sum_{j\in Y}p(i,j)$ , should not depend on the choice of the particular quantum instrument $\{\mathcal{D}_{j}\}_{j\in Y}$ (in the example, it should not depend on the orientation of the magnetic field in the second Stern-Gerlach device). This condition is indeed satisfied, thanks to the normalization condition of Eq. (12.3), which gives $p(i)=\sum_{j\in Y}\operatorname{Tr}[\mathcal{D}_{j}\mathcal{C}_{i}(\rho)]=\operatorname{Tr}[\mathcal{C}_{i}(\rho)]$ , independently of the choice of $\{\mathcal{D}_{j}\}_{j\in Y}$ .
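The Stern-Gerlach example can be checked with a short numerical sketch. We assume, purely for illustration, that both devices are projective instruments acting as $\rho\mapsto P\rho P$ , that the first measures along $Z$ , and that the second measures along either $Z$ or $X$ ; the function `marginal_of_first` and the state `rho` are hypothetical names and values.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

def marginal_of_first(first, second, rho):
    """p(i) = sum_j Tr[D_j(C_i(rho))] for two projective instruments.

    Both arguments are lists of Hermitian projectors P, each acting as
    rho -> P rho P.
    """
    marginals = []
    for P in first:
        after_first = matmul(matmul(P, rho), P)
        total = sum(trace(matmul(matmul(Q, after_first), Q)).real
                    for Q in second)
        marginals.append(total)
    return marginals

Z = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]                    # |0><0|, |1><1|
X = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, -0.5], [-0.5, 0.5]]]  # |+><+|, |-><-|
rho = [[0.75, 0.25], [0.25, 0.25]]

# Marginal of the first (Z) measurement for two different later settings.
p_then_Z = marginal_of_first(Z, Z, rho)
p_then_X = marginal_of_first(Z, X, rho)
```

The two marginals coincide, so the choice of the later setting leaves the statistics of the earlier measurement untouched.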

If we place our operational theory in a relativistic spacetime, then causality forbids information from travelling faster than the speed of light, in the sense that the outcome probabilities for an experiment performed at a spacetime point $P=(x,y,z,t)$ cannot depend on the settings of experiments performed at spacetime points that do not contain $P$ in their light cone. As a consequence, causality implies the no-signalling principle: when two experiments take place in space-like separated regions, the outcome probabilities for an experiment should not depend on the settings of the other experiment.

The second principle in Ref. [287] pertains to the extraction of information from composite systems.

Local tomography: The state of a composite system is determined by the statistics of local measurements on its components.

Let us illustrate the principle in the quantum case: suppose that two density matrices $\rho,\sigma$ on the Hilbert space $\mathcal{H}_{A}\otimes\mathcal{H}_{B}$ give the same statistics for every pair of local POVM measurements $\{P_{i}\}_{i\in X}$ and $\{Q_{j}\}_{j\in Y}$ on $\mathcal{H}_{A}$ and $\mathcal{H}_{B}$ , namely $\operatorname{Tr}[(P_{i}\otimes Q_{j})\rho]=\operatorname{Tr}[(P_{i}\otimes Q_{j})\sigma]$ for all $i,j$ . Choosing $P_{i}$ and $Q_{j}$ to be arbitrary rank-one projectors $P_{i}=|\alpha_{i}\rangle\langle\alpha_{i}|$ and $Q_{j}=|\beta_{j}\rangle\langle\beta_{j}|$ and using the polarization identity, we then conclude that $\rho$ must be equal to $\sigma$ . This means that the probability distributions for local measurements are sufficient to identify the density matrix of the composite system $\mathcal{H}_{A}\otimes\mathcal{H}_{B}$ . Note that the property of local tomography, which holds both for classical and quantum theory, fails to hold for the variant of quantum theory on real Hilbert spaces [292].
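The failure of local tomography on real Hilbert spaces can be made plausible by a standard parameter-counting argument, which is not spelled out in the text above and is sketched here as an assumption-laden illustration. Product effects span at most $K_{A}K_{B}$ dimensions, where $K$ is the number of real parameters of an unnormalized state, so local statistics can fix the global state only if $K_{AB}=K_{A}K_{B}$ . The function names below are hypothetical.

```python
def state_space_dimension(d, theory):
    """Number of real parameters of an unnormalized state of a d-level system."""
    if theory == "classical":
        return d                 # one weight per classical configuration
    if theory == "complex":
        return d * d             # Hermitian d x d matrices
    if theory == "real":
        return d * (d + 1) // 2  # real symmetric d x d matrices
    raise ValueError(f"unknown theory: {theory}")

def passes_local_counting(d_a, d_b, theory):
    """Necessary condition for local tomography: K_AB == K_A * K_B."""
    k_ab = state_space_dimension(d_a * d_b, theory)
    k_a = state_space_dimension(d_a, theory)
    k_b = state_space_dimension(d_b, theory)
    return k_ab == k_a * k_b

# Two two-level systems in each of the three theories.
results = {theory: passes_local_counting(2, 2, theory)
           for theory in ("classical", "complex", "real")}
```

For two real two-level systems the composite has 10 parameters while product measurements can access only 9, consistent with the failure reported in Ref. [292].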

The following principle states an information-theoretic property of the composition of physical processes.

Fine-Grained Composition: The sequence of two fine-grained processes is a fine-grained process.

As we already noted, a fine-grained process in quantum theory is represented by a quantum operation $\mathcal{C}_{i}$ of the form $\mathcal{C}_{i}(\rho)=C_{i}\rho C_{i}^{\dagger}$ , so that the information about the outcome $i$ is enough to identify the quantum jump $\rho\mapsto\rho_{i}^{\prime}=C_{i}\rho C_{i}^{\dagger}/\operatorname{Tr}[C_{i}\rho C_{i}^{\dagger}]$ . Suppose now that we have two devices in a sequence, and that the two devices produce two outcomes $i$ and $j$ corresponding to two fine-grained processes $\mathcal{C}_{i}(\rho)=C_{i}\rho C_{i}^{\dagger}$ and $\mathcal{D}_{j}(\rho)=D_{j}\rho D_{j}^{\dagger}$ , respectively. The composite process will be given by the quantum operation $\mathcal{D}_{j}\mathcal{C}_{i}(\rho)=(D_{j}C_{i})\rho(D_{j}C_{i})^{\dagger}$ , which still represents a fine-grained process.

The next principle guarantees that we can encode classical bits using the physical systems available in our operational theory:

Perfect Distinguishability: If a state is not compatible with some preparation, then it is perfectly distinguishable from some other state.

What does it mean that a state is not compatible with some preparation? Consider a mixed quantum state, with density matrix $\rho=\sum_{i}p_{i}|\psi_{i}\rangle\langle\psi_{i}|$ . The state $\rho$ can be interpreted as the average state of the ensemble $\{|\psi_{i}\rangle\langle\psi_{i}|,p_{i}\}$ , where the system is prepared in the pure state $|\psi_{i}\rangle$ with probability $p_{i}$ . According to this interpretation, the state $\rho$ is compatible with the system being prepared in every pure state $|\psi_{i}\rangle$ . Saying that a density matrix $\rho$ is not compatible with some preparation means that there exists some pure state $|\psi\rangle$ that cannot appear in any ensemble decomposition of $\rho$ . In formula, this means that the only solution to the equation

$$\rho=p\,|\psi\rangle\langle\psi|+(1-p)\,\sigma,\tag{12.4}$$

with $\sigma$ a density matrix, is $p=0$ and $\sigma=\rho$ . Technically, if a density matrix $\rho$ is not compatible with some pure state, then $\rho$ cannot be invertible. This means that $\rho$ will have a non-trivial kernel $\mathsf{Ker}(\rho):=\{|\varphi\rangle\in\mathcal{H}~|~\rho|\varphi\rangle=0\}$ . Hence, every unit vector $|\varphi\rangle\in\mathsf{Ker}(\rho)$ will be orthogonal to $\rho$ , and, therefore, it will represent a pure state of the system that is perfectly distinguishable from $\rho$ . For example, the state $\rho=1/2(|0\rangle\langle 0|+|1\rangle\langle 1|)$ of a three-level system is not compatible with the system being prepared in the pure state $|\psi\rangle=1/\sqrt{2}(|1\rangle+|2\rangle)$ . The perfect distinguishability principle then imposes that there must exist a state $\rho^{\prime}$ that is perfectly distinguishable from $\rho$ : in this particular example, $\rho^{\prime}=|2\rangle\langle 2|$ . Perfect distinguishability enables us to encode classical bits into quantum states. For instance, we can encode a classical bit into the angular momentum of a nucleus, by encoding the logical 0 in the spin-up state $|\uparrow\rangle$ and the logical 1 in the spin-down state $|\downarrow\rangle$ .
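The three-level example can be verified directly. The sketch below, with hypothetical helper names, checks that $|2\rangle$ lies in the kernel of $\rho$ , that the test "is the system in $|2\rangle$ ?" never fires on $\rho$ , and that $|\psi\rangle$ has a component along the kernel. The last check relies on the standard fact, assumed here rather than proven, that a pure state can appear in an ensemble decomposition of $\rho$ only if it lies in the support of $\rho$ .

```python
import math

def apply(matrix, vec):
    """Matrix-vector product."""
    return [sum(matrix[r][c] * vec[c] for c in range(len(vec)))
            for r in range(len(matrix))]

def inner(u, v):
    """Inner product <u|v>."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

# rho = 1/2 (|0><0| + |1><1|) on a three-level system.
rho = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.0]]
ket2 = [0.0, 0.0, 1.0]
psi = [0.0, 1 / math.sqrt(2), 1 / math.sqrt(2)]

# |2> is in Ker(rho): rho|2> = 0.
in_kernel = all(abs(x) < 1e-12 for x in apply(rho, ket2))

# Probability of the outcome |2><2| on rho and on rho' = |2><2|.
p_on_rho = inner(ket2, apply(rho, ket2)).real   # Tr[|2><2| rho]
p_on_rho_prime = abs(inner(ket2, ket2)) ** 2    # Tr[|2><2| rho']

# Weight of |psi> on the kernel vector |2>; a nonzero value means that
# |psi> is outside the support of rho.
weight_outside_support = abs(inner(ket2, psi)) ** 2
```

The measurement $\{|2\rangle\langle 2|,\,I-|2\rangle\langle 2|\}$ therefore tells $\rho$ and $\rho^{\prime}$ apart with certainty.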

The next principle ensures the possibility of encoding the state of a system $A$ in the state of another system $B$ of potentially smaller “size”.

Ideal Compression: Information can be compressed in a lossless and maximally efficient fashion.

Let us make mathematically precise the meaning of this principle through an example in quantum theory. Suppose that a three-level system $A$ is prepared in the mixed state $\rho=1/2(|0\rangle\langle 0|+|1\rangle\langle 1|)$ . This preparation is compatible with the system being in every pure state $|\psi\rangle=\alpha|0\rangle+\beta|1\rangle$ , with $|\alpha|^{2}+|\beta|^{2}=1$ . It is clear that we can encode these states in a two-level system using the encoding channel $\mathcal{C}(\rho)=C\rho C^{\dagger}+\langle 2|\rho|2\rangle~{}|\!\uparrow\rangle\langle\uparrow\!|$ with $C|0\rangle=|\!\uparrow\rangle$ and $C|1\rangle=|\!\downarrow\rangle$ . With this encoding, the state $|\psi^{\prime}\rangle=\alpha|\!\uparrow\rangle+\beta|\!\downarrow\rangle\in\mathcal{H}_{B}$ is the “codeword” for the state $|\psi\rangle=\alpha|0\rangle+\beta|1\rangle\in\mathcal{H}_{A}$ . The encoding is lossless for the information compatible with $\rho$ , because we can always restore the initial state $|\psi\rangle$ from $|\psi^{\prime}\rangle$ . The encoding is also maximally efficient, because we cannot encode without losses the pure states $\{|\psi\rangle=\alpha|0\rangle+\beta|1\rangle,|\alpha|^{2}+|\beta|^{2}=1\}$ in a system of Hilbert space dimension smaller than 2. In general, the encoding of a mixed state $\rho$ into another physical system $B$ is maximally efficient if every pure state of $B$ is the codeword for some pure state compatible with $\rho$ . For a density matrix of rank $r$ , the ideal compression is obtained by encoding the information in a Hilbert space of dimension $r$ .
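The encoding channel of this example can be written out explicitly. In the sketch below, $C=|\!\uparrow\rangle\langle 0|+|\!\downarrow\rangle\langle 1|$ is represented as a $2\times 3$ matrix, and decoding is taken to be the map $\sigma\mapsto C^{\dagger}\sigma C$ ; the decoder, the amplitudes `alpha` and `beta`, and all helper names are illustrative assumptions.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose of a matrix."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

# C maps |0> -> |up> and |1> -> |down>; it annihilates |2>.
C = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

def encode(rho):
    """Encoding channel: C rho C^dagger + <2|rho|2> |up><up|."""
    out = matmul(matmul(C, rho), dagger(C))
    out[0][0] = out[0][0] + rho[2][2]
    return out

def decode(sigma):
    """Assumed decoder: embed the qubit back into span{|0>, |1>}."""
    return matmul(matmul(dagger(C), sigma), C)

alpha, beta = 0.6, 0.8                 # |alpha|^2 + |beta|^2 = 1
psi = [[alpha], [beta], [0.0]]         # column vector alpha|0> + beta|1>
rho_psi = matmul(psi, dagger(psi))
recovered = decode(encode(rho_psi))

# The channel stays trace-preserving on states with weight on |2>.
rho_with_leak = [[0.25, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 0.5]]
encoded_trace = trace(encode(rho_with_leak)).real
```

States compatible with $\rho$ are recovered exactly, which is the lossless part of the principle; efficiency follows from the codeword space having dimension equal to the rank of $\rho$ .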

All the principles discussed so far are satisfied both by classical and quantum theory. The principle that uniquely singles out quantum theory is the following:

Purification Principle: Every random process can be simulated in an essentially unique way as a reversible interaction of the system with a pure environment.

Let us illustrate the meaning of the principle in the Hilbert space framework, where random processes are described by quantum channels. One way to simulate a random process $\mathcal{C}(\rho)=\sum_{i\in X}C_{i}\rho C_{i}^{\dagger}$ is to introduce an environment with Hilbert space $\mathcal{H}_{E}=\mathsf{Span}\{|i\rangle\}_{i\in X}$ . For example, every pure state $|\eta\rangle\in\mathcal{H}_{E}$ and every unitary operator $U_{SE}$ satisfying $U_{SE}|\psi\rangle|\eta\rangle=\sum_{i\in X}C_{i}|\psi\rangle|i\rangle$ will give

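$$\mathcal{C}(\rho)=\operatorname{Tr}_{\mathcal{H}_{E}}\left[U_{SE}\left(\rho\otimes|\eta\rangle\langle\eta|\right)U_{SE}^{\dagger}\right].\tag{12.5}$$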

This means that the random process $\mathcal{C}$ can be simulated by letting the system interact unitarily with an environment prepared in the pure state $|\eta\rangle$ . Since unitary interactions are reversible, this is an example of the pure and reversible simulation required by the purification principle. The pure and reversible simulation is not unique, because we are free to choose $|\eta\rangle$ to be any pure state of the environment and to choose the basis $\{|i\rangle\}_{i\in X}$ used in the definition of $U_{SE}$ to be any orthonormal basis for $\mathcal{H}_{E}$ . However, the pure and reversible simulation is essentially unique: once we fix the environment there is no remaining freedom except for the choice of bases for Hilbert space $\mathcal{H}_{E}$ .
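A pure and reversible simulation of this kind can be sketched numerically. Instead of the full unitary $U_{SE}$ , the code builds only the isometry $V|\psi\rangle=\sum_{i\in X}C_{i}|\psi\rangle|i\rangle$ , that is, the action of $U_{SE}$ on inputs of the form $|\psi\rangle|\eta\rangle$ ; extending $V$ to a unitary is assumed possible and not carried out. The amplitude-damping Kraus operators and all function names are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose of a matrix."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def stinespring_isometry(kraus_ops):
    """Matrix of V|psi> = sum_i C_i|psi>|i>, rows ordered as (system, env)."""
    d_out = len(kraus_ops[0])
    d_in = len(kraus_ops[0][0])
    d_env = len(kraus_ops)
    return [[kraus_ops[i][s][t] for t in range(d_in)]
            for s in range(d_out) for i in range(d_env)]

def trace_out_environment(joint, d_sys, d_env):
    """Partial trace over the second (environment) tensor factor."""
    return [[sum(joint[s * d_env + i][s2 * d_env + i] for i in range(d_env))
             for s2 in range(d_sys)] for s in range(d_sys)]

g = 0.36  # assumed damping probability
kraus = [[[1.0, 0.0], [0.0, math.sqrt(1 - g)]],
         [[0.0, math.sqrt(g)], [0.0, 0.0]]]
rho = [[0.5, 0.5], [0.5, 0.5]]

V = stinespring_isometry(kraus)
joint = matmul(matmul(V, rho), dagger(V))
simulated = trace_out_environment(joint, 2, 2)

# Direct evaluation of the channel sum_i C_i rho C_i^dagger for comparison.
terms = [matmul(matmul(K, rho), dagger(K)) for K in kraus]
direct = [[sum(t[r][c] for t in terms) for c in range(2)] for r in range(2)]

# V^dagger V = I expresses that V is an isometry.
gram = matmul(dagger(V), V)
```

Discarding the environment after the interaction reproduces the original channel, and the isometry condition is what makes an extension to a reversible interaction possible.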

The ability to purify every random process is a unique feature of quantum theory. The main message of Refs. [274, 287] is that, among all physical theories satisfying the first five reasonable requirements, quantum theory is the only one that enables a pure and reversible simulation of every random process. Such a key feature places quantum theory at the core of the theory of reversible computation. From this angle, the usual picture is turned upside down: instead of regarding quantum theory as “incomplete” [28] because it fails to give deterministic predictions about the outcomes of arbitrary measurements, we are brought to regard classical theory as “incomplete” because it fails to provide a pure and reversible simulation of arbitrary random processes. This type of simulation is essential if we want to reconcile information theory and physics, the former being based on the notions of random variable and noisy channel, and the latter trying to model phenomena in terms of pure states and fundamentally reversible interactions.

12.2 Measurement sharpness trims nonlocality and contextuality in every physical theory

Nonlocality [28, 19] and contextuality [293, 294] are among the most striking features of quantum mechanics, in radical conflict with the worldview of classical physics. Still, quantum mechanics is neither the most nonlocal theory one can imagine, nor the most contextual. For nonlocality, this observation dates back to the seminal work of Popescu and Rohrlich [295], who showed that relativistic no-signalling is compatible with correlations that are much stronger than those allowed by quantum theory. Their work stimulated the question whether other fundamental principles, yet to be discovered, characterize the peculiar set of correlations observed in the quantum world. Up to now, several candidates that partly retrieve the set of quantum correlations have been proposed, including Non-Trivial Communication Complexity [296, 297], No-Advantage in Nonlocal Computation [298], Information Causality [299], Macroscopic Locality [300], and, most recently, Local Orthogonality (LO) [301]. The observation that quantum theory is not maximally contextual is more recent [302, 303] and so is the search for principles that characterize the quantum set of contextual probability distributions. On this front, the only principle put forward so far is Consistent Exclusivity (CE) [304, 305, 5].

Despite many successes, a complete characterization of the quantum set is still challenging. What makes the problem hard is the fact that, intentionally, the principles considered so far dealt only with input-output probability distributions, without making any hypothesis on how these distributions are generated. On the other hand, a physical theory does not provide only probability distributions, but also specifies rules on how to combine physical systems together, how to measure them, and how to evolve their state in time [306]. Considering that fundamental quantum features like no-cloning and the possibility of universal computation cannot be expressed just in terms of input-output distributions, it is natural to wonder whether quantum nonlocality and contextuality could also be better understood in a broader framework of general probabilistic theories (GPTs) [280, 307, 273, 308]. Further motivation to extend the framework comes from the latest principles in the nonlocality and contextuality camps, LO and CE. Both principles refer to a notion of orthogonal events and impose that the sum of the probabilities of a set of mutually orthogonal events shall not exceed one. This is a powerful requirement, which in the case of LO is even capable of ruling out non-quantum correlations that are compatible with every bipartite principle [309]. But why should Nature obey such a requirement? And what does this requirement tell us about the fundamental laws that govern physical processes?

Here we tackle the problem of understanding quantum nonlocality and contextuality from a new angle, which focuses on the fundamental structure of measurements. In an arbitrary physical theory, we introduce a class of ideal measurements, called sharp, that are repeatable and cause the minimal amount of disturbance on future observations. We postulate that all measurements are sharp at the fundamental level and we explain the apparent unsharpness of real-life experiments as due to the interaction with the environment. Assuming that sharp measurements remain sharp under elementary operations, such as joining two outcomes together and applying two measurements in parallel, we show that the fundamental sharpness of measurements implies the validity of CE and LO, thus providing a strong constraint on the set of probability distributions. Our result demonstrates that principles formulated in the broader framework of GPTs can offer extra power in the characterization of the quantum set and identifies the fundamental sharpness of measurements as a candidate principle for future axiomatizations of quantum theory.

12.2.1 Framework

In a general theory, a measurement is described by a collection of events, each event labelled by an outcome. We first consider demolition measurements, which absorb the measured system. In this case, the measurement events are called effects and the measurement is a collection of effects $\{m_{x}\}_{x\in\set{X}}$ . For a system prepared in the state $\rho$ , the probability of the outcome $x$ is denoted by $p_{x}=(m_{x}|\rho)$ . In quantum theory this is a notation for the Born rule $p_{x}=\operatorname{Tr}[m_{x}\rho]$ , where $\rho$ is a density matrix and $m_{x}$ is a measurement operator. In general theories, however, $(m_{x}|\rho)$ does not denote a trace of matrices and in fact the actual recipe for computing the probability $(m_{x}|\rho)$ is irrelevant here. We will often use the notation $(m_{x}|$ and $|\rho)$ for effects and states, respectively. It is understood that two different states give different probabilities for at least one effect, and two different effects take place with different probabilities on at least one state.

When two measurements $\{m_{x}\}$ and $\{n_{y}\}$ are performed in parallel on two systems $A$ and $B$ , we denote by $m_{x}\otimes n_{y}$ the measurement event labelled by the pair of outcomes $x,y$ . Similarly, when two states of systems $A$ and $B$ , say $\alpha$ and $\beta$ , are prepared independently, we denote by $\alpha\otimes\beta$ the corresponding state of the composite system $AB$ . In quantum theory, this is the ordinary tensor product of operators, but this may not be the case in a general theory and, again, the actual recipe for computing $\alpha\otimes\beta$ is irrelevant here. What is relevant, instead, is that the notation is consistent with the operational notion of performing independent operations on different systems: if two systems are independently prepared in states $\alpha$ and $\beta$ and undergo independent measurements $\{m_{x}\}$ and $\{n_{y}\}$ , we impose that the probability has the product form $p_{xy}=(m_{x}|\alpha)\,(n_{y}|\beta)$ .

The most basic operation one can perform on a measurement is to join some outcomes together, thus obtaining a new, less informative measurement. This operation, known as coarse-graining, is achieved by dividing the outcomes of the original measurement $\{m_{x}\}_{x\in\set{X}}$ into disjoint groups $\{\set{X}_{z}\}_{z\in\set{Z}}$ , and by identifying outcomes that belong to the same group. The result of this procedure is a new measurement $\{m^{\prime}_{z}\}_{z\in\set{Z}}$ satisfying the relation $(m^{\prime}_{z}|\rho)=\sum_{x\in\set{X}_{z}}(m_{x}|\rho)$ for every $z$ and for every possible state $\rho$ . For brevity, we write $m^{\prime}_{z}=\sum_{x\in\set{X}_{z}}m_{x}$ .

Coarse-graining allows one to express the principle of causality, which states that the settings of future measurements do not influence the outcome probabilities of present experiments [273]. Causality is equivalent to the requirement that for every system $A$ there exists an effect $u_{A}$ , called the unit, such that

$$\sum_{x\in\set{X}}m_{x}=u_{A}\tag{12.6}$$

for every measurement $\{m_{x}\}_{x\in\set{X}}$ on $A$ . In quantum theory, $u_{A}$ is the identity operator on the Hilbert space of the system and Eq. (12.6) expresses the fact that quantum measurements are resolutions of the identity. When there is no ambiguity, we drop the subscript from $u_{A}$ .
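Coarse-graining and the unit effect can be illustrated in the simplest GPT, classical probability theory, which we take here as an assumed toy model: states are probability vectors, effects are real vectors, and the pairing $(m|\rho)$ is the ordinary dot product. The outcome labels, group names, and function names are hypothetical.

```python
def pairing(effect, state):
    """(m|rho) for a classical system: dot product of effect and state."""
    return sum(e * s for e, s in zip(effect, state))

def coarse_grain(measurement, groups):
    """Sum the effects whose outcomes fall in the same group X_z."""
    return {z: [sum(column) for column in zip(*(measurement[x] for x in xs))]
            for z, xs in groups.items()}

# A three-outcome measurement on a classical three-level system.
measurement = {"x0": [1, 0, 0], "x1": [0, 1, 0], "x2": [0, 0, 1]}
groups = {"low": ["x0", "x1"], "high": ["x2"]}
unit = [1, 1, 1]        # the unit effect u: (u|rho) = 1 on normalized states
rho = [0.2, 0.3, 0.5]

coarse = coarse_grain(measurement, groups)
p_low = pairing(coarse["low"], rho)

# Joining all outcomes together yields the unit effect, as in Eq. (12.6).
total = [sum(column) for column in zip(*measurement.values())]
```

The coarse-grained probability is the sum of the probabilities of the merged outcomes, and merging every outcome leaves only the unit effect.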

Causality has major consequences. First of all, it implies that the probability distributions generated by local measurements satisfy the no-signalling principle [273]. Moreover, it allows one to perform adaptive operations: for example, if $\{m_{x}\}_{x\in\set{X}}$ is a measurement on system $A$ and $\{n^{(x)}_{y}\}_{y\in\set{Y}}$ is a measurement on system $B$ for every value of $x$ , then causality guarantees that it is possible to choose the measurement on $B$ depending on the outcome on system $A$ , i.e. that $\{m_{x}\otimes n^{(x)}_{y}\}_{x\in\set{X},y\in\set{Y}}$ is a legitimate measurement. Finally, causality allows one to describe non-demolition measurements. For a non-demolition measurement $\{\mathcal{M}_{x}\}_{x\in\set{X}}$ , the measurement events are transformations, which turn the initial state of the system, say $\rho$ , into a new unnormalized state $\mathcal{M}_{x}|\rho)$ . For a system prepared in the state $\rho$ , the probability of the outcome $x$ is $p_{x}=(u|\mathcal{M}_{x}|\rho)$ and, conditionally on outcome $x$ , the post-measurement state is $\mathcal{M}_{x}|\rho)/(u|\mathcal{M}_{x}|\rho)$ . We will often refer to the non-demolition measurements as instruments, in analogy with the usage in quantum theory [310, 311]. Note that, thanks to causality, every instrument $\{\mathcal{M}_{x}\}$ is associated with a unique demolition measurement $\{m_{x}\}$ via the relation

$$(m_{x}|=(u|\mathcal{M}_{x}\qquad\forall x\in\set{X}.\tag{12.7}$$

By definition, $\{m_{x}\}$ describes the statistics of the instrument: for every state $\rho$ and for every outcome $x$ , one has $p_{x}=(u|\mathcal{M}_{x}|\rho)\equiv(m_{x}|\rho)$ .

Sharp measurements in arbitrary theories

In textbook quantum mechanics, physical quantities are associated with self-adjoint operators, called observables. The values of a quantity are the eigenvalues of the corresponding operator and the probability that a measurement outputs the value $x$ is given by the Born rule $p_{x}=\operatorname{Tr}[P_{x}\rho]$ , where $P_{x}$ is the projector on the eigenspace for the eigenvalue $x$ and $\rho$ is the density matrix of the system before the measurement. If the measurement gives the outcome $x$ , then the state after the measurement is $\rho_{x}^{\prime}=P_{x}\rho P_{x}/\operatorname{Tr}[P_{x}\rho]$ , according to the projection postulate. These canonical measurements, where all the measurement operators are orthogonal projectors, are called sharp [312]. While it is clear that sharp measurements play a key role in quantum theory, it is far less clear how to define them in an arbitrary GPT. Here we propose a simple definition based on the notions of repeatability and minimal disturbance.

Let us start from repeatability. An instrument $\{\mathcal{M}_{x}\}$ is repeatable if it gives the same outcome when performed two consecutive times, namely

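$$(m_{y}|\mathcal{M}_{x}|\rho)=\delta_{xy}\,(m_{x}|\rho)\qquad\forall x,y\in\set{X},\ \forall\rho,\tag{12.8}$$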

where $\{m_{x}\}$ is the measurement of Eq. (12.7). Repeatability poses a fairly weak requirement on $\{m_{x}\}$ : every measurement that discriminates perfectly among a set of states $\{\rho_{x}\}$ can be realized by a repeatable instrument, which consists in measuring $\{m_{x}\}$ and, if the outcome is $x$ , re-preparing the system in state $\rho_{x}$ .

The second ingredient entering in our definition of sharp measurements is minimal disturbance. We say that the instrument $\{\mathcal{M}_{x}\}_{x\in\set{X}}$ does not disturb the measurement $\mathbf{n}=\{n_{y}\}_{y\in\set{Y}}$ if the former does not affect the statistics of the latter, namely

$$(n_{y}|\mathcal{M}=(n_{y}|\qquad\forall y\in\set{Y},\tag{12.9}$$

where $(n_{y}|\mathcal{M}:=\sum_{x\in\set{X}}(n_{y}|\mathcal{M}_{x}$ . Then, we ask which instruments disturb the smallest possible set of measurements. Clearly, if $\{\mathcal{M}_{x}\}$ does not disturb $\mathbf{n}$ , then $\mathbf{n}$ must be compatible with the measurement $\mathbf{m}=\{(u|\mathcal{M}_{x}\}$ , in the sense that $\mathbf{m}$ and $\mathbf{n}$ can be measured jointly. Indeed, by measuring $\mathbf{n}$ after $\{\mathcal{M}_{x}\}$ one obtains the probability distribution $p_{xy}=(n_{y}|\mathcal{M}_{x}|\rho)$ , whose marginals on $x$ and $y$ are equal to the probability distributions of $\mathbf{m}$ and $\mathbf{n}$ , respectively. Read in the contrapositive, this means that if $\mathbf{m}$ and $\mathbf{n}$ are incompatible, the instrument $\{\mathcal{M}_{x}\}$ must disturb $\mathbf{n}$ . This leads us to the following definition: an instrument $\{\mathcal{M}_{x}\}$ has minimal disturbance if it disturbs only the measurements that are incompatible with $\mathbf{m}=\{(u|\mathcal{M}_{x}\}$ .

We define an instrument to be sharp if it is both repeatable and with minimal disturbance. We say that a measurement is sharp if it describes the statistics of a sharp instrument and we call an effect sharp if it belongs to a sharp measurement. In quantum theory, our definition coincides with the usual one: one can prove that the only sharp instruments are the Lüders instruments [313], of the form $\mathcal{M}_{x}(\rho)=P_{x}\rho P_{x}$ where $\{P_{x}\}$ is a collection of orthogonal projectors. Hence, the sharp measurements are projective measurements. In addition, we can prove that when a sharp measurement extracts a coarse-grained information, the experimenter can still retrieve the finer details at a later time. In fact, this is a necessary and sufficient condition for a measurement to be sharp, as proven in the Methods section.
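The repeatability half of this definition is easy to probe numerically in quantum theory. The sketch below compares a Lüders instrument with a noisy instrument of the form $\mathcal{M}_{x}(\rho)=K_{x}\rho K_{x}^{\dagger}$ ; the noisy Kraus operators, the test state, and the function `repeatability_defect` are illustrative assumptions, and the check is carried out on a single state only, so it can falsify repeatability but not prove it.

```python
import math

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose of a matrix."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

def repeatability_defect(kraus_ops, rho):
    """Largest violation of (m_y|M_x|rho) = delta_xy (m_x|rho) on rho.

    The instrument is M_x(rho) = K_x rho K_x^dagger and its effects are
    m_x = K_x^dagger K_x.
    """
    effects = [matmul(dagger(K), K) for K in kraus_ops]
    worst = 0.0
    for x, K in enumerate(kraus_ops):
        post = matmul(matmul(K, rho), dagger(K))
        for y, E in enumerate(effects):
            lhs = trace(matmul(E, post)).real
            rhs = trace(matmul(effects[x], rho)).real if x == y else 0.0
            worst = max(worst, abs(lhs - rhs))
    return worst

luders = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]          # orthogonal projectors
noisy = [[[math.sqrt(0.9), 0], [0, math.sqrt(0.1)]],   # assumed unsharp example
         [[math.sqrt(0.1), 0], [0, math.sqrt(0.9)]]]
rho = [[0.75, 0.25], [0.25, 0.25]]

luders_defect = repeatability_defect(luders, rho)
noisy_defect = repeatability_defect(noisy, rho)
```

The projective instrument shows no defect on this state, whereas the noisy one does, in line with the statement that sharp quantum measurements are projective.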

Fundamental sharpness of measurements

Sharp measurements are an ideal standard: they are the measurements that generate outcomes in a repeatable way, while at the same time causing the least disturbance on future observations. Unfortunately though, most measurements in real life appear to be noisy and not repeatable. Hence the natural question: Is noise fundamental? Or is it rather contingent on the fact that the experimenter has incomplete control over the conditions of the experiment? Here we state that noise is not fundamental and only arises from the fact that realistic measurements do not extract information only from the system, but also from the surrounding environment:

Axiom 1 (Fundamental Sharpness of Measurements).

Every measurement arises from a sharp measurement performed jointly on the system and on the environment.

Precisely, we require that for every measurement $\mathbf{m}=\{m_{x}\}_{x\in\set{X}}$ there exists an environment $E$ , a state $\sigma$ of $E$ , and a sharp measurement $\mathbf{M}=\{M_{x}\}_{x\in\set{X}}$ on the composite system $SE$ such that, for every state $\rho$ of system $S$ , one has $(m_{x}|\rho)=\left(M_{x}\right|\rho\otimes\sigma)$ for every outcome $x\in\set{X}$ . In quantum theory, this is the content of the celebrated Naimark’s theorem [314, 315]. This is a deep property, hinting at the idea that there exists a fundamental level where all measurements are ideal.
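A toy instance of this requirement in quantum theory can be written down explicitly. The noisy qubit measurement with effects $\mathrm{diag}(0.9,0.1)$ and $\mathrm{diag}(0.1,0.9)$ is reproduced by a projective parity measurement on the system together with an environment qubit. The environment state $\sigma=\mathrm{diag}(0.9,0.1)$ is a mixed state chosen for simplicity, which the statement above allows; this is an assumed example, not the general Naimark construction.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))

def kron(a, b):
    """Tensor (Kronecker) product of two matrices, first factor major."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def diag(*entries):
    """Diagonal matrix with the given entries."""
    n = len(entries)
    return [[entries[r] if r == c else 0.0 for c in range(n)] for r in range(n)]

def is_idempotent(m, tol=1e-12):
    """Check M M = M; with M real symmetric this makes M a projector."""
    mm = matmul(m, m)
    n = len(m)
    return all(abs(mm[r][c] - m[r][c]) <= tol
               for r in range(n) for c in range(n))

sigma = diag(0.9, 0.1)                               # assumed environment state
sharp_joint = [diag(1, 0, 0, 1), diag(0, 1, 1, 0)]   # even / odd parity on SE
noisy_effects = [diag(0.9, 0.1), diag(0.1, 0.9)]     # unsharp measurement on S

rho = [[0.75, 0.25], [0.25, 0.25]]
joint_state = kron(rho, sigma)
dilated = [trace(matmul(M, joint_state)).real for M in sharp_joint]
direct = [trace(matmul(m, rho)).real for m in noisy_effects]
```

The joint projective measurement yields the same statistics as the noisy measurement on the system alone, so the apparent noise is attributable to the environment.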

Let us push the idea further. If measurements are sharp at the fundamental level, it is natural to assume that the set of sharp measurements is closed under the basic operation of coarse-graining, which transforms an initial measurement $\mathbf{m}$ into a new, less informative measurement $\mathbf{m}^{\prime}$ . Indeed, since $\mathbf{m}^{\prime}$ provides less information than $\mathbf{m}$ , one expects that $\mathbf{m}^{\prime}$ should not be less repeatable, nor create more disturbance, than $\mathbf{m}$ . This intuition leads to the following requirement:

Axiom 2 (Less Information, More Sharpness).

If a measurement is less informative than a sharp measurement, then it is sharp.

Suppose now that two experimenters, Alice and Bob, perform two sharp measurements on two systems $A$ and $B$ in their laboratories. Again, if measurements are sharp at the fundamental level, one expects the result of Alice’s and Bob’s measurements to be a sharp measurement on the composite system $AB$ . If this were not the case, it would mean that at the fundamental level some measurements require nonlocal interactions, even though at the operational level they appear to be implemented locally by Alice and Bob. We then postulate the following:

Axiom 3 (Locality of Sharp Measurements).

If two sharp measurements are applied in parallel on systems $A$ and $B$ , then the result is a sharp measurement on the composite system $AB$ .

Axioms 1-3 lay down the fundamental structure of sharp measurements, summarized in Fig. 12.1.

They are satisfied by classical theory and by quantum theory, both on complex and real Hilbert spaces. In the following we will show that the fundamental structure of sharp measurements has an enormous impact on the amount of nonlocality and contextuality that can be found in a physical theory.

12.2.2 Derivation of CE

At present, CE is the only principle known to constrain the amount of contextuality of a generic theory. Operationally, the principle can be formulated as follows: Consider a collection of sharp measurements $\{\mathbf{m}^{(x)},\,x\in\set{X}\}$ , each measurement having outcomes in a set $\set{Y}_{x}$ . Suppose that the possible events have been labelled so that two effects corresponding to the same outcome coincide, i.e. $m_{y}^{(x)}\equiv m_{y}$ , independently of $x$ . Letting $\set{Y}=\cup_{x}\set{Y}_{x}$ be the set of all outcomes, one calls two distinct outcomes $y,y^{\prime}\in\set{Y}$ exclusive if there exists a measurement setting $x$ such that both $y$ and $y^{\prime}$ belong to $\set{Y}_{x}$ . We say that a theory satisfies CE if for every set of mutually exclusive outcomes $\set{E}$ and for every state $\rho$ the probabilities $p_{y}=(m_{y}|\rho)$ obey the bound $\sum_{y\in\set{E}}p_{y}\leq 1\,.$

Our first key result is the derivation of CE. In fact, we prove a stronger result: We define two sharp effects $m$ and $m^{\prime}$ to be orthogonal if they belong to the same measurement and we prove that mutually orthogonal effects can be combined into a single sharp measurement (see Methods). Clearly, since mutually exclusive outcomes correspond to mutually orthogonal effects, the existence of a joint measurement containing the effects $\{m_{y}\}_{y\in\set{E}}$ implies the bound $\sum_{y\in\set{E}}(m_{y}|\rho)\leq 1$ . Our result implies that in a theory where measurements are fundamentally sharp the violation of Kochen-Specker inequalities is upper bounded by the value set by CE [303]. What is remarkable here is that a single requirement on measurements influences directly the strength of contextuality in an arbitrary physical theory. This situation contrasts with that of the known axiomatizations of quantum theory [280, 274, 283, 282, 316, 284], where the quantum bounds on contextuality are retrieved only indirectly through the derivation of the Hilbert space framework.
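As a purely illustrative check of the CE bound in the quantum case, the sketch below uses a pentagon of rank-one projectors on a qutrit in which neighbouring projectors are orthogonal. The specific vectors, the state, and all function names are assumptions chosen for the example and do not come from the text. Each pair of exclusive outcomes satisfies $p_{y}+p_{y^{\prime}}\leq 1$, while the sum over all five outcomes exceeds the noncontextual value 2.

```python
import math

# Illustrative assumption: five unit vectors in R^3 arranged so that
# consecutive vectors (k and k+1 mod 5) are orthogonal.
c = math.cos(math.pi / 5)
cos2_theta = c / (1 + c)              # equals 1/sqrt(5)
theta = math.acos(math.sqrt(cos2_theta))

def vec(k):
    """k-th pentagon vector, at polar angle theta and azimuth 4*pi*k/5."""
    phi = 4 * math.pi * k / 5
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

vectors = [vec(k) for k in range(5)]
state = (0.0, 0.0, 1.0)               # pure state along the symmetry axis

# p_y = |<v_y|psi>|^2 for the rank-one projector onto v_y
probs = [dot(v, state) ** 2 for v in vectors]

# Outcomes k and k+1 are exclusive: the corresponding projectors are orthogonal.
exclusive_pairs = [(k, (k + 1) % 5) for k in range(5)]
overlaps = [abs(dot(vectors[a], vectors[b])) for a, b in exclusive_pairs]
pair_sums = [probs[a] + probs[b] for a, b in exclusive_pairs]
total = sum(probs)

print("largest overlap within an exclusive pair:", max(overlaps))
print("sums over exclusive pairs:", pair_sums)
print("sum over all five outcomes:", total)
```

In this configuration no three outcomes are mutually exclusive, so the pairwise sums are the only constraints that CE imposes.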

Our axioms do not imply only CE, but also the whole hierarchy of extensions of this principle defined in Ref. [5]. The $L$ -th level of the hierarchy can be defined by considering independent measurements on $L$ copies of the state $\rho$ . Denoting by $\mathbf{y}=(y_{1},\dots,y_{L})$ the string of all outcomes, one says that two strings $\mathbf{y}$ and $\mathbf{y}^{\prime}$ are exclusive if there exists some $i$ such that $y_{i}$ and $y_{i}^{\prime}$ are exclusive. A physical theory satisfies the $L$ -th level of the hierarchy if the probabilities $p_{L}(\mathbf{y})=\prod_{i=1}^{L}(m_{y_{i}}|\rho)$ obey the bound $\sum_{\mathbf{y}\in\set{E}}p_{L}(\mathbf{y})\leq 1$ for every set $\set{E}$ of mutually exclusive strings. In the Methods section we show that our axioms on sharp measurements imply that this bound is satisfied for every possible $L$ .

12.2.3 Derivation of LO

In the nonlocality camp, LO occupies a special position, being up to now the only known principle that rules out non-quantum correlations that are not detected by any bipartite principle [309]. LO refers to a scenario where $N$ parties perform local measurements on $N$ systems, initially prepared in some joint state. The $i$ -th party can choose among different measurement settings in a set $\set{X}_{i}$ and her measurements give outcomes in another set $\set{Y}_{i}$ . Let $\mathbf{x}=(x_{1},\dots,x_{N})$ be the string of all settings and $\mathbf{y}=(y_{1},\dots,y_{N})$ be the string of all outcomes. In this context, the pair $\mathbf{e}=(\mathbf{x},\mathbf{y})$ is called an event, and two events $\mathbf{e}=(\mathbf{x},\mathbf{y})$ and $\mathbf{e}^{\prime}=(\mathbf{x}^{\prime},\mathbf{y}^{\prime})$ are called locally orthogonal iff there exists a party $i$ such that $x_{i}=x_{i}^{\prime}$ and $y_{i}\not=y_{i}^{\prime}$ . Setting $p(\mathbf{e})$ to be the conditional probability distribution $p(\mathbf{y}|\mathbf{x})$ , one says that a theory satisfies local orthogonality if all the probability distributions generated by local measurements obey the bound $\sum_{\mathbf{e}\in\set{O}}p(\mathbf{e})\leq 1$ for every set $\set{O}$ of pairwise locally orthogonal events.

To derive LO, we specify how the probability distribution $p(\mathbf{y}|\mathbf{x})$ is generated: In the most general scenario, the $N$ parties share a state $\rho$ and, for setting $x_{i}$ , party $i$ performs a measurement $\mathbf{m}^{(i,x_{i})}$ . Denoting the product effects $P^{(\mathbf{x})}_{\mathbf{y}}:=\bigotimes_{i=1}^{N}m^{(i,x_{i})}_{y_{i}}$ , the probability distribution of the outcomes is given by $p(\mathbf{e}):=\left(P^{(\mathbf{x})}_{\mathbf{y}}|\rho\right)$ . The proof that LO follows from the axioms, provided in Methods, consists of three steps: First, thanks to the Fundamental Sharpness of Measurements, the problem is reduced to proving that LO holds for probability distributions generated in a scenario where all parties perform sharp measurements. Then, we observe that, in the case of sharp measurements, locally orthogonal events correspond to orthogonal effects. Finally, we use the fact that mutually orthogonal effects can coexist in a single measurement. As a corollary, we obtain the bound $\sum_{\mathbf{e}\in\set{O}}p(\mathbf{e})\leq 1$ , establishing the validity of LO for all the probability distributions generated by measurements in our theory.

As in the case of CE, our axioms imply the whole hierarchy of extensions of LO introduced in [5]. The hierarchy is defined as follows: the probabilities $p(\mathbf{y}|\mathbf{x})$ satisfy the $L$ -th level of the hierarchy if their product $p(\mathbf{y}_{1}|\mathbf{x}_{1})\cdots p(\mathbf{y}_{L}|\mathbf{x}_{L})$ satisfies LO. Now, we can think of the product as being generated by measurements on $L$ copies of the state $\rho$ . In this way, we reduce the problem of proving the $L$ -th level of the hierarchy to the problem of proving LO for measurements performed on the state $\rho^{\otimes L}$ . But we already proved the validity of LO for arbitrary measurements and arbitrary states. In conclusion, the structure of sharp measurements implies that LO is satisfied at every possible level. A striking consequence of this argument is that the fundamental sharpness of measurements rules out PR box correlations, as the latter violate the LO hierarchy [301]. In other words, in the world of PR boxes some measurements must be fundamentally noisy. Finally, since our axioms imply LO, in particular they imply all the limitations that LO sets on Bell inequalities and nonlocal games. We will elaborate on this point in the following.
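To make the statement about PR boxes concrete, the following sketch checks a violation at the second level of the hierarchy. It treats two independent PR boxes as a four-party distribution with parties ordered $(A_{1},B_{1},A_{2},B_{2})$ and evaluates the LO sum on five events. The particular events and all function names are assumptions made for this example; the code itself verifies that the events are pairwise locally orthogonal rather than taking this for granted.

```python
from fractions import Fraction
from itertools import combinations

def pr_box(a, b, x, y):
    """PR box: outputs obey a XOR b = x AND y; each allowed pair has probability 1/2."""
    return Fraction(1, 2) if (a ^ b) == (x & y) else Fraction(0)

def locally_orthogonal(e1, e2):
    """Events are (settings, outcomes). They are locally orthogonal iff some
    party has the same setting in both events but different outcomes."""
    (x1, y1), (x2, y2) = e1, e2
    return any(s == t and o != p for s, t, o, p in zip(x1, x2, y1, y2))

def two_copy_probability(event):
    """Probability of an event under two independent PR boxes,
    with parties ordered (A1, B1, A2, B2)."""
    x, y = event
    return pr_box(y[0], y[1], x[0], x[1]) * pr_box(y[2], y[3], x[2], x[3])

# Five candidate events, written as (settings, outcomes).
# This specific choice is an assumption of the example.
events = [
    ((0, 0, 0, 0), (0, 0, 0, 0)),
    ((0, 0, 1, 1), (1, 1, 1, 0)),
    ((0, 1, 1, 0), (0, 0, 1, 1)),
    ((1, 0, 1, 1), (1, 1, 0, 1)),
    ((1, 1, 0, 1), (0, 1, 1, 1)),
]

pairwise_orthogonal = all(
    locally_orthogonal(e, f) for e, f in combinations(events, 2)
)
lo_sum = sum(two_copy_probability(e) for e in events)

print("pairwise locally orthogonal:", pairwise_orthogonal)
print("LO sum for two PR boxes:", lo_sum)
```

Each of the five events has probability 1/4 under the product distribution, so the sum exceeds the LO bound of 1 even though a single PR box is no-signalling.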

12.2.4 Sharp Bell inequalities

The requirement that measurements be ideal at the fundamental level exerts a censorship on the amount of nonlocality that can be detected by experiments. To illustrate this fact, we show a number of Bell inequalities where the sharpness of measurements prevents every violation. We call such inequalities sharp.

Consider a game played by $N$ non-communicating parties and a referee, who sends to party $i$ an input $x_{i}$ and receives back an output $y_{i}$ . The referee chooses the input string $\mathbf{x}$ at random with probability $q(\mathbf{x})$ and assigns a payoff $\omega(\mathbf{x},\mathbf{y})$ to the players, assumed without loss of generality to be nonnegative for every $(\mathbf{x},\mathbf{y})$ . The expected payoff obtained by the players is given by $\omega=\sum_{\mathbf{x},\mathbf{y}}q(\mathbf{x})\omega(\mathbf{x},\mathbf{y})p(\mathbf{y}|\mathbf{x})$ , where $p(\mathbf{y}|\mathbf{x})$ is the probability distribution describing their strategy. For a given game, the maximum payoff that can be achieved by classical strategies—call it $\omega_{c}$ —defines a Bell inequality, $\omega\leq\omega_{c}$ . The game can be associated with a graph $\set{G}$ , here called the winning graph, by choosing as vertices the events $(\mathbf{x},\mathbf{y})$ such that $q(\mathbf{x})\omega(\mathbf{x},\mathbf{y})\not=0$ and placing an edge between two events $\mathbf{e}$ and $\mathbf{e}^{\prime}$ if they are not locally orthogonal, as illustrated in Figure 12.2.

In this picture, the maximum payoff achieved by classical strategies is

[TABLE]

where $\set{C}$ is a clique, i.e. a subset of $\set{G}$ with the property that every two vertices in $\set{C}$ are connected [301].
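The clique characterization can be checked by brute force on a small instance. The sketch below assumes that the classical value is the largest total weight $\sum_{(\mathbf{x},\mathbf{y})\in\set{C}}q(\mathbf{x})\omega(\mathbf{x},\mathbf{y})$ of a clique in the winning graph, which is consistent with the bound $\omega\leq\max_{k}q(\set{C}_{k})\equiv\omega_{c}$ used later in this section. The game (Guess Your Neighbour's Input with three players and uniformly distributed inputs) and all names are choices made for illustration.

```python
from fractions import Fraction
from itertools import combinations, product

N = 3
inputs = list(product((0, 1), repeat=N))
q = {x: Fraction(1, len(inputs)) for x in inputs}   # uniform inputs (assumption)

def payoff(x, y):
    """Guess Your Neighbour's Input: party i must output the input of party i+1."""
    return 1 if all(y[i] == x[(i + 1) % N] for i in range(N)) else 0

def locally_orthogonal(e1, e2):
    (x1, y1), (x2, y2) = e1, e2
    return any(x1[i] == x2[i] and y1[i] != y2[i] for i in range(N))

# Vertices of the winning graph: events with nonzero weight q(x) * omega(x, y).
vertices = [(x, y) for x in inputs for y in product((0, 1), repeat=N)
            if q[x] * payoff(x, y) != 0]

def weight(events):
    return sum(q[x] * payoff(x, y) for x, y in events)

def is_clique(events):
    # Edges join events that are NOT locally orthogonal.
    return all(not locally_orthogonal(e, f) for e, f in combinations(events, 2))

# Largest clique weight, by brute force over all nonempty vertex subsets.
clique_value = max(
    weight(subset)
    for r in range(1, len(vertices) + 1)
    for subset in combinations(vertices, r)
    if is_clique(subset)
)

# Largest payoff over deterministic local strategies y_i = f_i(x_i).
functions = list(product((0, 1), repeat=2))          # f encoded as (f(0), f(1))
strategy_value = max(
    sum(q[x] * payoff(x, tuple(f[i][x[i]] for i in range(N))) for x in inputs)
    for f in product(functions, repeat=N)
)

print("max clique weight:", clique_value)
print("best deterministic strategy:", strategy_value)
```

The two values agree because a set of events that are pairwise not locally orthogonal assigns to each party a consistent map from settings to outcomes, which is exactly a deterministic local strategy.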

A first class of games leading to sharp Bell inequalities is the class of games with a graph $\set{G}$ that is the disjoint union of mutually disconnected cliques $\set{C}_{k},\,k\in\{1,\dots,K\}$ . This class contains the game Guess Your Neighbor’s Input [4] and the maximally difficult Distributed Guessing Problems of Ref. [301]. In addition, it contains other games such as, for even $N$ , the “Guess the Parity” game where each player is asked to guess the parity of the input string $\mathbf{x}$ . For all these games, LO implies that the classical payoff is an upper bound. Indeed, picking for every $k$ the event $\mathbf{e}_{k}\in\set{C}_{k}$ that has maximum probability, the payoff can be bounded as

[TABLE]

and since the events $\{\mathbf{e}_{k}\}_{k=1}^{K}$ are locally orthogonal by construction, one has $\sum_{k}p(\mathbf{e}_{k})\leq 1$ and, therefore, $\omega\leq\max_{k}q\left(\set{C}_{k}\right)\equiv\omega_{c}$ . In conclusion, every sharp game defines a Bell inequality, $\omega\leq\omega_{c}$ , that cannot be violated by any theory satisfying LO, and, in particular, by any theory where measurements are fundamentally repeatable and minimally disturbing. Using a result of Ref. [5], the proof that LO cuts the payoff down to its classical value can be extended to a larger class of games, defined by the property that the winning graph $\set{G}$ is a perfect graph [317]. For example, one such game is the “Guess the Product” game, where the players win if they guess the product of their inputs. Finally, there are examples of games where the Bell inequality $\omega\leq\omega_{c}$ is sharp even if the winning graph is not perfect, such as Guess the Parity when the number of players is odd (see Supplementary Note 1 for the proof that the payoff is upper bounded by the classical value).

Our results are derived in a minimal framework, which avoids some assumptions commonly made in GPTs. In particular, our arguments do not invoke local tomography [280, 318, 307], but only the requirement that sharp measurements are local. This requirement is strictly weaker: for example, it is satisfied by quantum theory on real Hilbert spaces, where local tomography fails. Also, the validity of our results does not require that the states of a given physical system form a convex set. Thanks to this feature, the results apply also to non-convex theories, like Spekkens’ toy theory [319]. Interestingly, probabilities themselves do not play a crucial role in our arguments and it is quite straightforward to extend the results to theories that only specify which outcomes are possible, impossible or certain, without specifying their probabilities, such as Schumacher’s and Westmoreland’s theory [320].

Since sharp measurements play a central role in quantum mechanics, it is not surprising that they have been the object of extensive investigation since the early days [321, 313, 322]. Later, Holevo proposed a purely statistical definition of sharp measurement, which does not refer to post-measurement states [323]. Although in the quantum case Holevo’s definition reduces to that of projective measurement, in general theories it is inequivalent to ours, and it is not clear how one could use it to derive features like LO and CE. Different notions of ideal measurements were put forward by Piron [324, 325] and Beltrametti-Cassinelli [326] in the framework of quantum logic. In general, they differ from our definition in the way the condition of minimal disturbance is defined. Most recently, measurement disturbance came back to play an important role in the search for basic principles, as shown e.g. by the No Disturbance Without Information principle of Ref. [327].

Our work joined the insights from two different approaches to the foundations of quantum mechanics: the characterization of quantum correlations [295, 296, 297, 298, 299, 300, 301] and the study of general probabilistic theories [280, 274, 283, 282, 316, 284]. Although these two approaches have developed on separate tracks so far, they share the same fundamental goal: understanding which picture of Nature lies behind the mathematical laws of quantum mechanics and guiding our intuition towards the formulation of new protocols and new physical theories. Our results demonstrate that the interaction between the two approaches can be beneficial for both. Here LO and CE stimulated the search for new principles in the GPT framework, leading to a compelling picture of nature where measurements are repeatable and cause minimal disturbance at the fundamental level. The idea that a noisy physical process can be reduced to an ideal process at the fundamental level is immediately reminiscent of another quantum feature: Purification [273]. Operationally, Purification is the property that every mixed state can be generated from a pure state of a composite system by discarding one component. This principle implies directly entanglement and is at the core of the reconstruction of quantum theory of Ref. [274]. Our result suggests the possibility that purification and sharpness could be sufficient to derive quantum theory. In terms of quantum correlations, this would lead to the tantalizingly simple picture “Purification brings nonlocality in, sharpness cuts it down”. Going even further, it is intriguing to wonder whether purification and sharpness can be viewed as two sides of the same coin by imposing that physical theories must satisfy a suitable requirement of time symmetry, similarly to what was done in quantum theory by Aharonov, Bergmann and Lebowitz [328, 329].

12.2.5 Methods

Characterization of sharp instruments. The starting point of our results is the observation that an instrument $\{\mathcal{M}_{x}\}$ is sharp if and only if

[TABLE]

for every measurement $\mathbf{r}=\{r_{xy}\}_{(x,y)\in\set{X}\times\set{Y}}$ that refines $\{m_{x}\}$ , i. e. $\sum_{y\in\set{Y}}r_{xy}=m_{x}$ for every $x$ . Let us see why Eq. (12.11) is equivalent to sharpness. First, suppose that Eq. (12.11) holds. Clearly, this implies that $\mathbf{m}$ is repeatable, as one can see by summing over $y$ . Moreover, Eq. (12.11) implies that $\mathbf{m}$ is a minimal disturbance measurement. Indeed, take a generic measurement $\mathbf{n}$ that is compatible with $\mathbf{m}$ . By definition, this means that there exists a joint measurement $\mathbf{r}=\{r_{xy}\}$ such that $\sum_{x}r_{xy}=n_{y}$ for every $y$ and $\sum_{y}r_{xy}=m_{x}$ for every $x$ . We then obtain

[TABLE]

having used Eq. (12.11) in the second equality. Summing over $y$ and using the normalization of the measurement $\mathbf{n}$ we obtain the condition $\sum_{y}(s_{y}|=0$ , or, equivalently, $\sum_{y}(s_{y}|\rho)=0$ for every $\rho$ . Since probabilities are non-negative, this implies that each term in the sum vanishes, leading to the relation $s_{y}=0$ for every $y$ . Inserting this relation back in Eq. (12.12) we conclude that $(n_{y}|\mathcal{M}=(n_{y}|$ , that is, the instrument does not disturb $\mathbf{n}$ . Hence, Eq. (12.11) implies that $\{\mathcal{M}_{x}\}$ is a sharp instrument. Conversely, if $\{\mathcal{M}_{x}\}$ is a sharp instrument then Eq. (12.11) must be satisfied. By definition, one has

[TABLE]

and using the repeatability condition $(m_{x}|=(m_{x}|\mathcal{M}_{x}$ one obtains $\sum_{x^{\prime}\not=x}(m_{x^{\prime}}|\mathcal{M}_{x}=0$ . Again, the fact that probabilities are nonnegative implies that each term in the sum must vanish, namely $(m_{x^{\prime}}|\mathcal{M}_{x}=0$ for every $x^{\prime}\not=x$ . Now, let $\mathbf{r}$ be a measurement such that $\sum_{y}r_{xy}=m_{x}$ . Since the measurement $\mathbf{r}$ is compatible with $\mathbf{m}$ , Eq. (12.9) implies $(r_{xy}|\mathcal{M}=(r_{xy}|$ with $\mathcal{M}=\sum_{x^{\prime}}\mathcal{M}_{x^{\prime}}$ . On the other hand, for $x\not=x^{\prime}$ the condition $(m_{x}|\mathcal{M}_{x^{\prime}}=0$ implies $(r_{xy}|\mathcal{M}_{x^{\prime}}=0$ . Hence, we conclude that $(r_{xy}|\mathcal{M}_{x}=(r_{xy}|\mathcal{M}=(r_{xy}|$ for every $x$ and $y$ . ∎
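For the quantum case, the characterization can be checked directly on a Lüders instrument. The sketch below assumes that Eq. (12.11) is the condition $(r_{xy}|\mathcal{M}_{x}=(r_{xy}|$ , as the end of the proof indicates, and uses the fact that for $\mathcal{M}_{x}(\rho)=P_{x}\rho P_{x}$ the effect $(r|\mathcal{M}_{x}$ is represented by the operator $P_{x}rP_{x}$ . The qutrit example and all helper names are assumptions made for illustration.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(len(a))] for i in range(len(a))]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(len(a)) for j in range(len(a)))

def total(effects):
    out = effects[0]
    for e in effects[1:]:
        out = add(out, e)
    return out

ZERO = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

# Projective measurement on a qutrit:
# P[0] projects onto span{|0>, |1>}, P[1] projects onto |2>.
P = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 1]],
]

# A refinement r[x][y] of {P[x]}: sum_y r[x][y] = P[x].
r = [
    [   # |+><+| and |-><-| inside span{|0>, |1>}
        [[0.5, 0.5, 0], [0.5, 0.5, 0], [0, 0, 0]],
        [[0.5, -0.5, 0], [-0.5, 0.5, 0], [0, 0, 0]],
    ],
    [P[1]],
]

def heisenberg(x, effect):
    """Effect (r| M_x for the Lueders instrument: r -> P_x r P_x."""
    return matmul(P[x], matmul(effect, P[x]))

is_refinement = all(close(total(r[x]), P[x]) for x in range(2))
preserved = all(close(heisenberg(x, r[x][y]), r[x][y])
                for x in range(2) for y in range(len(r[x])))
annihilated = all(close(heisenberg(1 - x, r[x][y]), ZERO)
                  for x in range(2) for y in range(len(r[x])))

print(is_refinement, preserved, annihilated)
```

The flag `preserved` corresponds to $(r_{xy}|\mathcal{M}_{x}=(r_{xy}|$ and `annihilated` to $(r_{xy}|\mathcal{M}_{x^{\prime}}=0$ for $x^{\prime}\not=x$ , the two relations used in the proof above.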

Joint measurability of orthogonal effects. The characterization of sharp instruments, combined with the Less Information-More Sharpness principle, leads directly to the first key result of our work: a construction showing that mutually orthogonal effects can be measured jointly in a single sharp measurement. Precisely, if $m_{k}$ is orthogonal to $m_{l}$ for every $k,l\in\{1,\dots,K\}$ , we show that there exists a joint sharp measurement $\mathbf{j}$ such that $\{m_{k}\}_{k=1}^{K}\subseteq\mathbf{j}$ .

Let us see how to construct the joint measurement. Let $\mathbf{m}^{(k)}$ be the sharp measurement that contains the effect $m_{k}$ . By coarse graining of $\mathbf{m}^{(k)}$ one obtains the binary measurement $\mathbf{m}^{(k)}=\{m_{0}^{(k)},m^{(k)}_{1}\}$ , with $m_{0}^{(k)}:=m_{k}$ and $m_{1}^{(k)}:=e-m_{k}$ . By the Less Information, More Sharpness postulate, $\mathbf{m}^{(k)}$ is sharp. Let $\{\mathcal{M}^{(k)}_{0},\mathcal{M}_{1}^{(k)}\}$ be the corresponding instrument. Now, since $m_{k}$ and $m_{l}$ are orthogonal, $\mathbf{m}^{(kl)}=\{m_{k},m_{l},e-m_{k}-m_{l}\}$ must be a valid measurement. Since $\mathbf{m}^{(k)}$ is a coarse-graining of $\mathbf{m}^{(kl)}$ , Eq. (12.11) gives

[TABLE]

Now, consider the following measurement procedure: i) perform the first instrument; ii) if the outcome is $1$ , perform the second instrument; iii) for every $k<K$ , if the outcome of the $k$ -th instrument is $1$ , perform the $(k+1)$ -th instrument. The resulting instrument, denoted by $\{\mathcal{J}_{k}\}_{k=1}^{K+1}$ , consists of the transformations

[TABLE]

The measurement $\mathbf{j}=\{j_{k}\}_{k=1}^{K+1}$ associated to the instrument $\{\mathcal{J}_{k}\}_{k=1}^{K+1}$ is the desired joint measurement: indeed, using Eqs. (12.7) and (12.13) we obtain $(j_{k}|=(e|\mathcal{M}_{0}^{(k)}\mathcal{M}_{1}^{(k-1)}\cdots\mathcal{M}_{1}^{(1)}=(m_{k}|\mathcal{M}_{1}^{(k-1)}\cdots\mathcal{M}_{1}^{(1)}=(m_{k}|$ for every $k\in\{1,\dots,K\}$ . In addition, the measurement $\mathbf{j}$ is sharp. Indeed, if a measurement $\{r_{kl}\}$ is a refinement of $\mathbf{j}$ , i.e. $\sum_{l}r_{kl}=j_{k}$ for all $k$ , then $\mathbf{r}$ is also a refinement of the sharp measurement $\mathbf{m}^{(k^{\prime})}$ for every fixed $k^{\prime}$ . Hence, one has

[TABLE]

Using this fact and the definition of $\mathcal{J}_{k}$ it is immediate to obtain the relation $(r_{kl}|\mathcal{J}_{k}=(r_{kl}|$ for every $k,l$ . Thanks to Eq. (12.11), this proves that the instrument $\{\mathcal{J}_{k}\}$ is sharp, and so is the corresponding measurement $\mathbf{j}$ . ∎

The ability to combine orthogonal effects into a single measurement is a powerful asset. As we already observed, it implies CE at its basic level. In the following we show that it can be used also to obtain the whole CE hierarchy.

Orthogonality of product effects. In a causal theory the information available at a given moment of time can be used to make decisions about the settings of future experiments, thus allowing for adaptive measurements where the choice of setting for a system $B$ depends on the outcome of a measurement on system $A$ . In particular, if $\{m_{x}\}$ is a sharp measurement on $A$ and $\{n^{(x)}_{y}\}_{y\in\set{Y}}$ is a sharp measurement for every $x$ , then $\{m_{x}\otimes n^{(x)}_{y}\}$ is a legitimate measurement. Now, the Locality of Sharp Measurements implies that $\{m_{x}\otimes n^{(x_{0})}_{y}\}$ is sharp for every fixed $x_{0}$ . Since $x_{0}$ is arbitrary, this means that each effect $m_{x}\otimes n^{(x)}_{y}$ is sharp and that two effects $m_{x}\otimes n^{(x)}_{y}$ and $m_{x^{\prime}}\otimes n^{(x^{\prime})}_{y^{\prime}}$ are orthogonal unless $x=x^{\prime}$ and $y=y^{\prime}$ .

Thanks to this observation, it is easy to see that every level of the CE hierarchy is satisfied. The key is to note that if two strings of outcomes $\mathbf{y}$ and $\mathbf{y}^{\prime}$ are exclusive, then the corresponding effects $P_{\mathbf{y}}$ and $P_{\mathbf{y}^{\prime}}$ are orthogonal. This is clear because, by definition, the effects corresponding to two exclusive strings are of the form $P_{\mathbf{y}}=m_{y_{i}}\otimes n$ and $P_{\mathbf{y}^{\prime}}=m_{y_{i}^{\prime}}\otimes n^{\prime}$ where the effects $m_{y_{i}}$ and $m_{y_{i}^{\prime}}$ are orthogonal and the effects $n=\otimes_{j\not=i}m_{y_{j}}$ and $n^{\prime}=\otimes_{j\not=i}m_{y_{j}^{\prime}}$ are sharp thanks to the Locality of Sharp Measurements. Using our result about product effects, we then have that $P_{\mathbf{y}}$ and $P_{\mathbf{y}^{\prime}}$ are orthogonal. Now, a set of mutually exclusive strings $\set{E}$ corresponds to a set of mutually orthogonal effects $\{P_{\mathbf{y}}\}_{\mathbf{y}\in\set{E}}$ . Since mutually orthogonal effects can be combined into a joint measurement, the probabilities $p_{L}(\mathbf{y})=(P_{\mathbf{y}}|\rho^{\otimes L})$ obey the bound $\sum_{\mathbf{y}\in\set{E}}p_{L}(\mathbf{y})\leq 1$ , meaning that the theory satisfies the $L$ -th level of the CE hierarchy for arbitrary $L$ .

Note that the same argument can be used to prove the validity of LO for the probability distributions generated in a scenario where all parties perform sharp measurements. In this scenario, two locally orthogonal events $(\mathbf{x},\mathbf{y})$ and $(\mathbf{x}^{\prime},\mathbf{y}^{\prime})$ correspond to two orthogonal effects $P^{(\mathbf{x})}_{\mathbf{y}}$ and $P^{(\mathbf{x}^{\prime})}_{\mathbf{y}^{\prime}}$ , for exactly the same reason mentioned above. Hence, the joint measurability of orthogonal effects implies the bound $\sum_{(\mathbf{x},\mathbf{y})\in\set{O}}p(\mathbf{y}|\mathbf{x})\leq 1$ for every set $\set{O}$ of locally orthogonal events. In other words, all the probability distributions generated by sharp measurements obey LO.

Reduction to sharp measurements. While CE applies only to sharp measurements, LO applies to arbitrary measurements. This is because every probability distribution that we can encounter in our theory is a probability distribution generated by sharp measurements. This fact can be seen as follows: combining the Fundamental Sharpness with the Locality of Sharp Measurements, one can show that for every party $i$ and every measurement $\mathbf{m}^{(i,x_{i})}$ , there exists an ancilla $A_{i}$ , a state of $A_{i}$ , call it $\sigma_{i}$ , and a sharp measurement $\mathbf{M}^{(i,x_{i})}$ such that $\left(m^{(i,x_{i})}_{y_{i}}|\rho_{i}\right)=\left(M^{(i,x_{i})}_{y_{i}}|\rho_{i}\otimes\sigma_{i}\right)$ for every $x_{i}$ and for every $y_{i}$ (cf. Supplementary Note 2 for the proof). Now, since all measurements that party $i$ can perform can be replaced by sharp measurements by adding an ancilla in a fixed state $\sigma_{i}$ , the input-output distribution $p(\mathbf{y}|\mathbf{x})$ generated by arbitrary measurements on the state $\rho$ coincides with the input-output distribution generated by sharp measurements on the state $\rho^{\prime}=\rho\otimes\sigma_{1}\otimes\dots\otimes\sigma_{N}$ . In other words, at the level of correlations there is no difference between sharp and non-sharp measurements. Thanks to this fact, deriving LO for sharp measurements is equivalent to deriving LO for arbitrary measurements.

Appendix A Coherence Distillation Procedure

A coherence distillation procedure refers to a series of incoherent operations by which a large number of identical partly coherent states can be transformed into a smaller number of maximally coherent states. This appendix introduces a coherence distillation procedure for pure qubit states. With $N$ copies of the state $\ket{\psi}=\alpha\ket{0}+\beta\ket{1}$ , we show that we can asymptotically obtain $l$ copies of $\ket{\Psi_{2}}=(\ket{0}+\ket{1})/\sqrt{2}$ , where $l$ and $N$ satisfy $l/N\approx R_{I}(\ket{\psi})$ . The derivation method can be generalized to an arbitrary dimension.

A.1 Coherence distillation: qubit

First we prepare $MN$ copies of a partially coherent qubit state which will be uniformly divided into $M$ groups. The initial state of each group can be expressed according to

[TABLE]

A binomial expansion on the computational basis contains $N+1$ distinct coefficients $\beta^{N},\alpha^{1}\beta^{N-1},\dots,\alpha^{N}$ . Thus we can divide the original $2^{N}$ -dimensional Hilbert space into $N+1$ subspaces according to the coefficients. For the $k$ th coefficient $\alpha^{N-k}\beta^{k}$ , the corresponding $k$ th subspace is a $D_{k}=C_{N}^{k}$ dimensional Hilbert space, whose basis states are denoted by

[TABLE]

In terms of the computational basis, $\ket{e_{i}^{k}}$ $(i=1,2,\cdots,D_{k})$ is an $N$ -qubit basis state with $(N-k)$ $\ket{0}$ s and $k$ $\ket{1}$ s.

Next, we perform a projection measurement on $\ket{\psi}^{\otimes N}$ to the subspaces. In our case, the projection operator that maps onto the $k$ th subspace is given by

[TABLE]

The probability of obtaining the $k$ th outcome is

[TABLE]

Note that as the coefficients for the expansion are the same, the post-selection of the $k$ th outcome corresponds to a maximally coherent state $\ket{\Psi_{D_{k}}}$ of dimension $D_{k}$ .
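The statistics of this first projection can be tabulated numerically. The sketch below assumes that the probability of the $k$ th outcome is the binomial weight $p_{k}=C_{N}^{k}|\alpha|^{2(N-k)}|\beta|^{2k}$ , which follows from the subspace dimension $D_{k}=C_{N}^{k}$ and the common coefficient $\alpha^{N-k}\beta^{k}$ . The function name and the numerical values of $\alpha$ , $\beta$ and $N$ are hypothetical.

```python
from math import comb, log2

def subspace_statistics(alpha, beta, N):
    """Return (D_k, p_k) for k = 0..N, for N copies of alpha|0> + beta|1>.

    D_k = C(N, k) is the dimension of the k-th subspace and
    p_k = C(N, k) |alpha|^(2(N-k)) |beta|^(2k) is assumed to be
    the probability of projecting onto it.
    """
    a2, b2 = abs(alpha) ** 2, abs(beta) ** 2
    dims = [comb(N, k) for k in range(N + 1)]
    probs = [dims[k] * a2 ** (N - k) * b2 ** k for k in range(N + 1)]
    return dims, probs

alpha, beta, N = 0.6, 0.8, 8          # example values with |alpha|^2 + |beta|^2 = 1
dims, probs = subspace_statistics(alpha, beta, N)

for k, (d, p) in enumerate(zip(dims, probs)):
    # log2(D_k) is the coherence of the post-selected state |Psi_{D_k}>
    print(f"k={k}: D_k={d:3d}  p_k={p:.4f}  log2(D_k)={log2(d):.3f}")
```

The subspace dimensions add up to $2^{N}$ and the probabilities to 1, which is a quick consistency check on the decomposition.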

If $D_{k}=2^{r}$ , we can directly convert to $r$ copies of $\ket{\Psi_{2}}$ as desired. Otherwise, we can repeat this process $M$ times, and take the tensor product of the post-selected states to obtain a maximally coherent state of dimension $D$ ,

[TABLE]

where $k_{j}$ is the outcome of the $j$ th measurement, and the total dimension is $D=D_{k_{1}}D_{k_{2}}\cdots D_{k_{M}}$ . The total dimension $D$ will lie between $2^{r}$ and $2^{r}(1+\epsilon)$ $(0<\epsilon<1)$ for some power $r$ . It can be proved [15] that as $M$ increases, $\epsilon$ will asymptotically approach [math].

Therefore, we can perform a second projection measurement onto the $2^{r}$ -dimensional Hilbert subspace and directly obtain the final state

[TABLE]

Using the above procedure, $NM$ copies of a partly coherent qubit state $\alpha\ket{0}+\beta\ket{1}$ have been distilled into $r$ copies of the maximally coherent state $\ket{\Psi_{2}}=(\ket{0}+\ket{1})/\sqrt{2}$ .

In the following, we will show that all the operations of the distillation protocol are incoherent operations. In addition, we will show that the number of distilled maximally coherent states $r$ and the number of initial qubits $MN$ satisfy the relation $NMR_{I}(\ket{\psi})\approx r$ .

A.1.1 Incoherent operations

As the only operations are the two projective measurements, we only need to prove the following lemma.

Lemma 1.

Suppose an $n$ -dimensional Hilbert space has a complete basis $I_{n}=\{\ket{1},\ket{2},\cdots,\ket{n}\}$ . A projection measurement that divides $I_{n}$ into complementary subsets is an incoherent operation with respect to the basis $I_{n}$ .

Proof.

Suppose that the basis $I_{n}$ is divided into $m$ complementary subsets $I_{n_{1}},I_{n_{2}},\cdots,I_{n_{m}}$ , such that $I_{n_{\alpha}}\cap I_{n_{\beta}}=\emptyset$ , for all $\alpha\neq\beta\in\{1,2,\cdots,m\}$ , and $I_{n}=I_{n_{1}}\cup I_{n_{2}}\cup\cdots\cup I_{n_{m}}$ . Denote the projector onto the subspace spanned by $I_{n_{\alpha}}$ by $\hat{P}_{\alpha}$ . The projection measurement is then described by a set of Kraus operators $\{\hat{P}_{\alpha}\}$ that satisfy $\hat{P}_{\alpha}^{\dagger}\hat{P}_{\beta}=\delta_{\alpha,\beta}\hat{P}_{\alpha}$ and $\sum_{\alpha}\hat{P}_{\alpha}=\mathbb{1}$ , the identity operator on the $n$ -dimensional space. To prove that the projection measurement is an incoherent operation, we additionally need to show that $\hat{P}_{\alpha}\mathcal{I}_{n}\hat{P}_{\alpha}^{\dagger}\subset\mathcal{I}_{n}$ , where $\mathcal{I}_{n}$ is the set of all incoherent states, i.e. states that can be represented as $\delta=\sum_{i=1}^{n}\delta_{i}\ket{i}\bra{i}$ . By the definition of $\hat{P}_{\alpha}$ , we have

[TABLE]

where $\delta(\ket{i}\in I_{n_{\alpha}})=1$ if $\ket{i}\in I_{n_{\alpha}}$ and $\delta(\ket{i}\in I_{n_{\alpha}})=0$ otherwise. Thus, we can show that for an arbitrary state $\delta=\sum_{i=1}^{n}\delta_{i}\ket{i}\bra{i}\in\mathcal{I}_{n}$ , we have

[TABLE]

∎

Therefore, we have proven that the operations in the distillation protocol are incoherent.

A.1.2 Coherence loss

To explain why we have $NMR_{I}(\ket{\psi})\approx r$ , we only need to consider the coherence loss during the distillation process. The initial state in each group can be rewritten as

[TABLE]

where $\ket{\Psi_{D_{k}}}$ is a maximally coherent state of dimension $D_{k}$ . Thus the density matrix of the initial state is

[TABLE]

As the coherence of $\rho$ is defined by the von Neumann entropy of its diagonal part, we first look at $\rho^{diag}$ . That is,

[TABLE]

Here, we can see that when $k\neq k^{\prime}$ , $\langle e_{i}|\Psi_{k}\rangle\langle\Psi_{k^{\prime}}|e_{i}\rangle=0$ . Therefore Eq. (A.11) can be simplified as

[TABLE]

Here, $\rho^{\mathrm{diag}}$ has the decomposition $\{p_{k},\rho_{k}^{diag}\}$ . Thus, we have

[TABLE]

where $S(\rho^{diag})$ is the von Neumann entropy of $\rho^{diag}$ and $H(p_{k})$ is the Shannon entropy. Considering our coherence (intrinsic randomness) definition, Eq. (A.13) is equivalent to

[TABLE]

where $C(\rho)$ is the average initial coherence and $\sum_{k=0}^{N}p_{k}C(\rho_{k})$ is the average coherence left after the first projection measurement. Therefore, the coherence loss in the first operation is

[TABLE]

The coherence loss for the second projection measurement can be easily estimated by $\log_{2}(1+\epsilon)\approx\epsilon$ . Thus the total coherence loss has an upper bound given by

[TABLE]

which is negligible relative to the initial coherence $MNC(\ket{\psi})$ when $M$ and $N$ are large.
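The coherence budget of the first projection can be checked numerically. The sketch below assumes, following the decomposition above, that the coherence lost in one group is the Shannon entropy $H(p_{k})$ of the binomial distribution $p_{k}$ and that the coherence retained on average is $\sum_{k}p_{k}\log_{2}D_{k}$ , with $D_{k}=C_{N}^{k}$ . The function names and the value $|\alpha|^{2}=0.36$ are hypothetical.

```python
from math import comb, log2

def shannon(ps):
    """Shannon entropy in bits, skipping zero-probability terms."""
    return -sum(p * log2(p) for p in ps if p > 0)

def first_step_budget(a2, N):
    """Coherence budget for one group of N copies, with a2 = |alpha|^2.

    Returns (initial, lost, kept), where
      initial = N * H(|alpha|^2, |beta|^2)   coherence of the N-copy state,
      lost    = H(p_k)                       assumed loss in the first projection,
      kept    = sum_k p_k log2 C(N, k)       average coherence after it.
    """
    b2 = 1 - a2
    p = [comb(N, k) * a2 ** (N - k) * b2 ** k for k in range(N + 1)]
    initial = N * shannon([a2, b2])
    lost = shannon(p)
    kept = sum(pk * log2(comb(N, k)) for k, pk in enumerate(p))
    return initial, lost, kept

for N in (10, 50, 200):
    initial, lost, kept = first_step_budget(0.36, N)
    print(f"N={N:3d}: initial={initial:8.3f}  lost={lost:6.3f}  "
          f"kept={kept:8.3f}  lost/initial={lost / initial:.4f}")
```

Since $p_{k}$ has only $N+1$ outcomes, the loss per group is at most $\log_{2}(N+1)$ , while the initial coherence grows linearly in $N$ , so the relative loss shrinks as $N$ increases.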

A.2 General definition

Generally, when considering the intrinsic randomness of multiple copies of $\rho$ , we can define the average intrinsic randomness in a manner similar to the definition of entanglement cost [174, 43, 17] by

[TABLE]

where $D(\rho_{1},\rho_{2})$ is a suitable measure of distance, which, for instance, could be the trace norm. In this case, the intrinsic randomness is understood as the average coherence cost in preparing $\rho$ . By analogy with the definition of the regularized entanglement of formation [174], we conjecture that $R_{I}^{C}(\rho)$ equals the regularized intrinsic randomness measure,

[TABLE]

In the other direction, we can apply intrinsic operations to transform $N$ non-maximally coherent copies of $\rho$ into $l$ copies of the maximally coherent state $\ket{\Psi_{2}}$ . Similarly, we can define the distillable coherence by the supremum of $l/N$ over all possible distillation protocols [177, 43, 17],

[TABLE]

This distillable coherence $R_{I}^{D}(\rho)$ can thus be considered as the amount of intrinsic randomness when a quantum extractor is performed before measurement, as shown in the main text. For a general reasonable regularized coherence measure $C_{I}^{\infty}(\rho)$ similar to Eq. (A.18), we conjecture that the two measures $R_{I}^{D}$ and $R_{I}^{C}$ are equivalent for all possible distance measures. They serve as two extremal measures, such that $R_{I}^{D}\leq C_{I}^{\infty}\leq R_{I}^{C}$ for all regularized $C_{I}^{\infty}$ . In addition, similar to entanglement measures [330, 331, 332], we show in the next section that the coherence measure for pure states is unique under regularization.

A.3 A unique measure for pure states

In this section, we show that the measure of randomness is unique for pure quantum states.

First note that $R_{I}^{D}(\rho)\leq R_{I}^{C}(\rho)$ . Otherwise, we could first distill $N$ copies of $\rho$ into $NR_{I}^{D}(\rho)$ copies of $\ket{\Psi_{2}}$ , and then convert them into $NR_{I}^{D}(\rho)/R_{I}^{C}(\rho)>N$ copies of $\rho$ . For a pure state $\rho$ , we have already given a distillation protocol such that $R_{I}^{D}(\rho)=R_{I}^{C}(\rho)$ . Now, suppose that $C_{I}$ is a coherence measure for a single quantum state. By regularization, the coherence measure is given by

[TABLE]

Supposing that $C_{I}(\ket{\Psi_{d}})=\log_{2}d$ , we now prove that $C_{I}^{\infty}(\rho)=R_{I}^{D}(\rho)=R_{I}^{C}(\rho)$ for a pure state $\rho$ .

Proof.

For a given pure state $\rho$ , suppose that $C_{I}^{\infty}(\rho)<R_{I}^{D}(\rho)$ , that is, $C_{I}^{\infty}(\rho)=R_{I}^{D}(\rho)-\Delta$ , where $\Delta$ is a finite positive number. After the distillation process, we can convert $N$ copies of $\rho$ into approximately $NR_{I}^{D}(\rho)$ copies of $\ket{\Psi_{2}}$ . From the main text, we know that the remaining coherence is $NR_{I}^{D}(\rho)$ . For the $C_{I}^{\infty}(\rho)$ measure, the coherence after distillation is also $NR_{I}^{D}(\rho)$ , while the initial coherence is given by $NC_{I}^{\infty}(\rho)=NR_{I}^{D}(\rho)-N\Delta$ . As $\Delta$ is a finite positive number, the distillation process increases coherence, which leads to a contradiction.

If $C_{I}^{\infty}(\rho)>R_{I}^{D}(\rho)$ , we can follow a similar method by considering the transformation $NR_{I}^{C}(\rho)$ copies of $\ket{\Psi_{2}}$ into $N$ copies of $\rho$ . The contradiction originates from checking the coherence increase with the $C_{I}^{\infty}(\rho)$ measure during the incoherent transformation process. ∎

Appendix B SI-QRNG

This appendix discusses the finite-size effect of the source-independent quantum random number generator.

B.1 Calculation of the number of effective $X$ -basis measurements

In this appendix, we show that in the asymptotic limit, the number of effective $X$ -basis measurements is independent of $n$ . Our starting point is Eq. (10.5) and $\varepsilon_{\theta}<2^{-100}$ . Notice that normally $n$ is smaller than $10^{12}<2^{40}$ to ease fast post-processing; thus, the term $1/\sqrt{n}$ and the other polynomial terms in Eq. (10.5) play a relatively small role in making $\varepsilon_{\theta}<2^{-100}$ . In the following, we consider only the exponent in Eq. (10.5).

For ease of notation, let $x=e_{bx}$ , $y=e_{bx}+\theta$ and $q=q_{x}$ . Then the exponent of Eq. (10.5) becomes

[TABLE]

and the inequality $\varepsilon_{\theta}<2^{-100}$ is approximately equivalent to

[TABLE]

Since $q$ is very small, one can make three approximations:

[TABLE]

and

[TABLE]

Then, by applying Eqs. (B.2) and (B.3), the inequality (B.1) becomes

[TABLE]

Applying Eq. (B.4) yields

[TABLE]

and rearranging terms, we have

[TABLE]

Substituting the definitions of $x$ and $y$ , we obtain

[TABLE]

Finally, we substitute $q=n_{x}/n$ and get

[TABLE]

which is independent of $n$ .

B.2 Proof of the random sampling property for a type of QRNG input after loss

In this appendix, we first restate the setting. In the idealistic protocol, the measurement device chooses its measurement basis after confirming that the state received from the source is not a vacuum (or equivalently, not lost). In practice, confirming whether a state is a vacuum is usually done by observing whether detectors in the measurement device click or not. Thus, it is desirable for the measurement device to choose its basis before confirming whether loss happens.

We prove that for a specific input that defines the measurement basis choices before the potential loss, the positions of $n_{x}$ valid $X$ -basis measurements (after excluding loss events) are randomly drawn from the positions of the total of $n$ valid measurements. This proves that the random sampling technique from Fung et al. can still be applied when the measurement basis is chosen before the loss.

For ease of presentation, we state the input that specifies the measurement choices before the loss as follows. The input is a string of length $N=N_{x}+N_{z}$ that contains $N_{x}$ 0s and $N_{z}$ 1s. The $\binom{N}{N_{z}}$ possibilities for choosing the positions of $N_{z}$ 1s from the total $N_{x}+N_{z}$ positions are equally likely. Here, 0 stands for an $X$ -basis measurement and 1 stands for a $Z$ -basis measurement. After loss, the numbers of valid $X$ -basis measurements and $Z$ -basis measurements are denoted by $n_{x}$ and $n_{z}$ , respectively, with a total string length of

[TABLE]

We need to show that the output is uniform for the $\binom{n_{x}+n_{z}}{n_{z}}$ possibilities of choosing the positions of $n_{z}$ 1s from the total $n$ positions.

The proof proceeds through a symmetry argument. The input is symmetric, i.e., if we exchange the indices of two positions, the distribution will not change. Suppose that the initial positions are $1,2,\dots,N$ and the probability of choosing specific positions for $N_{z}$ 1s from the total $N$ positions is

[TABLE]

For ease of presentation, denote the positions remaining after loss as $i_{1}<i_{2}<\dots<i_{n}$ . Then each possibility with $n_{x}$ 0s in the remaining $n$ positions has the same probability

[TABLE]

which proves our claim.
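The symmetry argument can be illustrated by exhaustive enumeration for a small instance. The sketch below assumes a fixed set of surviving positions that does not depend on the input (a basis-independent loss pattern), which is only one special case of the losses covered by the proof. The function names and the example parameters are hypothetical.

```python
from itertools import combinations
from collections import Counter

def surviving_pattern_counts(N, N_z, kept):
    """Count how many equally likely inputs produce each surviving bit pattern.

    An input is a length-N string with exactly N_z ones (Z basis) and
    N - N_z zeros (X basis). `kept` lists the positions that are not lost.
    """
    counts = Counter()
    for ones in combinations(range(N), N_z):
        ones = set(ones)
        pattern = tuple(1 if i in ones else 0 for i in kept)
        counts[pattern] += 1
    return counts

def is_uniform_given_nz(counts):
    """True if all patterns with the same number of ones have the same count."""
    by_nz = {}
    for pattern, c in counts.items():
        by_nz.setdefault(sum(pattern), set()).add(c)
    return all(len(values) == 1 for values in by_nz.values())

# Illustrative instance: 8 rounds, 3 Z-basis rounds, positions 1, 4, 5 lost.
counts = surviving_pattern_counts(N=8, N_z=3, kept=(0, 2, 3, 6, 7))
print(is_uniform_given_nz(counts))
```

Conditioned on the number $n_{z}$ of surviving 1s, every arrangement of those 1s among the $n$ surviving positions is produced by the same number of inputs, so the surviving string is uniform over its $\binom{n}{n_{z}}$ possibilities.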

As a side remark, we can see that the proof does not depend on whether the loss is basis dependent or independent. Thus, the same property also holds for a more general class of losses that could be useful in other settings. Another remark is that independent and identically distributed input also satisfies the property, as in the work of Fung et al.

B.3 Random seed dilution

The input is either given directly or expanded from a uniformly random seed. Here, we provide a method for performing the expansion. The expansion is straightforward since the input is also uniformly random within its support. We can simply map a uniform seed of length $\log\binom{N}{c_{1}}$ bijectively to the input support, which is the $\binom{N}{c_{1}}$ possibilities of choosing the positions of $c_{1}$ 0s from the string of length $N$ . Then, we have obtained the desired input. Furthermore, note that this construction is deterministic; thus, input randomness is only needed for the uniformly random seed of length $n$ .
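One way to realize such a bijection is lexicographic unranking of combinations. The following is a minimal sketch under the assumption that the seed is supplied as an integer rank drawn uniformly from $0$ to $\binom{N}{c_{1}}-1$ ; a seed made of whole bits is only exactly bijective when $\binom{N}{c_{1}}$ is a power of two. The function names `unrank_combination`, `input_string`, and `seed_bits` are hypothetical.

```python
from math import comb

def unrank_combination(rank, N, c):
    """Return the rank-th c-subset of range(N) in lexicographic order."""
    if not 0 <= rank < comb(N, c):
        raise ValueError("rank out of range")
    positions = []
    x = 0
    for remaining in range(c, 0, -1):
        # comb(N - x - 1, remaining - 1) subsets have x as their next element.
        while comb(N - x - 1, remaining - 1) <= rank:
            rank -= comb(N - x - 1, remaining - 1)
            x += 1
        positions.append(x)
        x += 1
    return positions

def input_string(rank, N, c1):
    """Length-N input with 0 (X basis) at the chosen positions and 1 (Z basis) elsewhere."""
    zeros = set(unrank_combination(rank, N, c1))
    return "".join("0" if i in zeros else "1" for i in range(N))

def seed_bits(N, c1):
    """Number of whole bits needed to index all comb(N, c1) inputs."""
    return (comb(N, c1) - 1).bit_length()

print(input_string(3, 4, 2))
print(seed_bits(10**6, 100) / 10**6)
```

For a fixed number of X-basis positions, the printed ratio of seed length to the number of runs shrinks as $N$ grows, in line with the dilution argument.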

For the input of our protocol, the ratio of the initial random seed length to the number of runs $N$ becomes negligible as $N$ goes to infinity because the number of $X$ -basis measurements $c_{1}$ is a constant, as derived in Appendix B.1. More precisely, the min-entropy of the input as well as the length of the uniformly random seed has an upper bound given by

[TABLE]

Note that since the detector completely controls this random seed length, calculating the exact input min-entropy is possible. This is very different from estimating the error rate in the finite-key analysis section, in which we can only estimate the range of the error rate with a high probability of success. Apart from the input specified in the main text, independent and identically distributed bit strings are also a possible choice for the input. Finally, we remark that the reason to include this input seed length analysis is to make our QRNG composable.

Appendix C Proof for randomness requirement for the CH inequality

This appendix presents the proof of the randomness requirement for the CH inequality.

C.1 Proof for finite strategies of choosing input settings

As we discussed in Sec. 9.2, there are two levels of strategies. One is the strategy of choosing the input settings and the other concerns the outputs of Alice and Bob conditioned on their inputs. As there are finitely many deterministic strategies of Alice and Bob, here we prove that the strategies of choosing input settings can be taken to be finite and can be characterized by all the possible optimal strategies of Alice and Bob.

Essentially, even though the strategies of Alice and Bob are finite, the strategies of choosing input settings can be infinite. Here, what we want to prove is that any optimal strategy (including both levels) can be realized with finitely many strategies of choosing input settings.

Suppose there exists an optimal strategy that gives the maximal CH value with LHVMs. For this strategy, we suppose there are finitely many strategies of choosing the input settings (the proof for the infinite case follows similarly). Then, it is easy to check that for a given $\lambda$ and hence $(p_{0}(\lambda),p_{1}(\lambda),p_{2}(\lambda),p_{3}(\lambda))$ in the optimal strategy, the optimal strategy for the output of Alice and Bob should be from the set Eq. (9.14). This also explains why we only take account of the possibly optimal deterministic strategies of Alice and Bob.

Now, suppose that there exist $m$ strategies $\lambda$ for choosing input settings for the first strategy of Alice and Bob, $(p_{2}-p_{0})/2$ , that is,

[TABLE]

Here the superscript denotes the $m$ strategies of $\lambda$ and the subscript denotes the strategy for Alice and Bob. It is easy to see that we can always take an average of all the $m$ strategies without decreasing the Bell value or violating the constraints. In this case, we can define one $\lambda_{1}$ to denote all of $\lambda_{1}^{1}$ , $\lambda_{1}^{2}$ , $\dots$ , $\lambda_{1}^{m}$ . That is,

[TABLE]

where

[TABLE]

Thus, we show that the $m$ strategies for choosing input settings can be combined into one for any strategy of Alice and Bob. In the following, we prove this argument in more detail.

Proof.

We use the label $t$ to denote the $t$ th strategy of choosing input settings for a given strategy of Alice and Bob, $j$ to denote the strategies of Alice and Bob, and $i$ to index the inputs, i.e., the subscript of $(p_{0}(\lambda),p_{1}(\lambda),p_{2}(\lambda),p_{3}(\lambda))$ .

We denote $\lambda_{j}^{t}$ to be the $t$ th strategy of choosing input settings when the optimal strategy for Alice and Bob is $j$ . The prior probability for $\lambda$ and input settings of each strategy are denoted as $q(\lambda_{j}^{t})$ and $p_{i}(\lambda_{j}^{t})$ , where $j\in\{1,2,3,4,5\}$ and $i\in\{0,1,2,3\}$ , respectively. Denote the Bell value for the $j$ th strategy to be $J_{j}$ , which is a linear function of $\{p_{i}(\lambda_{j}^{t})\}$ . Thus, the total Bell value is given by

[TABLE]

and the constraints of $q(\lambda_{j}^{t})$ and $p_{i}(\lambda_{j}^{t})$ are given by,

[TABLE]

As mentioned above, we can sum over $t$ by defining $q(\lambda_{j})$ and $p_{i}(\lambda_{j})$ by

[TABLE]

Substituting Eq. (C.6) into Eq. (C.5), we find that the constraints on $q(\lambda_{j})$ and $p_{i}(\lambda_{j})$ are given by

[TABLE]

We should also note that the substitution in Eq. (C.6) will not affect the Bell value,

[TABLE]

where the last equality is because $J_{j}$ is a linear function.

∎
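The merging step can be illustrated with exact arithmetic. The sketch below assumes that the merged quantities are $q(\lambda_{j})=\sum_{t}q(\lambda_{j}^{t})$ and the $q$ -weighted average $p_{i}(\lambda_{j})=\sum_{t}q(\lambda_{j}^{t})p_{i}(\lambda_{j}^{t})/q(\lambda_{j})$ , which is one natural reading of "taking an average" and is not copied from Eq. (C.6). The numbers and function names are hypothetical, and the linear function used is the strategy $(p_{2}-p_{0})/2$ mentioned in the text.

```python
from fractions import Fraction as F

def bell_value(q, p, coeff):
    """Sum over t of q_t times a linear function of (p_0, ..., p_3)."""
    return sum(qt * sum(c * pi for c, pi in zip(coeff, pt)) for qt, pt in zip(q, p))

def merge(q, p):
    """Combine several input-setting strategies into one q-weighted average."""
    q_total = sum(q)
    averaged = [sum(qt * pt[i] for qt, pt in zip(q, p)) / q_total
                for i in range(len(p[0]))]
    return q_total, averaged

# Illustrative data (assumptions): two strategies lambda^1, lambda^2 with P = 3/10.
coeff = [F(-1, 2), F(0), F(1, 2), F(0)]           # (p_2 - p_0) / 2
q = [F(1, 10), F(3, 20)]
p = [[F(1, 10), F(3, 10), F(3, 10), F(3, 10)],
     [F(1, 5), F(1, 5), F(3, 10), F(3, 10)]]

q_merged, p_merged = merge(q, p)
print(bell_value(q, p, coeff) == bell_value([q_merged], [p_merged], coeff))
print(sum(p_merged) == 1, max(p_merged) <= F(3, 10))
```

Because the Bell value is linear and the bounds on $p_{i}$ define a convex set, the averaged strategy gives the same value and still satisfies the normalization and the upper bound $P$ .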

C.2 Optimal strategy of the CH test

C.2.1 General condition

In this section, we present the optimal strategy that maximizes $J_{\mathrm{CH}}^{\mathrm{LHVM}}$ defined in Eq. (9.17) under the constraints defined in Eq. (9.16).

$Q=0$

For simplicity, we first consider the randomness requirement $P$ and set $Q$ to be 0. That is, the input randomness is upper bounded by $P$ ,

[TABLE]

The Bell value $J_{\mathrm{CH}}^{\mathrm{LHVM}}$ with LHVMs is given by

[TABLE]

Hereafter, we denote $J_{\mathrm{CH}}^{\mathrm{LHVM}}$ by $J$ for simplicity. Grouping $J$ by the index of the strategies of Alice and Bob $p_{i}$ , $i\in\{0,1,2,3\}$ , instead of $\lambda_{i}$ , we have

[TABLE]

where

[TABLE]

And the constraints are given by

[TABLE]

In the following, we investigate the optimal strategy based on value of $P$ .

(1) When $\frac{1}{4}\leq P\leq\frac{1}{3}$ .

With the normalization condition of $p_{i}(\lambda_{j})$ , we can rewrite $J$ as

[TABLE]

In this case, we can write $J$ by

[TABLE]

where the coefficients are given in Table C.1.

Note that $p_{i}$ is upper bounded by $P$ , then we have

[TABLE]

Therefore, we have

[TABLE]

In addition, we can see that the equality holds by simply letting $p_{i}(\lambda_{j})$ be $P$ for $\beta_{i,j}\neq 0$ and $1-3P$ for $\beta_{i,j}=0$ . This strategy is valid only when $P\leq 1/3$ ; the other cases must be treated separately.

(2) When $\frac{1}{3}\leq P\leq\frac{3}{8}$ .

With the constraints defined in Eq. (C.14), we can also write $J$ as follows,

[TABLE]

Then $J$ can be similarly expressed by

[TABLE]

with coefficient defined in Table .

The intuition to maximize Eq. (C.19) is to assign smaller values to $p_{i}(\lambda_{j})$ for smaller corresponding coefficients. Because $\frac{1}{3}\leq P\leq\frac{3}{8}$ , we can see that

[TABLE]

Therefore, the Bell value defined in Eq. (C.19) can be upper bounded by

[TABLE]

Equality is achieved with the following parameters:

[TABLE]

(3) When $P\geq\frac{3}{8}$ .

For this case, we can easily see that the maximal Bell value can reach $1$ , which is the algebraic maximum of $J$ . We show in the following that the Bell value cannot exceed $1$ .

From Eq. (C.19), we know that $J$ can be expressed by

[TABLE]

where $N$ denotes the part that contributes negatively,

[TABLE]

Therefore, we show that $J\leq 1$ . Equality is attained by the following strategy

[TABLE]

$Q\neq 0$

In this part, we consider the input randomness quantification of $p_{i}(\lambda_{j})$ with both $P$ and $Q$ , which are defined in Eq. (9.2). In this case, we have

[TABLE]

Note that, if we substitute $p_{i}(\lambda_{j})$ by

[TABLE]

we can show that the constraints on $p^{\prime}_{i}(\lambda_{j})$ are given by

[TABLE]

Compared to Eq. (C.10), if we replace $p_{i}(\lambda_{j})$ by $p^{\prime}_{i}(\lambda_{j})$ , we obtain a new Bell value $J^{\prime}$ ,

[TABLE]

Because $p_{i}(\lambda_{j})$ and $p^{\prime}_{i}(\lambda_{j})$ are related by Eq. (C.27), we can prove that

[TABLE]

Therefore, instead of considering both the upper and lower bounds of $p_{i}(\lambda_{j})$ in the original Bell inequality, we can equivalently consider the same Bell inequality with $p^{\prime}_{i}(\lambda_{j})$ , which has upper bound $P$ and lower bound [math]. Our results are as follows.

(1) When $\frac{P-Q}{1-4Q}\leq\frac{1}{3}$ , that is $3P+Q\leq 1$

[TABLE]

(2) When $\frac{1}{3}\leq\frac{P-Q}{1-4Q}\leq\frac{3}{8}$ , that is $3P+Q\geq 1$ and $2P+Q\leq\frac{3}{4}$

[TABLE]

(3) When $\frac{P-Q}{1-4Q}\geq\frac{3}{8}$ , that is $2P+Q\geq\frac{3}{4}$

[TABLE]

Therefore, the optimal CH value $J^{\mathrm{LHVM}}_{\mathrm{CH}}$ with LHVMs is

[TABLE]
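The equivalences between the two forms of the case conditions in (1) to (3) can be checked directly. The short derivation below assumes $Q<\frac{1}{4}$ , so that $1-4Q>0$ and multiplying through does not flip the inequalities.

```latex
\begin{align}
\frac{P-Q}{1-4Q}\leq\frac{1}{3}
  &\iff 3(P-Q)\leq 1-4Q
  \iff 3P+Q\leq 1, \\
\frac{P-Q}{1-4Q}\leq\frac{3}{8}
  &\iff 8(P-Q)\leq 3(1-4Q)
  \iff 8P+4Q\leq 3
  \iff 2P+Q\leq\frac{3}{4}.
\end{align}
```

Setting $Q=0$ recovers the thresholds $P\leq\frac{1}{3}$ and $P\leq\frac{3}{8}$ of the $Q=0$ analysis.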

C.2.2 Factorizable condition

Now, we consider the optimal strategy of the CH test with LHVMs under factorizable condition,

[TABLE]

As we denote $p(i,j)$ by $p_{2*i+j}$ , we have

[TABLE]

$Q=0$

Similarly, we consider first the case with $Q=0$ . In the following, we show that all the five possible strategies are upper bounded by $P-1/4$ .

(1) When $P\leq\frac{1}{2}$ .

The result is based on the order of $p_{0}$ , $p_{1}$ , $p_{2}$ , and $p_{3}$ .

(a) $p_{3}\geq p_{2}\geq p_{1}\geq p_{0}$ and $p_{3}\geq p_{1}\geq p_{2}\geq p_{0}$ .

This case is equivalent to $p_{A}(1)\geq p_{A}(0)$ and $p_{B}(1)\geq p_{B}(0)$ . Thus we have $p_{A}(1)p_{B}(1)\leq P$ . Amongst the five strategies, the biggest one is $(p_{2}-p_{0})/2$ , which can be upper bounded by

[TABLE]

(b) $p_{1}\geq p_{0}\geq p_{3}\geq p_{2}$ and $p_{1}\geq p_{3}\geq p_{0}\geq p_{2}$ .

This case is equivalent to $p_{A}(0)\geq p_{A}(1)$ and $p_{B}(1)\geq p_{B}(0)$ . Thus we have $p_{A}(0)p_{B}(1)\leq P$ . Amongst the five strategies, the biggest one is $(p_{1}-p_{2})/2$ , which can be upper bounded by

[TABLE]

(c) $p_{2}\geq p_{3}\geq p_{0}\geq p_{1}$ and $p_{2}\geq p_{0}\geq p_{3}\geq p_{1}$ .

This case is equivalent to $p_{A}(1)\geq p_{A}(0)$ and $p_{B}(0)\geq p_{B}(1)$ . Thus we have $p_{A}(1)p_{B}(0)\leq P$ . Amongst the five strategies, the biggest one is $(p_{2}-p_{1})/2$ , which can be upper bounded by

[TABLE]

(d) $p_{0}\geq p_{1}\geq p_{2}\geq p_{3}$ and $p_{0}\geq p_{2}\geq p_{1}\geq p_{3}$ .

This case is equivalent to $p_{A}(0)\geq p_{A}(1)$ and $p_{B}(0)\geq p_{B}(1)$ . Thus we have $p_{A}(0)p_{B}(0)\leq P$ . Amongst the five strategies, the biggest one is $(p_{1}+p_{2})/2-p_{3}$ , which can be upper bounded by

[TABLE]

Therefore, we show that all the strategies are upper bounded by $P-1/4$ . Then the total Bell value

[TABLE]

and the equal sign holds.
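The bound $P-1/4$ for the four orderings (a) to (d) can be checked on a grid with exact rational arithmetic. The sketch below assumes the factorizable form $p_{2i+j}=p_{A}(i)p_{B}(j)$ and tests only the strategy that the text names as the biggest one in each case, against the largest $p_{i}$ (which is itself at most $P$ ). The function names and the grid resolution are hypothetical.

```python
from fractions import Fraction as F

def joint(a0, b0):
    """Factorizable input distribution p_{2i+j} = p_A(i) * p_B(j)."""
    a = (a0, 1 - a0)
    b = (b0, 1 - b0)
    return [a[i] * b[j] for i in (0, 1) for j in (0, 1)]

# Strategy named in the text as the biggest one, keyed by the index of the largest p_i.
BIGGEST = {
    3: lambda p: (p[2] - p[0]) / 2,         # case (a)
    1: lambda p: (p[1] - p[2]) / 2,         # case (b)
    2: lambda p: (p[2] - p[1]) / 2,         # case (c)
    0: lambda p: (p[1] + p[2]) / 2 - p[3],  # case (d)
}

def min_gap(steps=20):
    """Smallest value of (max_i p_i - 1/4) - strategy over the grid."""
    worst = None
    for i in range(steps + 1):
        for j in range(steps + 1):
            p = joint(F(i, steps), F(j, steps))
            k = max(range(4), key=lambda idx: p[idx])
            gap = (p[k] - F(1, 4)) - BIGGEST[k](p)
            worst = gap if worst is None else min(worst, gap)
    return worst

print(min_gap())
```

A nonnegative minimum gap means that each named strategy is at most $\max_{i}p_{i}-1/4\leq P-1/4$ on the grid, and a gap of exactly zero is reached at the uniform distribution $p_{A}(0)=p_{B}(0)=1/2$ .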

(2) When $P\geq\frac{1}{2}$ .

It is easy to see that the maximal Bell value $J$ reaches 1 when $P\geq\frac{1}{2}$ .

Consequently, we show the optimal Bell value $J$ with LHVMs,

[TABLE]

$Q\neq 0$

We can follow the approach of Appendix C.2.1 to account for nonzero $Q$ .

(1) When $\frac{P-Q}{1-4Q}\leq\frac{1}{2}$ , that is $P+Q\leq\frac{1}{2}$

[TABLE]

(2) When $\frac{P-Q}{1-4Q}>\frac{1}{2}$ , that is $P+Q>\frac{1}{2}$

[TABLE]

Thus, the Bell value $J^{\mathrm{LHVM,Fac}}_{\mathrm{CH}}$ with LHVMs under factorizable condition is,

[TABLE]

C.3 Optimal strategy of the CHSH inequality

C.3.1 CH and CHSH inequalities under NS

In this section, we prove that the CH and CHSH inequalities are equivalent when NS is assumed. We refer to [333] for a detailed discussion of the connection between CH and CHSH.

Proof.

According to the inputs, we can divide the CHSH inequality into four parts. When the inputs are $ij$ , define

[TABLE]

Owing to the NS condition, $J_{ij}$ can be rewritten by probabilities with output [math],

[TABLE]

Therefore, we have

[TABLE]

∎

Hence, under the NS assumption, the values of the CH and the CHSH inequalities are linearly related. To analyze the best LHVM strategy for the CH test, we can therefore consider the CHSH Bell test instead.

C.3.2 General condition

Following a method similar to that described above, we first consider deterministic strategies, i.e., $p_{A}(0|x),p_{B}(0|y)\in\{0,1\}$ , because any probabilistic LHVM can be realized as a convex combination of deterministic ones. Denoting $p(i,j)$ as $p_{2*i+j}$ , it is easy to show that the possible optimal deterministic strategies for $J_{\lambda}$ are

[TABLE]

$Q=0$

Here, we also first consider that $Q=0$ .

(1) When $P\leq\frac{1}{3}$ .

We can show that all four strategies are upper bounded by $6P-1$ . Take the strategy $p_{0}+p_{1}+p_{2}-p_{3}$ as an example,

[TABLE]

In this case, we can see that the CHSH value $J$ is upper bounded by $4(6P-1)$ .
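The bound $6P-1$ for the example strategy can be checked by brute force. The sketch below assumes the constraints $\sum_{i}p_{i}=1$ and $0\leq p_{i}\leq P$ (the $Q=0$ case) and searches a rational grid, so it is exact only when $P$ is a multiple of the grid step. The function name and the grid resolution are hypothetical.

```python
from fractions import Fraction as F

def max_strategy_value(P, steps=60):
    """Maximize p0 + p1 + p2 - p3 over a grid with sum(p) = 1 and 0 <= p_i <= P."""
    grid = [F(k, steps) for k in range(steps + 1) if F(k, steps) <= P]
    best = None
    for p0 in grid:
        for p1 in grid:
            for p2 in grid:
                p3 = 1 - p0 - p1 - p2
                if 0 <= p3 <= P:
                    value = p0 + p1 + p2 - p3
                    if best is None or value > best:
                        best = value
    return best

for P in (F(1, 4), F(3, 10), F(1, 3)):
    print(P, max_strategy_value(P), 6 * P - 1)
```

The maximum is reached by setting $p_{0}=p_{1}=p_{2}=P$ and $p_{3}=1-3P$ , which gives $3P-(1-3P)=6P-1$ ; at $P=\frac{1}{4}$ the corresponding CHSH bound $4(6P-1)$ reduces to 2.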

(2) When $P>\frac{1}{3}$ .

In this case, LHVMs reach the maximum Bell value, that is, $J$ can be 4.

Thus, the Bell value $J$ with LHVMs is

[TABLE]

$Q\neq 0$

For the case that $Q$ is nonzero, we apply the same transformation as Appendix C.2.1. After the transformation defined in Eq. (C.27), the relation between $J(p_{i}(\lambda_{j}))$ and $J^{\prime}(p^{\prime}_{i}(\lambda_{j}))$ is given by

[TABLE]

In this case, the optimal Bell value $J^{\mathrm{LHVM}}_{\mathrm{CHSH}}$ for the CHSH inequality with LHVMs is

[TABLE]

And the optimal CH value $J^{\mathrm{LHVM,NS}}_{\mathrm{CH}}$ with LHVMs under NS is

[TABLE]

C.3.3 Factorizable condition

In addition, we consider the factorizable condition,

[TABLE]

In this case, we have

[TABLE]

$Q=0$

(1) When $P\leq\frac{1}{2}$ .

For the case that $Q=0$ , $p_{i}$ are upper bounded by $P$ only. As the four strategies are symmetric, suppose that $p_{3}$ is the smallest one, which is equivalent to $p_{A}(0)\geq p_{A}(1)$ and $p_{B}(0)\geq p_{B}(1)$ . Thus we can see that $p_{0}+p_{1}+p_{2}-p_{3}$ is the largest strategy and is also upper bounded by

[TABLE]

Thus, we see that all the strategies are upper bounded by $2P$ . Then the Bell value is upper bounded by $8P$ .
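The bound $2P$ can be verified with a short calculation. The derivation below is a sketch that assumes the factorizable form $p_{2i+j}=p_{A}(i)p_{B}(j)$ and writes $a=p_{A}(0)\geq\frac{1}{2}$ and $b=p_{B}(0)\geq\frac{1}{2}$ , which are shorthand symbols introduced here.

```latex
\begin{align}
p_{0}+p_{1}+p_{2}-p_{3} &= 1-2p_{3} = 1-2(1-a)(1-b), \\
2p_{0}-(1-2p_{3}) &= 2ab-1+2(1-a)(1-b) = 4ab-2a-2b+1 = (2a-1)(2b-1) \geq 0, \\
\Rightarrow\quad p_{0}+p_{1}+p_{2}-p_{3} &\leq 2p_{0} \leq 2P .
\end{align}
```

Equality holds at $a=b=\frac{1}{2}$ , where $P=\frac{1}{4}$ and the Bell value bound $8P$ equals the usual CHSH bound of 2.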

(2) When $P>\frac{1}{2}$ .

We can easily see that LHVMs reach the maximum Bell value, that is, $J$ can be 4.

Consequently, the CHSH Bell value $J$ with LHVMs under the factorizable condition is given by

[TABLE]

$Q\neq 0$

For the case where $Q\neq 0$ , we can similarly derive our result. The CHSH Bell value $J^{\mathrm{LHVM,Fac}}_{\mathrm{CHSH}}(P,Q)$ is given by

[TABLE]

And the optimal CH value $J^{\mathrm{LHVM,NS,Fac}}_{\mathrm{CH}}$ with LHVMs under NS and factorizable condition is

[TABLE]

Bibliography

[1] John F. Clauser, Michael A. Horne, Abner Shimony, and Richard A. Holt. Proposed experiment to test local hidden-variable theories. Phys. Rev. Lett., 23:880–884, Oct 1969.
[2] Z. Leghtas, G. Kirchmair, B. Vlastakis, R. J. Schoelkopf, M. H. Devoret, and M. Mirrahimi. Hardware-efficient autonomous quantum error correction. Phys. Rev. Lett., 111:120501, 2013.
[3] Francesco Buscemi. All entangled quantum states are nonlocal. Phys. Rev. Lett., 108:200401, May 2012.
[4] Mafalda L. Almeida, Jean-Daniel Bancal, Nicolas Brunner, Antonio Acín, Nicolas Gisin, and Stefano Pironio. Guess your neighbor’s input: A multipartite nonlocal game with no quantum advantage. Phys. Rev. Lett., 104:230404, Jun 2010.
[5] Antonio Acín, Tobias Fritz, Anthony Leverrier, and Ana Belén Sainz. A combinatorial approach to nonlocality and contextuality. arXiv:1212.4084, 2012.
[6] C. H. Bennett and G. Brassard. Quantum Cryptography: Public Key Distribution and Coin Tossing. In Proceedings of the IEEE International Conference on Computers, Systems and Signal Processing, pages 175–179, New York, 1984. IEEE Press.
[7] Charles H. Bennett, Gilles Brassard, Claude Crépeau, Richard Jozsa, Asher Peres, and William K. Wootters. Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels. Phys. Rev. Lett., 70:1895–1899, Mar 1993.
[8] Artur K. Ekert. Quantum cryptography based on Bell’s theorem. Phys. Rev. Lett., 67:661–663, Aug 1991.

$M_{AA}$	$M_{BB}$	$W$
$\ket{\Phi_{AA}^{+}}=\frac{\ket{0}\otimes\ket{0}+\ket{1}\otimes\ket{1}}{\sqrt{2}}$	$\ket{\Phi_{BB}^{+}}=\frac{\ket{0}\otimes\ket{0}+\ket{1}\otimes\ket{1}}{\sqrt{2}}$	$W=\sum_{s,t}\beta_{s,t}^{++}\tau_{s}^{T}\otimes\omega_{t}^{T}$
$\ket{\Phi_{AA}^{-}}=\frac{\ket{0}\otimes\ket{0}-\ket{1}\otimes\ket{1}}{\sqrt{2}}$	$\ket{\Phi_{BB}^{-}}=\frac{\ket{0}\otimes\ket{0}-\ket{1}\otimes\ket{1}}{\sqrt{2}}$	$W=\sum_{s,t}\beta_{s,t}^{--}\tilde{\tau}_{s}^{T}\otimes\tilde{\omega}_{t}^{T}$
$\ket{\Phi_{AA}^{+}}=\frac{\ket{0}\otimes\ket{0}+\ket{1}\otimes\ket{1}}{\sqrt{2}}$	$\ket{\Phi_{BB}^{-}}=\frac{\ket{0}\otimes\ket{0}-\ket{1}\otimes\ket{1}}{\sqrt{2}}$	$W=\sum_{s,t}\beta_{s,t}^{+-}\tau_{s}^{T}\otimes\tilde{\omega}_{t}^{T}$
$\ket{\Phi_{AA}^{-}}=\frac{\ket{0}\otimes\ket{0}-\ket{1}\otimes\ket{1}}{\sqrt{2}}$	$\ket{\Phi_{BB}^{+}}=\frac{\ket{0}\otimes\ket{0}+\ket{1}\otimes\ket{1}}{\sqrt{2}}$	$W=\sum_{s,t}\beta_{s,t}^{-+}\tilde{\tau}_{s}^{T}\otimes\omega_{t}^{T}$

