Sample caching Markov chain Monte Carlo approach to boson sampling simulation
Yong Liu, Min Xiong, Chunqing Wu, Dongyang Wang, Yingwen Liu,, Jiangfang Ding, Anqi Huang, Xiang Fu, Xiaogang Qiang, Ping Xu, Mingtang Deng,, Xuejun Yang, Junjie Wu

TL;DR
This paper introduces a novel sample caching Markov chain Monte Carlo method that reduces sample correlation and loss, enabling more efficient classical simulation of boson sampling, which is crucial for quantum supremacy validation.
Contribution
The paper proposes a new sample caching MCMC approach that eliminates sample autocorrelation and loss, improving classical boson sampling simulation efficiency and applicability to various sampling tasks.
Findings
Reduces sample autocorrelation in MCMC sampling.
Prevents sample loss during simulation.
Enhances efficiency of boson sampling simulation.
Abstract
Boson sampling is a promising candidate for quantum supremacy. It requires to sample from a complicated distribution, and is trusted to be intractable on classical computers. Among the various classical sampling methods, the Markov chain Monte Carlo method is an important approach to the simulation and validation of boson sampling. This method however suffers from the severe sample loss issue caused by the autocorrelation of the sample sequence. Addressing this, we propose the sample caching Markov chain Monte Carlo method that eliminates the correlations among the samples, and prevents the sample loss at the meantime, allowing more efficient simulation of boson sampling. Moreover, our method can be used as a general sampling framework that can benefit a wide range of sampling tasks, and is particularly suitable for applications where a large number of samples are taken.
| Scale | |||||||
|---|---|---|---|---|---|---|---|
| 9p.81m. | 0.9115 | 0.0102 | 0.0261 | 0.0119 | 0.0067 | 0.0035 | |
| 15p.225m. | 0.9490 | 0.0128 | 0.0451 | 0.0205 | 0.0102 | 0.0057 | |
| 21p.441m. | 0.9633 | 0.0030 | 0.0622 | 0.0311 | 0.0162 | 0.0081 | |
| 25p.625m. | 0.9673 | -0.0183 | 0.0642 | 0.0328 | 0.0167 | 0.0089 | |
| 30p.900m. | 0.9702 | -0.0832 | 0.0684 | 0.0309 | 0.0095 | -0.0162 |
| 3p.9m. | 4p.16m. | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| L | L | . | ||||||||||||
| 10 | 9.9912 | 99592 | 9.96% | 953263 | 95.33% | 0.2580 | 10 | 10.0059 | 100132 | 10.01% | 968989 | 96.90% | 0.3215 | |
| 20 | 19.9961 | 49948 | 4.99% | 774238 | 77.42% | 0.1514 | 20 | 19.9819 | 50304 | 5.03% | 816477 | 81.65% | 0.1949 | |
| 30 | 30.0184 | 33599 | 3.36% | 625093 | 62.51% | 0.1095 | 30 | 30.0365 | 33105 | 3.31% | 673629 | 67.36% | 0.1386 | |
| 40 | 39.9672 | 25031 | 2.50% | 519809 | 51.98% | 0.0843 | 40 | 40.0001 | 24880 | 2.49% | 566109 | 56.61% | 0.1070 | |
| 50 | 49.9564 | 19909 | 1.99% | 443412 | 44.34% | 0.0691 | 50 | 49.9722 | 19771 | 1.98% | 486719 | 48.67% | 0.0854 | |
| … | … | |||||||||||||
| 100 | 99.9055 | 9913 | 0.99% | 253312 | 25.33% | 0.0371 | 100 | 100.0127 | 10081 | 1.01% | 282310 | 28.23% | 0.0445 | |
| 200 | 199.8007 | 5018 | 0.50% | 135236 | 13.52% | 0.0189 | 200 | 199.7608 | 4989 | 0.50% | 152125 | 15.21% | 0.0234 | |
| … | … | |||||||||||||
| 500 | 499.5682 | 1992 | 0.20% | 56410 | 5.64% | 0.0062 | 500 | 499.5963 | 1971 | 0.20% | 64398 | 6.44% | 0.0096 | |
| … | … | |||||||||||||
| 1000 | 999.6380 | 994 | 0.10% | 28471 | 2.85% | 0.0031 | 1000 | 999.9884 | 967 | 0.10% | 32528 | 3.25% | 0.0051 | |
| … | … | |||||||||||||
| 10000 | 9998.9470 | 107 | 0.01% | 2969 | 0.30% | 0.0009 | 10000 | 10002.9656 | 87 | 0.01% | 3356 | 0.34% | 0.0010 | |
| No. | Pattern | Probability | No. | Pattern | Probability | |||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (0,0,0,1,1,1) | 7 | 1 | 1.204 | 11 | (1,0,0,0,1,1) | 35 | 11 | 2.055 | |||
| 2 | (0,0,1,0,1,1) | 11 | 2 | 3.144 | 12 | (1,0,0,1,0,1) | 37 | 12 | 1.287 | |||
| 3 | (0,0,1,1,0,1) | 13 | 3 | 1.848 | 13 | (1,0,0,1,1,0) | 38 | 13 | 2.417 | |||
| 4 | (0,0,1,1,1,0) | 14 | 4 | 2.262 | 14 | (1,0,1,0,0,1) | 41 | 14 | 1.834 | |||
| 5 | (0,1,0,0,1,1) | 19 | 5 | 1.949 | 15 | (1,0,1,0,1,0) | 42 | 15 | 2.073 | |||
| 6 | (0,1,0,1,0,1) | 21 | 6 | 1.631 | 16 | (1,0,1,1,0,0) | 44 | 16 | 2.572 | |||
| 7 | (0,1,0,1,1,0) | 22 | 7 | 1.705 | 17 | (1,1,0,0,0,1) | 49 | 17 | 2.204 | |||
| 8 | (0,1,1,0,0,1) | 25 | 8 | 2.151 | 18 | (1,1,0,0,1,0) | 50 | 18 | 1.824 | |||
| 9 | (0,1,1,0,1,0) | 26 | 9 | 1.551 | 19 | (1,1,0,1,0,0) | 52 | 19 | 2.499 | |||
| 10 | (0,1,1,1,0,0) | 28 | 10 | 2.297 | 20 | (1,1,1,0,0,0) | 56 | 20 | 1.694 |
| Item | Parameters of a node | ||||||
| Tianhe-2 | Local Cluster | ||||||
| With Accelerators | Without Accelerators | ||||||
| Peak Performance | 3.43Teraflops | 422.4Gigaflops | 201.6Gigaflops | ||||
| Processors: | CPU | Intel Xeon E5 2 (24 cores) | Intel Xeon E5 2 (24 cores) | Intel Xeon E5 2 (12 cores) | |||
| Accelerators | Intel Xeon Phi 3 (171 cores) | ||||||
| Memory Storage Capacity | 72GB | 64GB | 16GB | ||||
| Interconnect Network | TH Express-2 | TH Express-2 | InfiniBand | ||||
| Scale | Platform | Rate(Hz) | (s) | (s) | (s) | (s) | ||||
| 20p.400m. | Tianhe-2 | 4 | 500,000 | 152.03 | 3288.88 | 0.00658 | 3185.82 | 0.00637 | 96.87% | 0.0097 |
| 25p.625m. | Tianhe-2 | 32 | 200,000 | 22.51 | 8884.56 | 0.04442 | 8804.99 | 0.04402 | 99.10% | 0.0089 |
| 30p.900m. | Tianhe-2 | 64 | 20,000 | 1.01 | 19836.72 | 0.99184 | 19825.83 | 0.99129 | 99.95% | -0.0162 |
| 3p.9m. | Cluster | 1 | 1,000,000 | 4263.29 | 234.56 | 0.00023 | 226.51 | 0.00023 | 96.57% | 0.0018 |
| 4p.16m. | Cluster | 1 | 1,000,000 | 4163.90 | 240.16 | 0.00024 | 229.83 | 0.00023 | 95.70% | -0.0005 |
| 5p.25m. | Cluster | 1 | 1,000,000 | 4082.18 | 244.97 | 0.00024 | 231.99 | 0.00023 | 94.70% | 0.0005 |
| 6p.36m. | Cluster | 1 | 1,000,000 | 3919.72 | 255.12 | 0.00026 | 238.02 | 0.00024 | 93.30% | 0.0036 |
| 7p.49m. | Cluster | 1 | 1,000,000 | 3795.73 | 263.45 | 0.00026 | 242.39 | 0.00024 | 92.01% | 0.0014 |
| 8p.64m. | Cluster | 8 | 1,000,000 | 3491.04 | 286.45 | 0.00029 | 257.62 | 0.00026 | 89.94% | 0.0023 |
| 9p.81m. | Cluster | 8 | 1,000,000 | 3429.28 | 291.61 | 0.00029 | 272.66 | 0.00027 | 93.50% | 0.0035 |
| 10p.100m. | Cluster | 8 | 1,000,000 | 3195.79 | 312.91 | 0.00031 | 290.86 | 0.00029 | 92.95% | 0.0045 |
| 11p.121m. | Cluster | 32 | 1,000,000 | 2780.55 | 347.64 | 0.00035 | 296.36 | 0.00030 | 85.25% | 0.0035 |
| 12p.144m. | Cluster | 32 | 1,000,000 | 2897.39 | 357.68 | 0.00036 | 326.69 | 0.00033 | 91.34% | 0.0045 |
| 13p.169m. | Cluster | 32 | 1,000,000 | 2818.12 | 413.30 | 0.00041 | 371.23 | 0.00037 | 89.82% | 0.0058 |
| 14p.196m. | Cluster | 32 | 1,000,000 | 2813.13 | 537.99 | 0.00054 | 489.10 | 0.00049 | 90.91% | 0.0041 |
| 15p.225m. | Cluster | 32 | 1,000,000 | 2133.86 | 789.08 | 0.00079 | 731.82 | 0.00073 | 92.74% | 0.0057 |
| 16p.256m. | Cluster | 32 | 1,000,000 | 800.30 | 1252.05 | 0.00125 | 1183.67 | 0.00118 | 94.54% | 0.0064 |
| 17p.289m. | Cluster | 32 | 1,000,000 | 317.46 | 3156.84 | 0.00316 | 3074.90 | 0.00307 | 97.40% | 0.0063 |
| 18p.324m. | Cluster | 32 | 1,000,000 | 159.97 | 6251.32 | 0.00625 | 6156.73 | 0.00616 | 98.49% | 0.0072 |
| 19p.361m. | Cluster | 32 | 1,000,000 | 78.94 | 12667.81 | 0.01267 | 12558.42 | 0.01256 | 99.14% | 0.0049 |
| 20p.400m. | Cluster | 32 | 1,000,000 | 37.98 | 26326.21 | 0.02633 | 26199.45 | 0.02620 | 99.52% | 0.0067 |
| 21p.441m. | Cluster | 32 | 1,000,000 | 18.09 | 55293.32 | 0.05529 | 55146.15 | 0.05515 | 99.73% | 0.0081 |
| GHz | MHz | ||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 1 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 1 | |
| 15 | 12 | 10 | 8 | 8 | 7 | 7 | 6 | 6 | 6 | 44 | 32 | 26 | 22 | 19 | 17 | 16 | 15 | 14 | |
| 45 | 29 | 22 | 18 | 16 | 14 | 13 | 12 | 11 | 11 | 69 | 49 | 39 | 33 | 29 | 26 | 24 | 22 | 20 | |
| 30 | 17 | 12 | 10 | 8 | 7 | 6 | 6 | 5 | 5 | 25 | 17 | 13 | 11 | 10 | 9 | 8 | 7 | 6 | |
| GHz | MHz | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 1 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 1 | |
| 11 | 9 | 8 | 7 | 7 | 6 | 6 | 59 | 39 | 30 | 25 | 21 | 19 | 17 | |
| 34 | 25 | 20 | 17 | 15 | 14 | 13 | 99 | 64 | 48 | 40 | 34 | 30 | 27 | |
| 23 | 16 | 12 | 10 | 8 | 8 | 7 | 40 | 25 | 18 | 15 | 13 | 11 | 10 | |
| Network | ||||
|---|---|---|---|---|
| Repetition Rate | GHz | MHz | GHz | MHz |
| 48.81% | 53.67% | 60.41% | 66.42% | |
| 51.32% | 56.43% | 63.52% | 69.83% | |
| - | 2.51% | 2.76% | 3.11% | 3.41% |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Sample Caching Markov Chain Monte Carlo Approach to Boson Sampling Simulation
Yong Liu
Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China
Min Xiong
Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China
Chunqing Wu
Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China
Dongyang Wang
Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China
Yingwen Liu
Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China
Jiangfang Ding
Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China
Anqi Huang
Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China
Xiang Fu
Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China
Xiaogang Qiang
Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China
National Innovation Institute of Defense Technology, AMS, 100071 Beijing, China
Ping Xu
Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China
Mingtang Deng
Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China
Xuejun Yang
Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China
Junjie Wu
Institute for Quantum Information & State Key Laboratory of High Performance Computing, College of Computer, National University of Defense Technology, Changsha 410073, China
Abstract
Boson sampling is a promising candidate for quantum supremacy. It requires to sample from a complicated distribution, and is trusted to be intractable on classical computers. Among the various classical sampling methods, the Markov chain Monte Carlo method is an important approach to the simulation and validation of boson sampling. This method however suffers from the severe sample loss issue caused by the autocorrelation of the sample sequence. Addressing this, we propose the Sample Caching Markov chain Monte Carlo method that eliminates the correlations among the samples, and prevents the sample loss at the meantime, allowing more efficient simulation of boson sampling. Moreover, our method can be used as a general sampling framework that can benefit a wide range of sampling tasks, and is particularly suitable for applications where a large number of samples are taken.
I Introduction
Demonstrating quantum supremacy is a milestone in quantum computing, representing that quantum devices can outperform the fastest classical hardware on some task Preskill2012 ; Arute2019 ; Spagnolo2018 ; Nielsen2002 . Evaluating the demonstration of quantum supremacy should base on the intense competition between the two sides: the developing quantum devices on some selective tasks, and the classical computers running the benchmark for the corresponding tasks to explore the supremacy threshold. The chosen task should be suitable for near-term implementation as well as able to be significantly accelerated by quantum computing.
Such a task is boson sampling Aaronson2011 . On the one hand, its linear optical implementation merely requires identical bosons (typically photons), linear transformation, and passive detection. On the other hand, its classical simulation involves computing permanents of Gaussian complex matrices Scheel2008 , which is likely to be classically intractable, even in approximation cases Valiant1979 . The classical hardness of boson sampling attracts enormous efforts to build large-scale physical devices to beat classical computers, and remarkable achievements have been made Spring2013 ; Broome2013 ; Tillmann2013 ; Spagnolo2014 ; Lund2014 ; Bentivegna2015 ; Seshadreesan2015 ; Latmiral2016 ; Zhong2018 ; Paesani2019 ; Wang2019 . It is promising to show quantum advantage via boson sampling.
On the classical side, various sampling approaches have been raised for simulating boson sampling. The direct computation of the whole distribution of boson sampling is restricted within small scales because of the exponential growth of the state space. This stimulates a variety of methods for sampling from an exponentially large state space by calculating a few number of probabilities, such as the Clifford-Clifford (C&C) sampling method Clifford2018 , the rejection sampling method Villalonga2019 and the Metropolised independent sampling (MIS) method Neville2017 ; Liu1996 . Among the various sampling approaches, the MIS method allows to generate an effective sample by evaluating only a constant number of probabilities. More importantly, it can serve as a general sampling framework for far-ranging applications.
However, similar with the optical implementation of boson sampling, the sample loss is also a serious problem of classical samplers, as shown in Fig. 1(a). Current sampling methods generate massive candidate samples with each involving the evaluation of a probability, but only a small fraction of candidates are kept as effective samples, as shown in Fig. 1(b). For the MIS method, it has to discard massive candidates in order to tackle the autocorrelation issue inherited from the Markov chain Monte Carlo (MCMC) method. To this point, we define the computational hardness limit of the sampling tasks as the complexity of computing one probability. As for boson sampling, it is the complexity of computing one permanent Aaronson2011 , and to reach this limit, it has to eliminate the overhead of the sampling methods. In this work, we propose a method that could reach this limit.
Our method, namely Sample Caching-Markov chain Monte Carlo (SC-MCMC), makes contributions on two sides. (1) On the classical side, the SC-MCMC sampler provides a general sampling framework which allows to generate one effective sample from averagely one candidate sample. SC-MCMC deals with the autocorrelation issue of MCMC following a similar manner to MIS, but caches the discarded candidate samples for further reuse. In this way, the SC-MCMC sampler avoids the sample loss, and allows much more efficient sampling than a MIS sampler. Particularly, the SC-MCMC method can be used for quantum-enhanced applications based on boson sampling such as simulating the molecular vibronic spectra Huh2015 , finding dense subgraphs Arrazola2018 or approximate optimization Arrazola2018_2 . Moreover, Our method can also be directly applied on other quantum supremacy candidates, such as the Gaussian boson sampling Hamilton2017 , the IQP circuit sampling Bremner2016 ; Shepherd2009 and random quantum circuit (RQC) sampling Boixo2018 ; GuoLiu2019 , especially when a lot of samples are taken for validating the device Hangleiter2019 . (2) On the quantum side, our results suggest that the hardness of demonstrating quantum supremacy by boson sampling is increasing with the development of classical simulator. Based on the results of the simulation, we emphasize the importance of avoiding the photon loss in the optical implementation of boson sampling, showing that an improvement of 5% for the loss rate would spare hundreds of photons.
II Boson sampling and Markov chain Monte Carlo
The task of boson sampling is to sample the output distributions of an -mode interferometer network with identical photons as input. Because of the interferences of photons, the probability of a certain output pattern of photons is related to the permanent of the transformation matrix decided by the interferometer network. Specifically, the output patterns are post-selected within the “collision-free” regime where photons are no-bunching in each output port. The value of is often chosen to be to meet the requirement for the hardness proof of boson sampling, leading to that the number of all the possible output patterns grows exponentially with . In summary, the task of classically simulating boson sampling is to generate samples from the probability distribution over the output patterns.
Instead of calculating the probabilities of the whole distribution, a feasible method is Metropolis-Hastings algorithm, a kind of MCMC method Chib1995 . The sampling is processed by constructing a Markov chain, with the state space corresponding to the set of the possible output patterns of boson sampling, and the probabilities of all the patterns as the stationary probabilities. To expand the chain, a state is chosen as the candidate state (current state is ) following a easy-to-sample symmetric proposal probability distribution, and this candidate state would be accepted and added on the chain with probability
[TABLE]
where is the boson sampling probability of state , or be rejected with the rest probability. If the candidate is rejected, the current state would duplicate on the end of the chain. Each time a node is added in the chain, the pattern corresponding to the added node is outputted as a candidate sample. In this way, it seems ideal that one may just need to evaluating one probability for one sample.
Unfortunately, the generated samples may be erroneous because the generated sample sequence suffers from severe autocorrelation, as shown in Fig. 2(a). The first-order autocorrelation of a sequence is often estimated using Durbin-Watson statistic Durbin1971 , while in this paper, we follow a more straightforward way to estimate the autocorrelation by
[TABLE]
where is the value of the sample, is the mean of the samples. Each is assigned a value as the order of patterns after been sorted. The value of should be in , and it reflects the autocorrelation in the sequence. The closer is to 1, the stronger the samples are correlated, and the sign of indicates if the samples are positively or negatively correlated.
To overcome the autocorrelation of the sample sequence, in MIS, a method called “jump sampling” (or “thinning procedure”) is applied to obtain independent samples, as shown in Fig. 2(b). In jump sampling, an effective sample is kept in every candidates with the rest candidates discarded. The remained samples are approximately independent. The value of is determined by checking the autocorrelation of the obtained sequence. In MIS, the value of is 100, which is claimed to be sufficient for simulating boson sampling of more than 30 photons Neville2017 . However, this suggests that in MIS the ratio for keeping a candidate sample is . Other methods such as the delayed rejection Au2001 ; Tierney1999 ; Zuev2011 could be used to reduce the correlations among the samples at the cost of calculating more probabilities for one sample. If the autocorrelation issue could be tackled without abandoning any samples, the sampling process could be accelerated by 100 times.
III The sample caching method
The main scheme of the SC-MCMC protocol is shown in Fig. 2(c). It mainly contains two parts: one is a MCMC sampler providing candidate samples, with which the second part combined is a procedure that we call “sample caching”. In Fig. 2(d) we show the working processes of the SC-MCMC sampler, which can be divided into three phases:
The cache filling phase (CFP). In this phase, the SC-MCMC sampler follows the similar protocol to the MIS sampler: It outputs an effective sample in every candidates, where is the jump step used in the MIS sampler. The difference is that the unselected candidates are used to fill a sample cache till the cache is full. The sample cache can be completely filled after generating candidate samples, where is the size of the cache. In this stage it has to calculate probabilities for one effective sample; 2. 2.
The working phase (WP), in which the cache is full. For each time a candidate sample is generated by the MCMC process, randomly output a sample from the cache first and then store this new candidate into the cache. This procedure is repeated till the MCMC process generates candidates. Therefore in WP, the SC-MCMC approach drives efficient sample generation that an effective sample can be generated by calculating only one probability; 3. 3.
The cache cleaning phase (CCP). In this phase, the samples stored in the cache are outputted in random order without calculating any probabilities, and therefore there can be a burst of sample generation. This phase can be regarded as a compensation for the initial overhead of CFP, and result in that averagely the evaluation of every probability can support the generation of one effective sample.
Nevertheless, the first candidate sample can be directly outputted as an effective sample since there is no other sample that can be correlated with. As reflected from Fig. 2(d), if only a small number of samples are taken (i.e. ), the sampling process may stop in CFP, in this case the SC-MCMC sampler is actually executing the MIS protocol. Generally, in the final CCP, cleaning the cache provides a burst of sample generation. However, this step can be dangerous when the value of is not large enough because the burst output could result in sample correlation, and whether a sample can be kept is determined by how much it is correlated with the sample sequence. While if a large number of samples are taken, SC-MCMC can be times as fast as MIS.
It is worth noting that the generation of candidate samples discussed here is slightly different from that in Neville2017 where the proposal probability distribution is chosen as the distribution of distinguishable particles corresponding to the boson sampling instance. This choice can reduce the correlations between samples, but introduces extra computational cost of a real permanent. Here we choose the uniform distribution as the proposal distribution so that only one permanent is required to calculate for every candidate sample, while the autocorrelation can be eliminated by the sample cache without abandoning any candidates (as we will show in the following discussion). In this way, compared with the C&C sampler which has an approximately equivalent cost of calculating two permanents for one effective sample Clifford2018 , the SC-MCMC sampler can modestly obtain a 2-fold improvement over the C&C sampler when a large number of samples are taken.
Similar to MIS, SC-MCMC is also case independent. Besides boson sampling, SC-MCMC can be directly used as a general sampling framework for any sampling tasks as long as we can evaluate the probabilities for given events, for example the Gaussian boson sampling Hamilton2017 , IQP circuit sampling Bremner2016 ; Shepherd2009 and random quantum circuit sampling Boixo2018 ; GuoLiu2019 . Meanwhile, SC-MCMC is particularly suitable for tasks where a large number of samples are taken.
Essentially, the correctness of SC-MCMC is guaranteed by the MCMC process. Some methods can be applied to validate the sampling results Aaronson2014 ; Bentivegna2014 ; Agresti2019 , but a more straightforward way is to compare the frequency graph with the probability distribution. We found an empirical result that the number of samples may has to be in the order of to construct a frequency graph with a similarity of 99% with the probability distribution.
More importantly, the sample caching process eliminates the autocorrelations within the sample sequence without losing any candidate samples. Under the asymptotic condition where the size of the cache is large enough, the number of samples stored in the cache follows the probability of the samples, and the uniformly random choice makes it independent among the draws. Practically, the correlations among the samples can be eliminated with a sample cache in a limited size. In Fig.3(a) we plot the first-order autocorrelation against the size of sample cache for different scales of boson sampling. In Fig.3(b) we plot the autocorrelation of the sample sequences obtained from different approaches in simulating an instance of boson sampling with 30 photons and 900 modes, where the “lag” indicates the orders of the autocorrelations. In the sample sequence from the original MCMC approach, we can still find remained correlation between two samples even in lag 100 owing to the improper choice of the proposal probability distribution (see supplementary for details supplementary ). In comparison, the SC-MCMC sampler with can reduce the autocorrelation at every order to a quite low level, and a cache with can approximately generate independent samples, even though the proposal distributions in the SC-MCMC and the MCMC samplers are the same.
To understand how sample cache works, we analyze the jump sampling method first. For the MCMC sampler with state space , the transition probabilities can be described by a matrix with equals the probability to transit from state to in one step. The -step transition probability matrix is , and is the probability to transit from to in steps. It satisfies that for arbitrary and , where is the probability of state Liu1996 ; Gagniuc2017 ; Wan2016 . Thus a sample is independent from another that is infinite steps away. Actually, in a certain number of steps (e.g. steps), the -step transition probability approximately equals to for arbitrary and , which means that a sample is approximately independent from the samples with a distance of more than steps. The value of depends on how fast the Markov chain converges Schmitt2001 . Empirically, we can choose a sufficiently large value. If the samples between the steps are discarded, the correlations among the remained samples are negligible.
In SC-MCMC, the result can be regarded as a reordered sequence from the original one. The probability that the two adjacent samples are at a distance of steps in the original sequence is
[TABLE]
where is the size of the sample cache. Then we can obtain the probability that two adjacent samples are still correlated (the distance between the two samples in the original sequence does not exceed , the corresponding jumping step in jump sampling) by
[TABLE]
Clearly, , and the first-order autocorrelation is thus eliminated if the sample cache is sufficiently large. The autocorrelations of higher orders are eliminated in the same way supplementary .
The next question is that practically what size should the cache be? The exact size of the sample cache required varies case-by-case, however we can choose a sufficiently big cache. This answer is easy to obtain from Eq.(4) by ensuring , then . Setting and , resulting in , is sufficient for the case of and . It is supposed to be sufficient for boson sampling with larger scale, and the size of cache could be further enlarged if needed. Most significant of all, no sample is lost no matter what size the cache is, which suggests that our method reaches a time complexity of simulating boson sampling to .
IV Numerical simulation
We demonstrate our method for boson sampling simulation by implementing the parallelized Glynn’s algorithm to exactly compute the permanents of arbitrary square matrices Glynn2010 . The numerical simulations were done on 64 nodes of Tianhe-2 supercomputer Liao2014 ; Liao2014_2 or another local cluster, as shown in Tab. 1. The sampling rate reached 1.01Hz on Tianhe-2 nodes when simulating boson sampling with 30 photons, and the 32 cluster nodes afforded the simulation for 21 photons with a sampling rate of 18.09Hz. The percentage of the time used for calculating permanents exceeds 99% when is sufficiently large.
Combined with the performance estimation on Tianhe-2 supercomputerWu2018 , the average time estimated (in seconds) for obtaining a -photon candidate sample is if we are allowed to simulate the generation of a large number of samples. This scaling result indicates that the average time estimated for a 50-photon sample can be reduced from about 10 days to within 100 minutes. To compare the performance of the simulator with the physical setups, we use the function defined in ref. Neville2017
[TABLE]
where is the estimated classical runtime for a boson sampling instance with photons and modes. is the quantum runtime for the corresponding instance in which is the -photon repetition rate of the photon source, and is the transmission probability for a photon supplementary . suggests that the performance of a quantum boson sampling setup surpasses that of Tianhe-2 supercomputer.
In Fig. 4 we plot four lines corresponding to under different combinations of classical approaches and physical setups. Associated with the transmission probability realized in recent experiments ( Spring2013 ; Broome2013 ; Tillmann2013 ; Spagnolo2014 ; Bentivegna2015 ; Zhong2018 ; Wang2019 ), the increment of photon number required to reach a corresponding performance is vast. For example, when and (probably beyond the reach of near-term experiments) the increment exceeds 300 photons, while it reduces to 30 when is improved to 0.55. Further, for a network with given shape, there exist the minimum request for , even when we have sufficiently many photons supplementary . Our results suggest that currently, enhancing the transmission probability can greatly reduce the required photon number for quantum advantage.
V Conclusion
In summary, we proposed a classical sampling approach which can be used to simulate boson sampling. Our method tackles the autocorrelation issue which leads to the sample loss in MIS, and can be 100 times as fast as MIS if a large number of samples are taken. For cases where only a small number of samples are required, our method can perform as well as MIS. Therefore our method enables more efficient simulation of boson sampling, especially in the cases where boson sampling is applied for quantum-enhanced problems. Above all, our method is a general sampling framework and could support customized optimization for specific tasks to gain further advanced efficiency.
The progress in classical simulators can be challenges to physical implementations, which however could provide some instructive feedback on physical experiments. The comparison between classical computing and quantum boson sampling suggests that currently, reducing photon loss may approach closer to quantum advantage, and is perhaps more important than merely enlarging the scale of boson sampling devices.
Acknowledgements.
We gratefully acknowledge the help from China Greatwall Technology in Changsha, and the National Supercomputer Center in Guangzhou. We appreciate the helpful discussion with other members of QUANTA group. J. W. acknowledges the support from the National Natural Science Foundation of China under Grant 61632021. P. X. acknowledges support from National Natural Science Foundation of China under Grants No. 11621091 and No. 11690031. X. Q. acknowledges support from National Natural Science Foundation of China under Grants No. 11804389.
VI acknowledgments
We gratefully appreciate the kindly support from China Great wall Technology in Changsha. We also appreciate the helpful discussion with other members of QUANTA team. J. W. acknowledges the support from National Natural Science Foundation of China under Grants No. 61632021. P. X. acknowledges the support from National Natural Science Foundation of China under Grants No. 11621091 and 11690031. X. Q. acknowledges the support from National Natural Science Foundation of China under Grants No. 11804389.
**Supplementary Information:
Sample Caching Markov Chain Monte Carlo Approach to Boson Sampling Simulation**
I Boson Sampling and Quantum Supremacy
Boson sampling is a process that by injecting bosons (typically photons) into the -mode interferometer, and measuring the output distributions of the photons one obtains samples from a boson sampling device. The patterns of the output photons form a complicated probability distribution, as described by Eq. S1
[TABLE]
where and are two -dimension vectors describing the input pattern and output pattern of the photons respectively. Specifically, with meaning there are photons in the input port of the interferometer, and is defined for the output ports in the same way. In optical implementations, and are often described by multi-mode number states. The standard input state is , which is also applied in our simulation. is a sub-matrix by choosing different rows and columns from the transformation matrix that is decided by the interferometer [1]. is the permanent of the matrix given. For a matrix , the permanent of is defined as
[TABLE]
where ranges over all the permutations of .
The exact simulation of boson sampling is hard. While in the case of approximation, the hardness proof of boson sampling is based on a conjecture that the permanent of the matrix with Gaussian elements is hard to approximate unless the polynomial hierarchy collapses. While to ensure the Gaussian feature of the -submatrix, the transformation matrix should be Haar random, and is in the order of [1]. Generally, is thought to be sufficient.
Totally, there are up to output patterns of the photons. However, we care more about the “collision-free” cases, where photons don’t share the same mode, i.e. no bunching, thus reducing the number of patterns to . The simulation of boson sampling is to draw samples from the distribution described by Eq. S1. It’s unavoidable to calculate or approximate permanents, thus the simulation of boson sampling is likely to be hard on classical computers. The two most-efficient permanent-computing algorithms, Ryser’s algorithm and Balasubramanian-Bax-Franklin-Glynn’s algorithm are in the time complexity of if implemented to pursue the efficiency of parallelism on the target platform with massive parallel processing unit. Though it’s relatively easier to achieve the time complexity of when executed serially.
Since boson sampling is hard to simulate on classical computers, it’s promising to realize the quantum supremacy through boson sampling, and the benchmark of boson sampling for the supremacy is then raised for discussion. Ref. [2] firstly proposed a practical boson sampling simulator through Metropolised independent sampling (MIS), and concluded that the server-level computing hardware is sufficient for 30-photon simulation, and the performance of supercomputer could further afford 50-photon boson sampling [3]. With a detailed comparison between the performance of MIS and currently proposed experiments, it’s concluded that the supremacy has not been achieved yet, and seems unlikely in the near term.
II The Computational Hardness Limit of Classically Simulating Boson Sampling
In [3], the benchmark on one of the fastest supercomputer, Tianhe-2, suggests that the performance limit of classical computers on boson sampling is to generate a 50-photon sample in about 100 minutes. This sampling rate is obtained on the assumption that the performance of generating a sample could be represented by that of computing a permanent. The computational hardness limit is thus defined in this way as the performance of calculating only one permanent, which asks for that the number of permanents required for one sample should be only one.
The calculation of a permanent is the core problem of classical simulating boson sampling. However, between the calculation of permanent and the generation of a sample, there is a gap caused by sampling algorithms. Current sampling algorithms are not able to reach this computational hardness limit. To generate a sample, often it has to calculate more than one probability, and thus involves calculating more than one permanents, as shown in Fig. 1(b) in main text. For example, the number of permanents required for one sample is a constant (=100) in MIS [2], and for brute force sampling, this number is . If an algorithm reaches this limit, the extra cost other than computing one permanent for one effective sample is negligible, and the performance of computing a permanent can represent that of simulating boson sampling. Here we briefly introduce three widely used classical sampling algorithms.
II.1 Brute force sampling
Brute force sampling generates samples in a more straightforward way: compute the probabilities of all the output patterns, and then draw samples. Thus it’s also called naïve sampling. However, even to generate only one sample, brute force sampling has to calculate permanents in the collision-free regime. Since is required to be , the number of probabilities of the distribution and the corresponding number of permanents required to calculate grows exponentially with (Fig. 1(b) in main text). It’s obvious that the brute force sampling method is no longer feasible when reaches a certain value. For example, when , the quantity of permanents required to calculate is , and when reaches 10, this number grows explosively to , which seems unacceptable on a classical computer. The other problem of brute force sampling is the storage of the whole distribution. If a single probability is stored in the format of a double precision float number, which occupies 16 bytes on a classical computer, then the storage request for the distribution of a boson sampling instance with 10 photons in 100 modes exceeds 277 Terabytes, and it requires about 156.9 Petabytes for simulating a 30-photon-900-mode instance of boson sampling. Thus brute force sampling is far away from reaching the computational hardness limit of classically simulating boson sampling.
II.2 Rejection sampling
Compared to brute force sampling, rejection sampling requires to compute much less permanents for one sample. All what needed is a proposal distribution that can be efficiently sampled. Before sampling, a parameter must be decided, so that for any output pattern of boson sampling, , where is boson sampling probability for pattern and is the corresponding proposal probability. The scheme of rejection sampling is shown in Fig. S1.
The process of rejection sampling repeats the following two steps:
randomly sample a pattern from ; 2. 2.
calculate , and then output the sample of pattern with probability , or discard this sample with probability .
Thus for each sample, it’s a probabilistic event that the sampler outputs the sample or not, and finally the distribution we draw samples from approaches the target probability distribution. However, it’s hard to choose a proposal distribution that overlaps quite well with the boson sampling distribution. A feasible way is to choose the uniform distribution, and let satisfy that after multiplying , this uniform distribution exactly covers the largest probability of the output patterns in boson sampling, in this case the boson sampling distribution itself has great influence on the success probability, and it brings in another problem: the estimation of the largest probability of boson sampling. Ref. [2] discussed about the quantity of permanents used for a sample in rejection sampling. Actually the average probability to accept a sample with uniform proposal is , and the computational hardness core problem is repeated for times for one effective sample. However, though much less than brute force sampling, the number of permanents required to calculate for one effective sample also grows very fast (Fig. 1(b) in main text). In conclusion, rejection sampling is also far from the computational hardness limit of classically simulating boson sampling.
II.3 Markov chain Monte Carlo
In Markov chain Monte Carlo (MCMC), the sampling process is done through constructing a Markov chain with the target distribution as the stationary distribution. The state space, denoted as , of the Markov chain corresponds to all the output patterns of boson sampling. Each time the chain expands, a sample could be generated, as shown in Fig. S2. Since we limit the sample space in the collision-free case, the size of the state space is , with each state representing one output pattern of photons.
To be specifically, the algorithm we use is Metropolis-Hastings algorithm with symmetric proposal distribution which is easy to sample from. The process of the Metropolis-Hastings algorithm is the iteration of the following three steps:
If current state is , choose the next state from distribution , which is the probability to choose as the candidate when current state is . The choice of the proposal distribution is arbitrary. For symmetric proposal, which is frequently used, it satisfies . 2. 2.
Calculate , which is the boson sampling probability for state , and accept as the new state with probability , or duplicate as the new state with the rest probability; 3. 3.
Output the sample corresponding to the new state.
Note that in step 2, the probability for acceptance would not always be , because for other Metropolis-Hastings algorithms, the symmetric condition of the proposal density is not necessary, expanding the probability to be . As we will see in the following sections, the choice of proposal density would have influence on the convergence speed of the Markov chain.
Unlike the rejection sampling, the MCMC method will generate a sample even the candidate sample is rejected. Therefore, it seems ideal that for one sample, we just need to calculate one permanent, and thus the computational hardness limit could be reached. However, the sample generated may be erroneous of the correlations between samples. Usually the correlations are eliminated at the cost of impairing the sampling efficiency, and makes it away from the computational hardness limit of classically simulating boson sampling. Here we developed an algorithm to avoid the efficiency impairment while reduce the correlations between samples. We next analyze the autocorrelation problem in MCMC.
III The Sample Caching Markov Chain Monte Carlo Method
We develop an algorithm and try to reach the computational hardness limit by eliminating the autocorrelation of the sequence generated by the MCMC without abandoning samples. To see this, we will introduce the autocorrelation problem in MCMC first.
III.1 The autocorrelation of the sample sequence generated by Markov chain
The autocorrelation of the sample sequence is the fact that the samples are correlated, and thus may lead to erroneous samples. The autocorrelation of the sequence at lag is defined by Eq. S3
[TABLE]
where is value of sample, and , are the mean and the variance of the distribution. Usually, the true mean and variance are unknown, and are often replaced by the sample mean and variance. The autocorrelation then can be estimated as
[TABLE]
where and are the mean and variance of the samples. Specially, for first-order autocorrelation, there are other test statistics, such as the Durbin-Watson statistic [4], as expressed by Eq. S5
[TABLE]
The relationship between -statistic and the first-order correlation can be given by . We directly estimate the first-order autocorrelation through Eq. 2(main text), which is the special case of Eq. S4 by assigning . The value of should be in , and it would reflect the autocorrelation in the sequence. The closer is to 1, the stronger the samples are correlated, and the sign of indicates if the samples are positively or negatively correlated.
From the definition of autocorrelation, we can analyze the autocorrelation problem. Given a sample sequence with as the sample taken from a distribution described by and if the samples are independent, it’s easy to observe that
[TABLE]
where is the probability for event . Note that since the samples are independent, for arbitrary and , and thus result in no autocorrelation for arbitrary .
In one step of state transition in Markov chain with state space , the probability of the transition from a state to another forms a matrix , with the element at row and column representing the probability to transit from state to , as shown in Eq. S7
[TABLE]
where is the probability for state and is the probability from the proposal distribution, which acts as the candidate selection probability. In the Markov chain Monte Carlo, this conditional probability is just the transition probability . However, in most cases, , which leads to the correlations between the adjacent samples. For the state-duplication cases, the adjacent samples are exactly identical, which leads to a severe autocorrelation problem and erroneous samples.
III.2 Jump sampling method
In Metropolised independence sampling [2], the autocorrelation is eliminated through multi-step transition, i.e. the jump sampling (or thinning procedure). The transition from to in steps can be described by matrix . It’s easy to observe that . If the Markov chain would converge, then for arbitrary and , [5, 6]. To show this, we define Averaged Total Variance Distance of between the -step transition probability matrix and the intrinsic probability of the states
[TABLE]
where is the size of the state space. indicates for arbitrary and . In the test, the proposal distribution is for arbitrary and . For example, the results of between for several scales are shown in Fig. S3.
Thus as long as the Markov chain would converge, for a certain number , the difference between and is negligible. By throwing away the samples within the steps, the remained samples are approximately independently. Thus MIS generate an effective sample by calculating permanents. A proposed value of is 100, which is sufficient for eliminating the autocorrelation of the sample sequence from more than 30-photon boson sampling scheme, and is with only a constant number from the computational hardness limit of simulating boson sampling. The time estimated for Tianhe-2 supercomputer to generate a 50-photon sample is about 10 days. However, if the samples within the leap is not discarded and is reused, then the efficiency of the algorithm would be greatly enhanced, which is the main idea of our method.
III.3 Sample caching
We proposed the Sample Caching Markov Chain Monte Carlo(SC-MCMC) to reduce the autocorrelation without losing any samples. Our protocol mainly contains two parts: a complete MCMC sampler combined with a procedure that we call “Sample Caching”. The SC-MCMC algorithm is shown as Algorithm 1.
By applying sample caching on the sequence, the finally generated samples are nearly independent. Fig. S4 shows the influence of sample cache with varied size on the first-order autocorrelation, and Fig. S5 shows the comparison between the original sequence generated by normal MCMC algorithm and the SC-MCMC generated sequence in autocorrelation at lags up to 200.
The results indicate that, sample caching process reduces the autocorrelation of the sequence to a negligible level, and the samples in the sequence are nearly independent. More important, in this way, all the samples are saved from being discarded, and averagely the generation of one sample only requires the calculation of one permanent. The cache at the size of 4,000 is sufficient for 30-photon cases.
However, the SC-MCMC method requires a longer start-up time, because of the workload to fill the sample cache. If a MIS and a SC-MCMC sampler are working on a same sampling problem, after the same period of warm-up time, the MIS begins to output samples every steps, while the SC-MCMC has to wait for steps, and then output samples in every step. We would find that the SC-MCMC sampler would cache up with the MCMC sampler with jump sampling at the sample ordered with by the SC-MCMC sampler. Fortunately, we can use a small trick to improve the fall-behind period of SC-MCMC. The solution is more like a combination of the jump sampling and sample caching: when filling the sample cache, we can output samples in every generations, while the un-chosen samples are stored in the cache till the sample cache is full, and then follow the SC-MCMC protocol. Thus finally, we can develop a improved SC-MCMC algorithm, as shown in Algorithm 2. The warm-up period of MCMC is not focused in our work, because the number of samples for warm-up is constant and limited, while much more samples can be discarded in other sampling algorithms.
IV Theoretical Analysis of Sample Caching Markov Chain Monte Carlo
The effectiveness of sample cache in the asymptotic condition where the size of the cache is large enough is easy to understand. The standard MCMC process ensures that the samples in the cache follows their own probability. Since we uniformly randomly choose the sample from the cache as the sample to output, each selection is independent. Thus we obtain a correlation-less sequence.
Practically, the size of the cache is limited. In this case, the essence that sample cache works is the reorder of the samples in the sequence, as shown in Fig. S6. The samples generated by the MCMC sampler without entering the sample cache forms an original sequence (denoted as ), and the sequence after the sample cache is denoted as . The adjacent two samples from may be separated in , and the size of cache determines the range that the reorder works.
The reorder process is stochastic. Suppose that two adjacent samples in labeled as and , and reorder the sequence with a sample cache of size . Now we discuss about the probability that the distance of is in steps away from in the original sequence before the reorder.
If , it means is steps ahead of in .
In this case, enters the cache first, and in the next steps, is not chosen to output. This probability is . Then enters the cache, and they are outputted immediately with probability , or outputted one step later with probability , or with probability outputted steps later, where . Thus in summary, we have
[TABLE] 2. 2.
If , it means is steps behind in .
In this case, enters the cache first, and in the next steps is not chosen to output with probability . Then these two samples are probable to be outputted in steps, with probability , where . Thus the probability of distance is
[TABLE]
We care more about the absolute value of . The probability that the two adjacent samples are in distance is . The expected distance of two samples is
[TABLE]
In this way, the correlated samples are at a lower probability to be at a close distance in the new sequence. This probability is in a inversely proportional relationship with the size of the cache. The distance may directly impact on the correlation between the two samples. From Fig. S3 we see that if the distance reaches a certain number (denoted as ), the correlation of the samples can be ignored. So we can calculate the probability if the correlation of two samples can not be ignored as
[TABLE]
We can constraint in a low level to reduce the autocorrelation. That is, let , then . Next we discuss about the practical value of and .
The value of should depend on how fast the Markov chain could converge, and can be referred from the jumping step of the MIS. The choice of the easy-to-sample proposal distribution before the reject-accept phase in Markov chain may greatly affect the convergence speed. In our implementation, we tested two proposal distributions. One is the uniform distribution, which indicates that all the states are chosen uniformly randomly as the candidate state no matter what the current state is. The other one generates candidate state by randomly moving one of the photons to another empty mode, as shown in Eq. S13
[TABLE]
where is the number of photons and is the number of modes. For example, in the simulation for , state corresponding to pattern may transmit to state with pattern by moving the photon from the second mode to the last mode. This proposal distribution leads to slower convergence of the Markov chain, and thus exposes more severe autocorrelation problem in our test. Practically, the uniform distribution is preferred. For example, as shown in Fig. S7, the curves are obtained from the sequences obtained using different candidate choosing strategy. “mov1p” is the implementation of Eq. S13; “Uniform” means the candidate is chosen uniformly randomly.
Further, we compared the two strategies with that in ref. [2] where the distribution of distinguishable particles is used as the proposal distribution, as shown in Fig. S8. The strategy in ref. [2] helps produce least correlated samples, and thus a smaller sample cache is sufficient, but introduce the cost of computing an extra real permanent. With the increase of the size of the sample cache, the autocorrelations of the sequences under different proposal distribution become closer, and are hard to distinguish when the size of the cache reaches a certain level. Thus we claim that no matter what strategy is applied, the sample cache would finally help eliminate the autocorrelation as long as the Markov chain can finally converge, while the difference resides in the size of the sample cache required.
On the other hand, the strategy in ref. [2] produces least correlated samples, the sampler with this proposal distribution must be able to sample from the distribution of distinguishable particles efficiently, which involves the calculation of the permanent of a real matrix. In this way, two permanents (one of a complex matrix and one of a real matrix) have to be calculated for one sample. Although it has been proved that the permanent of a positive real matrix is easy to approximate [7], the practical calculation on classical computers still requires more extra cost.
With proper proposal distribution, it’s claimed that is sufficient for the cases with more than 30 photons. However, unbefitting choice of proposal distribution would lead to higher requirement of the size of the sample cache, while the performance of SC-MCMC would not be affected that much, except a longer start-up time. When the cache is filled, the SC-MCMC sampler still can generate one sample by calculating one permanent. Because of the possibility of bad convergence speed when choosing improper proposal distribution, we set .
Next we discuss about the choice of , and then we can use a sufficient big cache to eliminate the autocorrelation. Table SI gives the examples.
It meets some challenges to relate directly with the autocorrelation of the sequence. Empirically we found that may be enough to limit the first-order autocorrelation within a small range (in the order of ). Thus we claim that may be sufficient, and then a practical value of can be obtained by setting and , that is 4,000. As reflected in Fig. S5, the SC-MCMC with a cache of size 4,000 can produce nearly independent samples for 30-photon-900-mode boson sampling. Above all, no matter how big the cache is, no sample is discarded.
V Numerical Results
V.1 The value assigning of the output patterns
To calculate the autocorrelation within the sequence, each output pattern of photons has to be assigned a value. We tried 3 ways for value assigning.
Binary to Decimal
Since the regime that we simulate boson sampling is limited in the collision-free case, the number of photons in each mode would not exceed 1. Thus the pattern can be regarded as a binary number. We can simply transform it into a decimal number as its value. However, this method is not feasible for larger scale because of the limit of the word length of classical computers. 2. 2.
Order of Sort
We first sort the patterns, and use the numerical order as its value. 3. 3.
The value is assigned as the logarithm of its probability. This method is also applied in ref. [2].
For example, when simulating the boson sampling with 3 photons and a 6-mode optical network, in the collision-free regime there are totally 20 output patterns, and the value assigning of the 20 patterns are shown in Tab. SII. The assigning strategy doesn’t impact on the final result, and the theoretical analysis works no matter what method we use to assign values to the patterns. Figure S9 shows examples for two different assigning strategies. In our simulation, we use the means of “Order of sort” to assign value to each output pattern.
V.2 Correctness
Since SC-MCMC essentially is only the reorder of the sequence, the samples taken are not changed. The standard MCMC process embedded ensures the correctness of the results. Current validation techniques provide the method to distinguish boson sampling distribution from another proposal distribution once. Basically the sampling results in boson sampling should be compared to the uniform distribution and the distribution sampling from distinguishable particles, particularly in the cases where the scales reaches a rather large level. Here we directly compare the frequency of the sampling results to the theoretical probability distribution, and calculate the similarity following Eq. S14
[TABLE]
where and represent the probability of event labeled as in distribution and respectively.
Empirically, we found that to reach a high similarity between the frequency of the sampling results and the probability distribution, the number of samples taken should be around . Thus it limits the scales of simulation if we only take 1,000,000 samples. As have shown in Fig. S10, the sampling results agree the theoretical probability results well. In small scales we have confirmed the correctness by comparing the frequency graph with the probability distribution, this validation method is not feasible in relatively large scale, which requires corresponding computing resource with the brute force sampler. In large scale simulation, other validation method is required, such as the likelihood ratio test.
If the correctness of the classical sampler is admitted, the classical sampler further provides an approach for the validation of the physical experiments. However, the hardness of boson sampling makes it difficult to validate the experimental results since it requires to calculate exponential permanents to provide theoretical probability distribution, which is exactly what the brute force sampler does. Another method that may help, which is kind of speculative, could be like this: We could obtain two sample sequences, one is from the classical sampler that could be trusted, the other is from the experimental boson sampler, and use some statistic techniques to validate, such as the - test.
V.3 The influence of sample cache on high-order autocorrelation
The essence of SC-MCMC is the reorder of samples, so that the correlated samples are scattered to be apart in varied distances. The range of scattering depends on the size of the sample cache. Thus the low-order autocorrelation decreases with the cost that the high-order autocorrelation increases. Until the size of the sample cache reaches a certain degree, the space for the scattering of low-order autocorrelation is large enough, thus the low-order autocorrelation can be reduced with high-order autocorrelation increasing by a negligible quantity. The results of autocorrelation scattering are shown in Fig. S11.
The process of scattering can be observed with the increase of the size of sample cache. The curve indicated by shows the initial autocorrelation at all the lags. When , the low-order autocorrelation decreases (e.g. lag from 1 to 20), while the autocorrelations at lags greater than 20 increase. When reaches 100, the curve tends to be flat, and reaching 500 makes it flatter. When arrives at 4,000, the autocorrelation at all the lags is negligible, therefore we obtain a sequence containing nearly independent samples.
V.4 Performance
Here we show the performance of our SC-MCMC. The test platforms are Tianhe-2 supercomputer, and another small-scale local clusters. The simulations only take advantages of the CPUs of Tianhe-2 without the usage of accelerators. The performance parameters of each computing nodes of Tianhe-2 (the corresponding parameters of the accelerators are excluded) are listed in Tab. SIII. The sampling results are shown in Tab. SIV. The largest number of nodes used on Tianhe-2 is 64, and the Intel Xeon Phi accelerators are not used, therefore it is still a server-level cluster.
The percentage of time spent on permanents approaches 100% with the scale grows. Though with the increase of number of computing nodes, the extra cost for the initialization of the computation also increases. However, compared to the computation of permanents, this is a negligible cost when the number of photons reaches a certain number. For example, when using 32 nodes, the percentage of time on spent on permanents reaches nearly 100% when the number of photons reaches 21, while the average time on computing one permanent is only 0.055 seconds. Thus we conclude that SC-MCMC can reach the computational hardness limit of classically simulating boson sampling.
VI Numerical simulation
The precise estimation must base on some test first. In [3], they have measured the scalability curve of calculating permanents on Tianhe-2 supercomputer. Our algorithm makes it feasible to regard the performance of calculating permanents as the performance of simulating boson sampling, which reveals the scaling in terms of (the number of photons) and (the number of computing nodes) respectively, based on the assumption that we are allowed to take a lot of samples. By fixing , the scaling of is described by Eq. S15, and by fixing , the scaling of is shown in Eq. S16
[TABLE]
[TABLE]
The scalability curve of simulating boson sampling on Tianhe-2 are shown in Fig. S12.
As reported in [2], the MIS could generate a 50-photon sample on the supercomputer within under 10 days, while in a corresponding time (within about 11 days), our method could be able to generate a 57-photon sample.
To make a more detailed comparison with the physical setups, we rewrite the function of quantum advantage defined in [2] as
[TABLE]
where is the number of photons, is the transmission probability of a photon, which is the key to the sampling efficiency of a physical realization. , represent the estimated time for the simulation of boson sampling instance on classical computers and quantum computers respectively. The value of function represents the competition between classical computers and quantum computers. The quantum advantage exists in the situations with where quantum devices are faster than classical computers, and the border line of is the balance between the classical device and quantum physical setup. The estimated quantum run time is
[TABLE]
where is the probability for a collision-free event, which depends on the size of the network. It can be further refined for a square network with or a linear network with which both have been experimentally realized, as Eq. S19 shows.
[TABLE]
where in and in are the approximation to in corresponding cases. takes the value of GHz, which is beyond the reach of the experimentally demonstrated photon sources. The leading parameter of proposed photon source is [8]. Here we update the as
[TABLE]
where is the estimated classical run time for an instance of size bosons in modes on Tianhe-2 supercomputer. Here, Fig. S13 shows the performance comparison quantum runtime as .
With the same experimental techniques, which decides the value of , the number of photons required for positive quantum advantage is increased, as shown in Tab. SV for squared network and in Tab. SVI for the linear network. Currently is less than 0.4 [9, 10, 11, 12, 13, 14, 8], and a small increment of could greatly reduce the number of photons required for positive quantum advantage.
The minimum value of is also raised if the number of photons is restricted within a reasonable value, say 100 photons. Tab. SVII gives the increment of under different experimental parameters.
If the number of photons is not limited, the minimum value of will still not approach 0. To see this, we write the classical runtime as
[TABLE]
where is the scaling coefficient of the algorithm, and because an efficient implementation of the permanent calculating algorithm has a time complexity of . The quantum runtime is referred from Eq. S18. Letting will give
[TABLE]
It converges when . For the square network, we have
[TABLE]
and for the linear network with , we have
[TABLE]
The possibly existence of in will not affect the convergence. This convergence indicates that with the proposed experimental parameters, the transmission probability of a single photon is likely to have to be improved to above a threshold value, or the quantum advantage may never be demonstrated by boson sampling.
References
- [1]
Scott Aaronson and Alex Arkhipov.
The computational complexity of linear optics.
In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 333–342. ACM, 2011.
- [2]
Alex Neville, Chris Sparrow, Raphaël Clifford, Eric Johnston, Patrick M. Birchall, Ashley Montanaro, and Anthony Laing.
Classical boson sampling algorithms with superior performance to near-term experiments.
Nature Physics, 13(12):1153–1157, 2017.
- [3]
Junjie Wu, Yong Liu, Baida Zhang, Xianmin Jin, Yang Wang, Huiquan Wang, and Xuejun Yang.
A benchmark test of boson sampling on Tianhe-2 supercomputer.
National Science Review, 5(5):715–720, 2018.
- [4]
J. Durbin and G. S. Watson.
Testing for serial correlation in least squares regression. iii.
Biometrika, 58(1):1–19, 1971.
- [5]
Jun S. Liu.
Metropolized independent sampling with comparisons to rejection sampling and importance sampling.
Statistics and Computing, 6(2):113–119, 1996.
- [6]
Paul A. Gagniuc.
Markov Chains: From Theory to Implementation and Experimentation, First Edition.
John Wiley & Sons, Ltd, 2017.
- [7]
Mark Jerrum, Alistair Sinclair, and Eric Vigoda.
A polynomial-time approximation algorithm for the permanent of a matrix with non-negative entries.
JOURNAL OF THE ACM, 51(4):671–697, 2004.
- [8]
Hui Wang, Jian Qin, Xing Ding, Ming-Cheng Chen, Si Chen, Xiang You, Yu-Ming He, Xiao Jiang, L. You, Z. Wang, C. Schneider, Jelmer J. Renema, Sven Höfling, Chao-Yang Lu, and Jian-Wei Pan.
Boson sampling with 20 input photons and a 60-mode interferometer in a -dimensional hilbert space.
Phys. Rev. Lett., 123:250503, Dec 2019.
- [9]
Justin B. Spring, Benjamin J. Metcalf, Peter C. Humphreys, W. Steven Kolthammer, Xian-Min Jin, Marco Barbieri, Animesh Datta, Nicholas Thomas-Peter, Nathan K. Langford, Dmytro Kundys, James C. Gates, Brian J. Smith, Peter G. R. Smith, and Ian A. Walmsley.
Boson sampling on a photonic chip.
Science, 339(6121):798–801, 2013.
- [10]
Matthew A. Broome, Alessandro Fedrizzi, Saleh Rahimi-Keshari, Justin Dove, Scott Aaronson, Timothy C. Ralph, and Andrew G. White.
Photonic boson sampling in a tunable circuit.
Science, 339(6121):794–798, 2013.
- [11]
Max Tillmann, Borivoje Dakic, René Heilmann, Stefan Nolte, Alexander Szameit, and Philip Walther.
Experimental boson sampling.
Nature Photonics, 7(7):540–544, 2013.
- [12]
Nicolò Spagnolo, Chiara Vitelli, Marco Bentivegna, Daniel J. Brod, Andrea Crespi, Fulvio Flamini, Sandro Giacomini, Giorgio Milani, Roberta Ramponi, Paolo Mataloni, Roberto Osellame, Ernesto F. Galvão, and Fabio Sciarrino.
Experimental validation of photonic boson sampling.
Nature Photonics, 8(8):615–620, 2014.
- [13]
Marco Bentivegna, Nicolò Spagnolo, Chiara Vitelli, Fulvio Flamini, Niko Viggianiello, Ludovico Latmiral, Paolo Mataloni, Daniel J. Brod, Ernesto F. Galvão, Andrea Crespi, Roberta Ramponi, Roberto Osellame, and Fabio Sciarrino.
Experimental scattershot boson sampling.
Science Advances, 1(3), 2015.
- [14]
Han-Sen Zhong, Yuan Li, Wei Li, Li-Chao Peng, Zu-En Su, Yi Hu, Yu-Ming He, Xing Ding, Weijun Zhang, Hao Li, Lu Zhang, Zhen Wang, Lixing You, Xi-Lin Wang, Xiao Jiang, Li Li, Yu-Ao Chen, Nai-Le Liu, Chao-Yang Lu, and Jian-Wei Pan.
12-photon entanglement and scalable scattershot boson sampling with optimal entangled-photon pairs from parametric down-conversion.
Phys. Rev. Lett., 121:250505, Dec 2018.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] John Preskill. Quantum computing and the entanglement frontier. ar Xiv , (1203.5813), 2012.
- 2[2] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando G. S. L. Brandao, David A. Buell, Brian Burkett, Yu Chen, Zijun Chen, Ben Chiaro, Roberto Collins, William Courtney, Andrew Dunsworth, Edward Farhi, Brooks Foxen, Austin Fowler, Craig Gidney, Marissa Giustina, Rob Graff, Keith Guerin, Steve Habegger, Matthew P. Harrigan, Michael J. Hartmann, Alan Ho, Markus Hoffmann, Trent Huang, Travis S. Humble, Sergei V. Isakov, Evan
- 3[3] Nicoló Spagnolo and Fabio Sciarrino. The race for quantum supremacy: pushing the classical limit for photonic hardware. National Science Review , 6(1):2–3, 11 2018.
- 4[4] Michael A. Nielsen and Isaac Chuang. Quantum computation and quantum information. American Journal of Physics , 70(5):558–559, 2002.
- 5[5] Scott Aaronson and Alex Arkhipov. The computational complexity of linear optics. In Proceedings of the forty-third annual ACM symposium on Theory of computing , pages 333–342. ACM, 2011.
- 6[6] Stefan Scheel and Stefan Yoshi Buhmann. 1 - 133 macroscopic quantum electrodynamics - concepts and applications, 2008.
- 7[7] L. G. Valiant. The complexity of computing the permanent. Theoretical Computer Science , 8(2):189 – 201, 1979.
- 8[8] Justin B. Spring, Benjamin J. Metcalf, Peter C. Humphreys, W. Steven Kolthammer, Xian-Min Jin, Marco Barbieri, Animesh Datta, Nicholas Thomas-Peter, Nathan K. Langford, Dmytro Kundys, James C. Gates, Brian J. Smith, Peter G. R. Smith, and Ian A. Walmsley. Boson sampling on a photonic chip. Science , 339(6121):798–801, 2013.
