Classical simulation of boson sampling with sparse output
Wojciech Roga, Masahiro Takeoka

TL;DR
This paper presents efficient classical methods for simulating sparse boson sampling outputs by leveraging known marginal distributions, enabling feasible recovery of joint distributions in certain cases.
Contribution
It introduces classical algorithms for simulating sparse boson sampling outputs, exploiting sparsity and marginal distributions, with extensions involving quantum annealing.
Findings
Efficient classical recovery of joint distributions from sparse boson sampling data.
Sparsity enables classical simulation where direct computation is complex.
Extensions include quantum annealing approaches.
Abstract
Boson sampling can simulate physical problems for which classical simulations are inefficient. However, not all problems simulated by boson sampling are classically intractable. We consider a situation in which it is known that the outcome from boson sampling is sparse. It can be determined from a few marginal distributions which are classically calculable. Still, recovering of the joint distribution can be of high complexity. We show classically efficient methods of the recovery assuming high sparsity of the joint distribution. Various extensions are discussed including a version involving quantum annealing.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Classical simulation of boson sampling with sparse output
Wojciech Roga
Masahiro Takeoka
National Institute of Information and Communications Technology, Koganei, Tokyo 184-8795, Japan
E-mail: [email protected]
Abstract
Boson sampling can simulate physical problems for which classical simulations are inefficient. However, not all problems simulated by boson sampling are classically intractable. We show explicit methods of classically simulating boson sampling when its outcome is known to be highly sparse. In the methods, we first determine a few marginal distributions and then recover the joint distribution from them. Although the latter could be of high complexity in general, we show that it can be classically calculable when the high sparsity assumption holds. Various extensions are discussed including a version involving quantum annealing.
Background and motivation.– Simulating complicated quantum systems on classical or quantum simulators is an interesting problem with the industrial impact. It is simply cheaper to test many, e.g., molecular configurations in the simulators than synthetizing the molecules and testing their crucial properties experimentally. However, some problems are believed to be of complexity for which classical computers are inefficient. For instance, Huh et al. Huh ; Huh2 showed that the statistics of Franck-Condon (FC) factors Franck ; Condon for vibronic transitions in large molecules Doktorov ; Jankowiak is equivalent to the statistics of samples in a version of boson sampling Aronson ; Spring ; Tillman ; Shchesnovich ; Tichy ; Motes ; Gard - the Gaussian boson sampling Lund ; Hamilton ; Quesada . Although, it is widely accepted that boson sampling from interferometers described by the average-case Haar-random unitary transformations or Gaussian-random matrices is classically computationally inefficient Aronson , it is not clear if particular problems of quantum chemistry belong to this class, as the related matrices are not typically Haar- or Gaussian-random Huh ; Cao18 . In particular, if these matrices were of low rank or consisted of non-negative numbers there exist efficient algorithms for approximating their permanents Barvinok ; Jerrum . This reduces complexity of scattershot boson sampling.
In this paper we discuss the case when we a priori know that the statistics of outputs from boson sampling is sparse. This knowledge can be based on experience with similar problems, symmetries or other physical properties. We analyze examples of relevant physical systems with approximately sparse vibronic-spectra that can be simulated on a Gaussian boson sampling-type simulator in Jacob . There, sparsity of a spectrum is expected based on shapes of spectra of typical molecules and the intuition that the most significant transitions in 0K are likely these with only a few phonons involved in just a few modes. In the present paper, we introduce methods of at least approximate classical computation of the sparse statistics. We use the fact that for boson sampling marginal distributions for small number of modes can be efficiently calculated. Then we apply appropriately modified compressive sensing methods to efficiently recover the joint statistics. Up to our knownledge this is the first result dealing with classical simulability of boson sampling with sparse output. Moreover, we develop original methods to improve the efficiency of the classical algorithms for compressive sensing reconstruction, namely the so-called polynomial time matching pursuit (PTMP) that can be extended to other methods, for instance, gradient pursuit Blumensath08 . Finally, our novel approach has an impact on interdisciplinary studies on molecular vibronic spectroscopy Jacob , compressive sensing, and quantum computing with hybrid devices.
Let us summarize our arguments. For scattershot boson sampling computability of marginal distributions for small number of modes is implied by the result of Gurvits cited together with the proof in the Aaronson and Arkhipov paper Aronson as follows:
**Theorem (Gurvits’s k-Photon Marginal Algorithm)**There exists a deterministic classical algorithm that, given a unitary matrix , indices , and occupation numbers , computes the joint probability
[TABLE]
*in time.
Here, means that the occupation numbers are sampled according to the probability distribution over possible outputs of . If is small, as we assume in this paper, calculating marginal distributions is efficient. The counterpart of this theorem for Gaussian boson sampling is discussed in the conclusions. In our approach we use the compressive sensing methods Donoho ; Candes1 ; Baraniuk ; Candes2 ; Foucard ; Draganic ; Candes3 to recover the joint sparse distribution from marginal ones. We show how to do that efficiently. Our arguments are inspired by works from the field of quantum compressive sensing Cramer ; Gross ; Liu ; Flammia ; Shabani ; Shabani2 or similar ones that consider recovering full information about states of high dimensional systems from states of low-dimensional subsystems. In particular, Cramer et al. Cramer proposed an algorithm to reconstruct a low rank quantum state from its reduced density matrices known from quantum tomography of subsystems. They used a modified singular value thresholding algorithm (MSVT). The essential part of it is finding the ground states of matrices that can be thought as Hamiltonians. This operation itself is inefficient, but if the Hamiltonian is constructed from local nearest-neighbor interactions the best so-called matrix product state approximations of the ground states can be found efficiently. Then MSVT converges to the right solution in time polynomial in the size of the system. Additionally, the application of the matrix product states representation allows for significant reduction of the required memory. The procedure we provide is similar to this idea. We show that the complexity bottleneck of a compressive sensing reconstruction of a probability vector from marginal distributions is in localizing the largest element from a long list (counterpart of finding the ground state). This procedure typically would require the number of steps and bits of memory which is , where is the length of the list. However, we show that in our case the list may be written as the local nearest-neighbor interaction Hamiltionian of a classical spin chain. Localization of the maximum energy configuration is efficient in this problem both in terms of the number of operations and memory Ising ; Schuch10 .
Our work is related to a general problem whether sparsity of the output of a quantum computer guarantees classical simulability. In Schwarz ; Pashayam there was shown that the existence of an efficient classical randomized algorithm for entries of all marginal distributions and the sparsity assumption indeed guarantee this for the circuit-based computing device. However, this result is not automatically applicable to boson sampling. Moreover, in our approach we relax one of the assumptions considering only computability of some marginal distributions for limited number of modes which extends the class of known classically tractable problems consistent with the boson sampling architecture.
Marginal distributions.— Let us start considering the following scenario. For some unitary transformation which reflects features of a physical system we want to simulate the transition probabilities from a given -mode initial state to all occupation numbers states (the left hand side of fig. 1) that may be recorded by simultaneous readouts of phtoton resolving detectors. We assume that there can be possible photons in each output mode. So, the searched vector is of length . Here, we allow for the total photon number not being preserved. In this way we can consider the situation with losses or Gaussian boson sampling using the same formalism.
We claim that if is sparse, i.e., it contains at most non-zero entries, where , it is possible to efficiently find calculating the statistics of marginal distributions from smaller number of detectors measuring only chosen modes and ignoring the rest (the right hand part of fig. 1).
To analyze the problem in detail let us introduce the following notation. Vector can be decomposed in the basis of measured sets of occupation numbers as follows
[TABLE]
Here, the numbers are non-negative, sum up to one and only of them are non-zero. We will use the following convention
[TABLE]
where the lower line indicates the positions of 1 in each vector component of the tensor product and there is appropriate number of 0s in place of dots. In this explanation we will use a notation for two-detector simultaneous readouts, however the formalism can be extended to simultaneous readouts of a different number of detectors if necessary. The measurement of modes and leads to the marginal probability distribution
[TABLE]
where is the sum of all entries with fixed and . We get this distribution observing frequencies of outcomes from given modes independently of what happens in the remaining modes. In the chosen convention the rows of the so-called measurement matrix are binary patterns as follows
[TABLE]
where means the entry-wise multiplication and
[TABLE]
Here and are dimensional vectors. The entry-wise multiplication preserves the tensor product structure and can be executed as the entrywise multiplication in each part of the tensor product.
Probabilities of different simultaneous readouts form a -dim vector
[TABLE]
where is a binary matrix. If is sparse in the basis incoherent with the rows of the measurement matrix it can be determined based on the number of measurements as the most sparse solution of . This can be seen as the constraint norm minimization problem. It is solvable by many known algorithms used in compressive sensing Draganic . In the next part we analyze the complexity of particular algorithms adapted to minimize the computational costs and memory requirements.
Complexity analysis.– Assume that we are focused on simultaneous readouts of different photon numbers in neighboring modes. The corresponding marginal distributions can be calculated in polynomial time in the number of modes. Indeed, Gurvits’s Algorithm from Aronson calculates of coincidences in all neighboring modes in steps. The dimensionality of the sparse vector is . We allow for sub-linear scaling of the number of non-zero entries . Therefore, we keep and well separated. Moreover, we assume that problems of complexity are efficiently tractable. From the readouts for marginal distributions, we want to recover the most sparse joint distribution knowing the measurement matrix. So, our goal is to solve the underdetermined problem , where is a -dimensional measurement vector of marginal distributions, is a -dimensional sparse vector which is searched. In our case, rows of are well structured patterns. This implies that it is easy to multiply by any sparse vector. For instance, assume that we know that an entry of a vector is non-zero. We decompose as a N-inary number which gives us immediately its tensor product representation as in (2) consistent with the structure of . For example, assuming that , position is represented as (corresponding to binary 13 as 0 occupies the first position) which corresponds to
[TABLE]
This vector has just one non-zero element in the 14th entry. It is immediate to show what is its overlap with a particular row of which, for instance, can be of the form
[TABLE]
We need to check only relevant modes.
To solve the underdetermined problem we discuss in detail two first-order greedy algorithms. We consider the matching pursuit mallat method which is simple but in general less accurate, and the gradient pursuit which can be more accurate and faster but slightly more computationally demanding Blumensath08 . These two algorithms are enough to discuss the bottleneck for the memory and computational costs, and to show how to overcome these problems. In particular, our modification of the matching pursuit leads to a new algorithm which we call the polynomial time matching pursuit (PTMP).
The standard matching pursuit protocol finds the -sparse solution. It is summarized as follows: I. (Initialization) At step [math]: the residual and the approximate solution is . II. (Support detection) In step recognize the column of denoted by which is the most similar to the current residue by solving . III. (Updating) Update the solution only in index i.e., and update the residue using -th column of as follows . IV. Continue iterating until a stopping criterion is matched.
Let us notice that the first and the third part of the algorithm can strongly benefit from the sparsity of vectors involved. Moreover, for any , vector can be found operationally (multiplication of and a sparse vector). So, does not need to be stored beforehand and the memory and computational costs of these parts are - size of . The entire procedure can be iterated until a given sparsity of the solution is achieved.
Finally, let us consider the operational costs of the support detection part. Without exploiting the structure of we require bits of memory to store the matrix and the same for the operational costs for multiplication . Moreover, without smart tricks, typically, we would need at least steps to find the maximum value from the list . The same amount of memory is needed to store the list. However, considering specific features of the problem we can overcome the bottleneck. As for finding the index of the largest element of a list, we notice that it is equivalent to finding the leading eigenvector of the diagonal matrix (Hamiltonian) with the list on the diagonal. We know that for at least local Hamiltonians there are computationally efficient methods for finding the eigenvectors. Let us notice that in our problem in the support detection part can be written exactly as a diagonal local Hamiltonian. To simplify the explanation let us consider first an example with the readouts from a single detector only. A row of measurement matrix corresponding to photons in mode is given as in (5). Observing all from only mode we have
[TABLE]
where is a submatrix of corresponding to different photon numbers recorded in mode . Here, is part of the residual vector that corresponds to rows of . Measurements of other modes have also form of the local Hamiltonians if understood as diagonal matrices. So, the same holds for . For simultaneous readouts of neighboring modes we have
[TABLE]
which written in the diagonal form is a part of a typical nearest neighbor local Hamiltonian for a one dimensional spin chain. It is clear from this notation that the matrix vector multiplication requires negligible operational costs and bits of memory to store long vector in its compressed representation as a sum of local matrices.
Using these observations, the support finding from the matching pursuit algorithm can be determined much faster than in time. If we consider just two neighboring modes simultaneous readouts our problem can be described in terms of the classical spin chain formalism. We can use an explicit strategy from, e.g., Schuch10 to find the optimal configuration of the classical spin chain which is equivalent to finding the position of the maximal value in our list. Indeed, from the algorithm Schuch10 is equivalent to . This procedure reduces the computational costs of the support detection to (factor 2 is from the necessity to repeat the procedure for as we are looking for the largest element in the absolute value). The matching pursuit with the modification increasing its efficiency is called the polynomial time matching pursuit (PTMP). We use this method to reconstruct vibronic spectra of some molecules from their marginal distributions in Jacob .
PTMP although computationally simple is not the fastest one as the same support index may need to be used many times. Also it does not guarantee the accurate solution, as the solution is expressed based on relatively small number of columns of in which is decomposed. Some modifications of the method can lead faster to more accurate results. Among them there is an orthogonal matching pursuit algorithm (OMP) zhang in which in each step the support is updated and kept in the memory. The solution is approximated by the vector of coefficients in the best 2-norm approximation of the measurement vector in terms of the selected columns of . Another algorithm, gradient pursuit (GP) Blumensath08 updates the solution by correcting it in the direction of the largest gradient of the 2-norm mentioned above by a given step. As discussed in Blumensath08 the only additional cost over what we have in the matching pursuit is the cost of calculating the step size. This can be however done efficiently in PTMP due to an easy way of finding the product of a sparse vector and matrix . The convergence rates depend on contracting properties of and a chosen algorithm.
Accuracy of the reconstruction.– For successful compressive sensing reconstruction the so-called coherence between the rows of a measurement matrix and the sparsity basis must by low. Roughly speaking it means that a particular measurement is sensitive to changes in a large part of the measured vector. For this reason random unstructured matrices are of particular interest as they are incoherent with almost any basis. However, random matrices are problematic for large scale tasks. They are expensive in terms of storage and operational costs as the inefficient matrix vector multiplication is usually needed Cevher . Therefore, structured matrices which can be defined operationally and do not need to be stored are also desired. In our case, measurement matrix is structured and of low coherence with respect to the measured space basis. However, the usefulness of a specific matrix depends as well on the reconstruction algorithm. As we do not have theoretical predictions regarding convergence of considered protocols with the measurement matrix, we tested the performance of the GP algorithm with numerically. We have measured 1000 randomly chosen distributions with non-zero entries for the problem with output modes and different events measured in each mode. Matrix was associated with all 2-neighboring modes coincidence measurements. So, is a matrix. We have tested how often is larger than a given threshold, where and are the randomly chosen measured signal used in the simulation and the reconstructed signal respectively. We iterated the algorithm not more than 50 times. For we observe that in about cases and in cases . For , we have and in and of situations respectively. Finally, for we observe and in and of cases respectively. As there are many possible more complicated and more accurate algorithms which can still share the feature of the reduced complexity we predict that these numbers can be improved. However, testing them further is out of the scope of this paper. How PTMP works in practice is analyzed in a separate paper Jacob . There, we observe that the distribution can be only approximately sparse fort the algorithm to reconstruct the most significant picks.
Conclusions.– In this paper we show that if we a priori know that boson sampling samples according to a sparse distribution and the sparsity is high enough we can calculate the first-order approximation of this distribution efficiently on a classical computer. The crucial steps are to calculate the marginal distributions using the Gurvits’s k-Photon Marginal Algorithm and then to use an algorithm for compressively sensed sparse signal recovery. Many of these algorithms quickly converge assuming that the support localization, as discussed in this paper, is efficient. We show that due to the specific form of the marginal measurements the problem can be reduced to finding the optimal configuration in a 1 dimensional classical spin chain with local interactions. The latter is known to be efficient.
For Gaussian boson sampling the only difference is in computing the marginal distributions. In this case evolving an input states through the interferometer and finding partial states for given modes is easy Barlett . The statistics of photons of these states requires computing loop Hafnians Hamilton ; Quesada ; Kruse ; Quesada2 of appropriate matrices which for two mode Gaussian states are still classically tractable. In consequence, we can use the relation between Franck-Condon factors and the Gaussian boson sampling Huh and apply the algorithm described in this paper to find Franck-Condon factors under the condition that their distribution is sparse. This new approach is tested in Jacob .
In this paper we have investigated the situation with only nearest neighbor modes measurements. This restricts the tolerable sparsity for the method. When the sparsity decreases ( increases) a larger amount of data is needed. Implementing three or more nearest neighbors modes measurements is one of the solutions. We could think as well about relaxing the nearest modes requirement and investigating the situation with non-local coincidence distributions. Then more advanced techniques from the theory of spin glass could be applied, see e.g., Rams ; Jalowiecki . Finally, our function (10) to be minimized when restricted to pairs of binary modes is equivalent to the Hamiltonian , where
[TABLE]
Here is the Pauli matrix in mode . The coefficients can be chosen to correspond to from (10). The ground state of Hamiltonian can be found efficiently by simulated or quantum annealing under some conditions about spectrum of and the correspondence of to the connections in the annealer Kirkpatrick ; Kadowaki ; Lucas . In our approach we have some freedom in choosing pairs of modes to guarantee that these conditions are satisfied. So, the simulated or quantum annealing could be used to extend our approach.
Our technique can be applied to marginal distributions measured in realistic experiments or calculated assuming uniform losses Garcia ; Oszmaniec . The joint distribution after losses may be interpreted as a lossless distribution multiplied by a stochastic matrix describing the process of losses, where is in the form of a tensor product. The measurement matrix from our technique is now the product of defined as previously and . The new measurement matrix inherits features allowing for keeping the complexity tractable. The impact of losses may be a direction of further studies.
Acknowledgements.
We would like to thank Nicolás Quesada, Raúl García-Patrón and Jonathan Dowling for constructive comments. This work was supported by JST CREST Grant No. JPMJCR1772.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1(1) J. Huh, G. G. Guerreschi, B. Peropadre, J. R. Mc Clean, and A. Aspuru-Guzik, Boson sampling for molecular vibronic spectra, Nature Photonics 9 615 (2015).
- 2(2) J. Huh, and M.-H. Yung, Vibronic Boson Sampling: Generalized Gaussian Boson Sampling for Molecular Vibronic Spectra at Finite Temperature, Sci. Rep. 7 , 1 (2017).
- 3(3) J. Franck, Elementary processes of photochemical reactions, Trans. Faraday Soc. 21 , 536 (1925).
- 4(4) E. U. Condon, Nuclear Motions Associated with Electron Transitions in Diatomic Molecules, Phys. Rev. 54 , 858 (1928).
- 5(5) E. V. Doktorov, I. A. Malkin, and V. I. Manko, Dynamical symmetry of vibronic transitions in polyatomic-molecules and Franck-Condon principle J. Mol. Spectrosc. 64 , 302 (1977).
- 6(6) H.-C. Jankowiak, J. L. Stuber, and R. Berger, Vibronic transitions in large molecular systems: Rigorous prescreening conditions for Franck-Condon factors, J. Chem. Phys. 127 , 234101 (2007).
- 7(7) S. Aaronson, and A. Arkhipov, The Computational Complexity of Linear Optics, In Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing (ACM, New York, 2011), 333 (2011).
- 8(8) J. B. Spring, et al. Boson Sampling on a Photonic Chip Science 339 798 (2013).
