| Symbol | Expression | Description |
|---|---|---|
| | | Identity (100% decoupling between waveguides) |
| | | Perfect transfer between waveguide 1 and waveguide 3 |
| | | 50-50 power split between waveguide 1 and waveguide 3 (Hadamard gate) |
| | | Perfect transfer between waveguide 1 and waveguide 2 |
| | | Phase shift of between waveguide 1 and waveguide 3 |
| | | Rotation about X-axis by angle between waveguide 1 and waveguide 3 |
| | | Rotation about Z-axis by angle between waveguide 1 and waveguide 3 |
Modeling and Control of a Reconfigurable Photonic Circuit using Deep Learning
Akram Youssry
University of Technology Sydney, Centre for Quantum Software and Information, Ultimo NSW 2007, Australia
Department of Electronics and Communication Engineering, Faculty of Engineering, Ain Shams University, Cairo, Egypt
Robert J. Chapman
Quantum Photonics Laboratory and Centre for Quantum Computation and Communication Technology, School of Engineering, RMIT University, Melbourne, Victoria 3000, Australia
Institut für Experimentalphysik, Universität Innsbruck, Technikerstraße 25, 6020 Innsbruck, Austria
Alberto Peruzzo
Quantum Photonics Laboratory and Centre for Quantum Computation and Communication Technology, School of Engineering, RMIT University, Melbourne, Victoria 3000, Australia
Christopher Ferrie
University of Technology Sydney, Centre for Quantum Software and Information, Ultimo NSW 2007, Australia
Marco Tomamichel
University of Technology Sydney, Centre for Quantum Software and Information, Ultimo NSW 2007, Australia
(March 17, 2024)
Abstract
The complexity of experimental quantum information processing devices is increasing rapidly, requiring new approaches to control them. In this paper, we address the problems of practically modeling and controlling an integrated optical waveguide array chip, a technology expected to have many applications in telecommunications and optical quantum information processing. This photonic circuit can be electrically reconfigured, but only the output optical signal can be monitored. As a result, conventional control methods cannot be naively applied. Characterizing such a chip is challenging for three reasons. First, there are uncertainties associated with the Hamiltonian model describing the chip. Second, we expect distortions of the control voltages caused by the chip’s electrical response, which cannot be directly observed. And third, there are imperfections in the measurements caused by losses from coupling the chip externally to optical fibers. We have developed a deep neural network approach to solve these problems. The architecture is designed specifically to overcome the aforementioned challenges, using a Gated Recurrent Unit (GRU)-based network as the central component. The Hamiltonian is estimated as a blackbox, while the rules of quantum mechanics, such as state evolution, are embedded in the structure as a whitebox. The resulting overall graybox model of the chip shows good performance both quantitatively, in terms of the mean square error, and qualitatively, in terms of the shape of the predicted waveforms. We use this neural network to solve a classical and a quantum control problem. In the classical application, we find a control sequence that approximately realizes a time-dependent output power distribution. In the quantum application, we obtain the control voltages that realize a target set of quantum gates. The method we propose is generic and can be applied to other systems that can only be probed indirectly.
I Introduction
The complexity of experimental quantum information processing devices is increasing rapidly, requiring new approaches to control them. Noisy Intermediate-Scale Quantum (NISQ) devices are now emerging, posing many experimental challenges Pre (18); FMM+ (18); AAB+ (19). In the present work, we deal with the problem of modeling a device that processes input signals to generate output signals, and whose operation can be manipulated using control signals. There are three possible approaches to modeling such a device, presented as follows.
The first approach is direct physical modeling. We look for a mathematical description of the output signals in terms of the input and control signals. The equations will involve some unknown parameters, which should be chosen to match the performance of an actual realization of the device. We therefore perform measurements on the device and use parameter estimation methods to find the unknown parameters of the model. We call this a whitebox approach, and it is the first approach one would naturally try. The problem, however, is that if there are uncertainties in the relations between some variables, or if assumptions made to derive some formulas do not hold for an actual device, then the resulting model might not be accurate enough to fit and predict actual measurements. Imperfections in the measurement process will also decrease the accuracy of the obtained model. Additionally, the relations between some variables may be completely unknown, in which case the problem involves estimating not just parameters but also functional forms (maps between variables). Moreover, there are situations where estimating the unknown parameters requires measurements that are not experimentally possible. For instance, to estimate the parameters of the transfer function of an electrical circuit, we need to measure voltages at some nodes of the circuit; if we cannot physically access those nodes, the problem becomes more difficult. Finally, the complexity of the problem increases if the physical models involve non-linear relations. Thus, the whitebox approach may face many challenges in practical situations.
The second way to solve the problem is the blackbox approach. Rather than obtaining a set of physical equations describing the device, we construct a generic function that approximates the relationship between the output and the input and control signals. This is usually a highly non-linear function with a large number of parameters that can be estimated from the measurements. If the function is complex enough, it can model and predict any unknown relations between variables. Machine learning structures, such as artificial neural networks, are very suitable for this type of modeling. This approach has the advantage of being able to predict the output signals given the input and control signals. However, there are a few drawbacks. First, the resulting model provides the least information about the physics of how the device works, so it would be difficult to use the model for re-engineering the device if required. Second, the resulting accuracy may not be as high as expected. This is because the structure has no prior information about the map between inputs and outputs, and so it might need to “discover” some complicated laws of physics (such as the evolution of quantum systems) besides other unknown relations. This makes the training process harder: a larger dataset and more iterations would be needed to reach a good level of accuracy, which might turn out to be impractical.
The third approach, which we refer to as a graybox model, combines the other two. In this case, we use direct physical modeling for the parts of the description that we are completely certain about (the whitebox part), and a blackbox for the parts we are uncertain about. The model should be built so that the measurements required for the learning process are physically available, there is enough physical modeling through the whiteboxes to allow useful information about the behavior of the device to be extracted, and any measurement imperfections are accounted for. Machine learning structures are also suitable for this type of modeling. Standard machine learning layers are used for the blackboxes. However, we also need to define non-standard, application-specific layers for the whiteboxes. The overall structure should be consistent so that standard learning algorithms can work. In this paper, we explore the use of a hybrid deep learning architecture to solve problems related to the experimental modeling and control of quantum systems. Although the focus is on a photonic device that will be introduced shortly, the graybox approach is very general and applies to many situations where a system cannot be probed arbitrarily, as discussed.
An example of this approach arises when a quantum device is described by a quantum system. The input signal is modeled by the initial quantum state. The output signal is modeled by a measurement performed on the system after it evolves according to a given Hamiltonian. The control signals are then external forces applied to the system, modeled by some terms in the Hamiltonian. Using the laws of quantum mechanics, we can write down the relation between the input, output, and control; this corresponds to the whitebox part of the model. Some of the terms in the Hamiltonian might be unknown, so we use a blackbox to evaluate them. The resulting overall model is a graybox model. This approach is useful because it still gives insight into the physics of the device, and one can evaluate physically significant quantities. Additionally, the terms that reduced the accuracy of the model because of inaccurate physical modeling are now replaced by blackboxes, resulting in a more accurate overall model.
In this paper, we focus on a particular system, currently being developed by some of the authors: an array of nearest-neighbor coupled waveguides with a reconfigurable Hamiltonian. Characterizing such a chip is a significant challenge, as will be discussed later. The array implements a continuous-time quantum walk of photons propagating along it ADZ (93); PLM+ (10). In all previous work, static quantum walks were studied with fixed coupling parameters. Here, we demonstrate a reconfigurable waveguide array by exploiting the electro-optic control of Lithium Niobate. The waveguides are fabricated by reverse proton exchange, and we apply local electric fields to change the properties of the coupled array. Figure 1 shows the schematic of the chip. We inject laser light into one input waveguide of the array and measure the output optical power distribution across all the waveguides. The electrodes can be controlled to alter the output distribution.
Numerical simulations of such a device show a host of potential applications. The chip can operate as a classical device with possible applications in telecommunications, such as implementing a Mach-Zehnder interferometer or an electro-optic modulator. Being able to characterize and control such a device is important and has a strong economic impact, but it is also very challenging, as will be discussed later. Additionally, the chip can work as a quantum device. This includes operating as a quantum router, where single photons can be directed to propagate to, and be detected at, one of the output ports by dynamically changing the control voltages. It can also be used to generate complicated quantum states (such as the W state) and to realize different quantum gates on single or multiple photons.
We focus on two different scenarios for using this device. In the first, we experimentally measure only the powers at the output of the chip. This is equivalent to a single-photon experiment in which we only detect photons at the outputs. In this situation, a classical model alone would suffice. However, we use a quantum model for two reasons. First, it shows the applicability of the graybox approach when the whitebox parts are quantum. Second, although we only measure powers at the output, we allow arbitrary input states, including entangled states, which cannot be described by a purely classical model. Moreover, if there are multiple input photons at different waveguides, then a quantum-mechanical description of the chip is required to describe the correlations of the photons at the output PLM+ (10). In the second scenario, we can also measure phases through Mach-Zehnder-type interferometry. In this case, we show the possibility of implementing single-qubit quantum gates with high fidelity, where the qubits live in the subspace formed by the two far-end waveguides. The proposed framework allows finding the set of control voltages required to obtain a target sequence of quantum gates, despite the different challenges faced during the characterization process.
Machine learning has recently been a very active area of research, with a focus both on the algorithms and on the wide range of applications touching every field of science and beyond. Deep learning in particular has gained attention as it has become increasingly feasible, owing to today’s enormous computational power and the availability of big datasets for training. The survey Den (14) covers the common architectures used in deep learning and the range of possible applications.
The physics community is also currently exploring the use of machine learning to solve practical problems faced in designing, controlling, and automating experiments. Examples of recent work include the automated design of quantum optical setups KMF+ (16) using reinforcement learning MNK+ (18), and using deep learning and genetic algorithms ONK (18). Machine learning was also used in ZZW+ (19) to configure an optical signal processor, which itself can work as an artificial neural network with linear activation functions. Deep learning was also used in Ref. MLBZ (18) to discover and characterize topological phases of matter and phase transitions. Techniques from both deep learning and reinforcement learning have been applied in quantum control NBSN (19); BDS+ (18); OMBS (19). These works differ from ours in treating the entire learned model, including the quantum dynamics, as a blackbox, without detailed modeling of an experimental realization.
Methods of machine learning have also been used in other areas of quantum information. For example, the work presented in ACH+ (18); YFT (19) develops online quantum state estimation algorithms inspired by the matrix exponentiated gradient method, a technique used in classical machine learning. Other applications include the use of neural networks in quantum cryptography Nie (19) and in quantum error correction BOTB (18). A problem related to the one we present here is Hamiltonian learning GFWC (12); WGFC14b; WGFC14a; WPS+ (17). This is a Bayesian framework that allows updating the priors on Hamiltonian estimates given observed measurements. Although very useful on its own, this approach is not suitable for the problem under consideration, because we are interested in estimating the map between control voltages and the Hamiltonian through indirect measurements, whereas the Bayesian approach is suitable when the Hamiltonian is fixed (i.e., the control voltages are fixed). Another Bayesian approach is presented in LMC+ (19), where the focus is on real-time prediction of the optimal measurements to perform on a quantum dot, using the partial information available so far. This allows efficient characterization of the device.
The remainder of the paper is structured as follows. It starts with an overview of the quantum-mechanical description of the chip in Section II.1, followed by the experimental constraints and challenges in Section II.2. Next, in Section III, we present the proposed deep learning architecture in detail. After that, we present the numerical results of the simulations and discuss their significance in Section IV. Finally, we conclude and discuss possible future extensions of this work in Section V. Appendix A contains figures related to Section IV, placed there to maintain the readability and continuity of the text.
II Problem Setup
This section first gives a quantum-mechanical description of the photonic circuit we aim to model and control, and then discusses the challenges we face in characterizing it experimentally.
II.1 Chip model
A chip with $N$ waveguides can be described quantum mechanically in an $N$-dimensional Hilbert space, with the computational basis encoding the presence of a photon in each waveguide. For example, the state $|1\rangle$ encodes a photon in the first waveguide, the state $|2\rangle$ encodes a photon in the second waveguide, and so on. The evolution of the system represents the behavior of the chip as light propagates along the waveguides. The initial state of the system therefore represents the mode distribution at the inputs of the waveguides, while the final state represents the distribution at their outputs. For example, if the system evolves from the state $|1\rangle$ to the state $|2\rangle$, this means that we injected a photon into the first waveguide (at one end of the chip) and that the photon was perfectly transferred to the second waveguide by the time it reached the output. This evolution can be described by the unitary
$$U = e^{-iHL}, \tag{1}$$
where $L$ is the length of the chip and $H$ is the Hamiltonian of the chip. In general, we can write the Hamiltonian in the form
$$H = H_0 + H_I(\mathbf{V}), \tag{2}$$
where $H_0$ is the zero-voltage Hamiltonian and $H_I(\mathbf{V})$ is the interaction Hamiltonian, which is a function of the vector $\mathbf{V}$ of voltages applied to the electrodes. Note that the control voltages are time-dependent; however, they change on a time scale much slower than the time a photon takes to travel across the chip. That is, each photon sees only one time-independent Hamiltonian from the moment it enters the chip until it reaches the output, although the next photon to arrive may experience a different Hamiltonian. This assumption is plausible since it is impossible to change the voltage faster than the photon’s flight time through the chip. It is what allows us to write the evolution as the matrix exponential of the Hamiltonian, as in Equation 1, without the time-ordering operator.
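As a quick numerical illustration of Equation 1, the sketch below computes $U = e^{-iHL}$ through the eigendecomposition of a real symmetric (hence Hermitian) Hamiltonian. It then applies $U$ to a hypothetical three-waveguide array with uniform nearest-neighbor coupling $\kappa$. The tridiagonal form, the value of $\kappa$, and the function names are illustrative assumptions, not parameters of the fabricated chip, and the code indexes waveguides from 0. For this uniform array, light injected into waveguide 1 is transferred completely to waveguide 3 when $\sqrt{2}\,\kappa L = \pi$, and the code reproduces this.

```python
import numpy as np


def evolution_operator(H: np.ndarray, length: float) -> np.ndarray:
    """Return U = exp(-i H L) for a Hermitian H via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(H)
    phases = np.exp(-1j * eigvals * length)
    # U = V diag(exp(-i*lambda*L)) V^dagger (column j of V scaled by phases[j])
    return (eigvecs * phases) @ eigvecs.conj().T


def output_distribution(H: np.ndarray, length: float, input_waveguide: int) -> np.ndarray:
    """Probability of detecting the photon at each output (0-indexed input waveguide)."""
    U = evolution_operator(H, length)
    psi0 = np.zeros(H.shape[0], dtype=complex)
    psi0[input_waveguide] = 1.0
    psi = U @ psi0
    return np.abs(psi) ** 2


# Hypothetical 3-waveguide array with uniform nearest-neighbor coupling kappa.
kappa = 1.0
H = kappa * np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0]])
L_transfer = np.pi / (np.sqrt(2) * kappa)  # perfect-transfer length for this array

probs = output_distribution(H, L_transfer, input_waveguide=0)
print(np.round(probs, 6))  # approximately [0, 0, 1]
```

The eigendecomposition route is exact for Hermitian matrices and avoids a general-purpose matrix exponential. A routine such as `scipy.linalg.expm` would give the same $U$.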
In the basic experimental setup, we can only measure the output power distribution. For example, for an $N$-waveguide chip, suppose the input state is $|\psi_0\rangle$ and the output state after evolution is $|\psi\rangle = U|\psi_0\rangle = \sum_{k=1}^{N} c_k|k\rangle$. The output distribution we measure is then $(|c_1|^2, \dots, |c_N|^2)$. However, to characterize a fully quantum model, we need to measure phases at the output. One convenient way to measure relative phase shifts between two optical paths experimentally is Mach-Zehnder interferometry, as shown in Figure 2. The basic idea is to construct a quantum circuit whose output probability amplitude depends on the phase shift to be measured. Starting from a given initial state, a standard calculation shows that the final state after the beamsplitter at the bottom-right of the diagram is
[TABLE]
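If we now measure the power at the detector, we obtain a quantity that depends on both the amplitude and the phase of the corresponding output coefficient. By performing two such measurements for two different values of the phase shift, we can calculate both the amplitude and the phase of this output exactly. In particular,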
[TABLE]
where the phase of the corresponding output coefficient appears explicitly. These two equations can be solved simultaneously to find its amplitude and phase. The procedure can then be repeated by placing the mirror at the top-right of the diagram at each of the other outputs of the chip, yielding the amplitude and phase of the remaining components of the state. The output of an $N$-waveguide chip is an $N$-dimensional pure state, so it is completely defined by $2N$ real degrees of freedom, corresponding to the real and imaginary parts of each coefficient. (In fact, only $2N-2$ are needed because of the normalization constraint and the physically insignificant global phase.) The same procedure can be carried out to characterize the output state when other inputs are activated. Finally, this setup is not the only way to measure phase. There may be a more efficient way to measure the phases at the output that does not require moving optical components, but this is beyond the scope of this paper.
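The phase-retrieval idea can be sketched numerically under a simplified model, which is an assumption here and not necessarily the configuration of Figure 2. In this model, the output coefficient $c$ interferes with a reference field of known real amplitude $r$ on a balanced beamsplitter, with a controllable phase $\phi$ on the reference arm. The detected power is then $P(\phi) = |c + r e^{i\phi}|^2/2$. Measuring at $\phi = 0$ and $\phi = \pi/2$, and also using the directly measured power $|c|^2$, makes the equations linear in the real and imaginary parts of $c$: $2P(0) = |c|^2 + r^2 + 2r\,\mathrm{Re}\,c$ and $2P(\pi/2) = |c|^2 + r^2 + 2r\,\mathrm{Im}\,c$. The function names and numerical values are hypothetical.

```python
import numpy as np


def detected_power(c: complex, r: float, phi: float) -> float:
    """Hypothetical balanced-interferometer model: P(phi) = |c + r e^{i phi}|^2 / 2."""
    return abs(c + r * np.exp(1j * phi)) ** 2 / 2


def retrieve_amplitude(p0: float, p90: float, direct_power: float, r: float) -> complex:
    """Recover c from P(0), P(pi/2), the direct power |c|^2, and the reference amplitude r."""
    background = direct_power + r ** 2
    real_part = (2 * p0 - background) / (2 * r)   # from 2P(0) = |c|^2 + r^2 + 2r Re c
    imag_part = (2 * p90 - background) / (2 * r)  # from 2P(pi/2) = |c|^2 + r^2 + 2r Im c
    return complex(real_part, imag_part)


c_true = 0.6 * np.exp(1j * 2.1)  # unknown output coefficient (illustrative value)
r = 1.0                          # known reference amplitude (assumed)
c_est = retrieve_amplitude(detected_power(c_true, r, 0.0),
                           detected_power(c_true, r, np.pi / 2),
                           abs(c_true) ** 2, r)
print(abs(c_est), np.angle(c_est))  # magnitude ~0.6, phase ~2.1 rad
```

Using only $P(0)$ and $P(\pi/2)$, as the text describes, is also possible, but the resulting equations in $|c|$ and the phase are non-linear. Adding $|c|^2$, which the basic setup already measures, is simply a convenient way to linearize them in this sketch.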
II.2 Experimental challenges
There are many experimental challenges faced when characterizing a fabricated chip, as well as designing the control voltages to implement some desired behavior. Any model for the device should account for these constraints. These challenges are listed as follows.
1. Drift in the measured output optical power.
This problem is caused by charges becoming trapped at the interface between the Silicon Dioxide and the Lithium Niobate YM (81); GTBY (85); NI (95). These charges have very low mobility and therefore take a long time to accumulate, and a long time to diffuse once the voltage is removed. These trapped charges are the central reason we have difficulty controlling and characterizing this device. Because of the long diffusion time, the voltage never ‘resets’ to zero, which makes it extremely difficult to infer what electric field is actually being applied to the waveguides. In any case, the chip has some equivalent electrical circuit model. However, this circuit is difficult to model and characterize experimentally, because we cannot physically measure the voltages the chip actually senses when we apply control voltages externally. The only available measurements are the output power distributions across the waveguides, which depend non-linearly on the control voltages. This turns the task into a non-linear control and estimation problem, which is difficult to solve with classical methods. These effects cannot be neglected either, because the distortions of the control voltages are reflected in the measured power distribution. They also produce a memory effect: when we apply a control pulse, the output power is affected by that pulse as well as by the pulses applied before it. This means that if at some point we set all the control voltages to ground, we still observe the power distribution varying in time. The classic way of overcoming this problem is to etch the buffer layer between the electrodes during fabrication YM (81). However, the dimensions of our particular chip are very small, which makes this process technologically difficult, so the problem has to be addressed differently. The model should therefore account for these unknown distortions, and it should be trainable using only the available power measurements.
2. Uncertainties in the structure of the Hamiltonian.
Usually, the Hamiltonian of these chips is assumed to have a tridiagonal form, reflecting the fact that only adjacent waveguides are coupled BLMS (09); PLM+ (10). However, there may be non-negligible higher-order couplings between the waveguides, leading to additional off-diagonal terms. One could also assume that the Hamiltonian depends linearly on the control voltages, but this assumption might not hold because of higher-order terms. Thus, the model should not assume any particular form of the Hamiltonian except that it is Hermitian, as required by quantum mechanics.
3. Power losses at the output.
Losses in the measured power occur because of the coupling of the chip to the external optical fibers connected to the photodetector. They cause inaccuracies in the measurements that affect any parameter estimation, so they also have to be characterized to allow the detected power signals to be corrected. We model the losses by
$$\tilde{P}_k = \frac{\eta_k P_k}{\sum_{j} \eta_j P_j},$$
where $\tilde{P}_k$ is the normalized measured power at waveguide $k$, $P_k$ is the actual power at the output of the chip for waveguide $k$, and $\eta_k$ is the coupling coefficient of waveguide $k$. The normalization makes the measurements constitute a probability distribution. The model should account for these losses.
4. Limitations on the control voltages.
In general, obtaining a given target output from the device may require arbitrarily large control voltages. However, if the potential difference across any pair of electrodes exceeds some maximum value, the device breaks down. It might be the case that, within this limitation, the target cannot be obtained with infinite precision. This controllability issue is a separate problem that we leave for future work. Any control algorithm should therefore try to maximize the accuracy of the target output without exceeding the allowed range of the control voltages.
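The memory effect described in the first challenge can be caricatured with a hypothetical first-order low-pass model of the chip's electrical response. In this model, the effective voltage relaxes towards the applied one with a slow time constant $\tau$. We do not claim that the real trapped-charge dynamics follow this model, and the time step, $\tau$, and pulse amplitude below are illustrative. The point is only that the voltage the chip senses lags behind the applied pulse and persists after the electrode is grounded, and that none of this is directly observable.

```python
import numpy as np


def effective_voltage(applied: np.ndarray, dt: float, tau: float) -> np.ndarray:
    """Hypothetical first-order low-pass model of the chip's electrical response.

    Each step relaxes the effective voltage towards the applied one:
    v_eff[t] = a * v_eff[t-1] + (1 - a) * v_applied[t], with a = exp(-dt / tau).
    """
    a = np.exp(-dt / tau)
    v_eff = np.zeros_like(applied, dtype=float)
    state = 0.0
    for t, v in enumerate(applied):
        state = a * state + (1.0 - a) * v
        v_eff[t] = state
    return v_eff


dt, tau = 1.0, 20.0        # illustrative time step and slow time constant
applied = np.zeros(100)
applied[10:40] = 5.0       # a 5 V pulse, after which the electrode is grounded
v_eff = effective_voltage(applied, dt, tau)
print(v_eff[39], v_eff[60])  # below 5 V at the pulse end; still nonzero after grounding
```

The graybox model does not assume any such circuit. Instead, the GRU layer described in Section III.1 is expected to learn this kind of distortion from power measurements alone.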
As a result of all these challenges, estimating and controlling the Hamiltonian directly from measured data with the whitebox approach is very difficult. A complete blackbox might perform well, but, as discussed, it would not give physical insight into the device. We therefore adopt the graybox approach to model the chip under the aforementioned constraints. The blackbox part represents the map between the control voltages and the Hamiltonian. This removes the need for any assumptions on the form of the Hamiltonian and also accounts for the pulse distortions. The whitebox part represents the remaining relations, which are known with certainty from quantum mechanics. The next section gives more details about how to construct such a model using deep learning.
III Methods
In the previous section, we described the challenges of experimentally characterizing the chip with conventional methods of model and parameter estimation. To address all these challenges, we propose a completely data-driven approach rather than a parametric one. We use a graybox model in which the Hamiltonian is treated as a blackbox, while the quantum evolution and quantum measurement are treated as whiteboxes. This is because all the uncertainties are in the Hamiltonian, while all the laws of quantum mechanics are known. We design a deep learning structure to implement this idea. The problem is divided into two stages. In the first stage, a supervised deep learning algorithm uses a set of known control voltages and the corresponding power distributions to find a complete graybox model of the chip. In the second stage, another deep learning structure is created to find the control voltages that result in some desired behavior of the chip, using the model estimated in the first stage.
This section starts with a detailed description of the architecture used to model the chip. Next, the training and testing procedures are presented. After that, the detailed description of the control voltages predictor for the chip is presented. Finally, the section ends with extending the proposed structure to account for a fully-quantum setting where phases can be measured at the output.
III.1 Chip model architecture
The deep learning architecture is shown in Figure 3. The first layer in the model is a Gated Recurrent Unit (GRU) CVMG+ (14). This is a variant of the Long Short-Term Memory (LSTM) structure often used in sequence prediction and classification HS (97). A GRU is more efficient than an LSTM because it has fewer parameters to be learned during training. However, it is not clear which is more accurate in general, and this remains an open question in the machine learning community CGCB (14). The number of inputs is equal to the number of electrodes. In our implementation, the number of hidden units of the GRU is chosen to be 60. In general, more hidden units allow more complex waveforms to be modeled, but at the expense of more parameters to learn and thus more computational resources. The objective of this layer is to learn the interaction Hamiltonian, i.e., how the Hamiltonian depends on the external voltages. This should also include the parasitic effects in the chip that distort the applied voltage waveforms. A real-valued symmetric Hamiltonian of size $N \times N$, where $N$ is the number of waveguides, has $N(N+1)/2$ free parameters. However, the output of the GRU consists of the outputs of its hidden units. So, to extract the required number of outputs, we add a neural network (NN) formed of a single layer that is fully connected to all of the outputs of the GRU. The number of neurons is exactly $N(N+1)/2$, as each neuron generates one output. Linear activation is used for all neurons, so that the outputs can take any value instead of being restricted to some range, as they would be with other activations such as the sigmoid. Notice that the GRU is a sequential layer, so its output has an extra time dimension. The NN layer, however, is static and acts identically on each time slice of the GRU output; that is, the same weights are applied to the GRU output at every time instant.
These two layers together act as a device to learn the free parameters of the Hamiltonian as a function of the input voltages.
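The two layers just described can be sketched as a minimal NumPy forward pass. The sketch consists of a GRU with the standard update, reset, and candidate gates, followed by a linear layer that applies the same weights at every time step and outputs $N(N+1)/2$ Hamiltonian parameters. The random initialization stands in for trained weights, and the numbers of waveguides and electrodes are hypothetical; only the 60 hidden units follow the text. It illustrates the structure only and is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUWithDense:
    """Minimal forward pass: GRU followed by a time-distributed linear layer.

    Randomly initialized weights stand in for trained ones (illustration only).
    """

    def __init__(self, n_inputs: int, n_hidden: int, n_outputs: int):
        s = 1.0 / np.sqrt(n_hidden)
        # Stacked weights for the update (0), reset (1) and candidate (2) gates.
        self.W = rng.uniform(-s, s, (3, n_hidden, n_inputs))
        self.U = rng.uniform(-s, s, (3, n_hidden, n_hidden))
        self.b = np.zeros((3, n_hidden))
        # Linear output layer shared across all time steps (no activation).
        self.W_out = rng.uniform(-s, s, (n_outputs, n_hidden))
        self.b_out = np.zeros(n_outputs)
        self.n_hidden = n_hidden

    def __call__(self, voltages: np.ndarray) -> np.ndarray:
        """voltages: (T, n_inputs) -> Hamiltonian parameters: (T, n_outputs)."""
        h = np.zeros(self.n_hidden)
        outputs = []
        for x in voltages:
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])        # update gate
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])        # reset gate
            n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])  # candidate state
            h = (1.0 - z) * n + z * h
            outputs.append(self.W_out @ h + self.b_out)  # same weights at every step
        return np.array(outputs)


N, n_electrodes, T = 4, 5, 50  # hypothetical sizes
model = GRUWithDense(n_electrodes, 60, N * (N + 1) // 2)
params = model(rng.normal(size=(T, n_electrodes)))
print(params.shape)  # (50, 10)
```

In a framework such as Keras, the same structure would typically be written as `GRU(60, return_sequences=True)` followed by `TimeDistributed(Dense(N * (N + 1) // 2, activation='linear'))`. Because the output at each time step depends only on past and present voltages, the recurrence can represent the memory effect described in Section II.2.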
The third layer in the structure is a custom-defined layer with two functions. The first is to reconstruct a symmetric matrix from the output of the previous layer. This is done by reshaping the outputs into an upper triangular matrix and then summing it with its transpose. The second is to add the drifting Hamiltonian, that is, the zero-voltage Hamiltonian that models the inherent coupling between the waveguides. The parameters of this drifting Hamiltonian are learned during training, as will be illustrated later. The final output of this layer is therefore the full Hamiltonian of the system.
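A sketch of the Hamiltonian construction layer follows. It assumes that the previous layer outputs a (time, $N(N+1)/2$) array and that the drifting Hamiltonian is given as an $N \times N$ matrix. The parameters fill the upper triangle (diagonal included), the matrix is added to its transpose, and the drift term is added. Summing with the transpose doubles the diagonal entries, which is harmless because those entries are learned anyway. The function name and example values are hypothetical.

```python
import numpy as np


def build_hamiltonian(params: np.ndarray, H0: np.ndarray) -> np.ndarray:
    """Map N(N+1)/2 parameters per time step to a full symmetric Hamiltonian.

    params: (T, N(N+1)/2) output of the previous layer.
    H0: (N, N) drifting (zero-voltage) Hamiltonian.
    Returns (T, N, N): A + A^T + H0, where A is upper triangular.
    """
    N = H0.shape[0]
    rows, cols = np.triu_indices(N)
    A = np.zeros((params.shape[0], N, N))
    A[:, rows, cols] = params  # fill the upper triangle, including the diagonal
    return A + np.transpose(A, (0, 2, 1)) + H0


H0 = np.array([[0.0, 1.0, 0.0],
               [1.0, 0.0, 1.0],
               [0.0, 1.0, 0.0]])                   # hypothetical drift term
params = np.arange(12, dtype=float).reshape(2, 6)  # T = 2 time steps, N = 3
H = build_hamiltonian(params, H0)
print(H.shape)  # (2, 3, 3)
```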
The next layer of the model is the quantum evolution layer. This custom-defined layer takes a Hamiltonian $H$ as input and has an initial quantum state $|\psi_0\rangle$ as a defining parameter. Its output is the measurement probabilities of the evolved state, which correspond to the waveguide power distribution. The layer first calculates the evolution matrix $U = e^{-iHL}$. Next, it calculates the evolved state $|\psi\rangle = U|\psi_0\rangle$. Finally, it calculates the probabilities $|\langle k|\psi\rangle|^2$ of finding the photon in each waveguide $k$.
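The quantum evolution layer can be sketched as follows for a batch of time slices. The sketch relies on the quasi-static assumption of Section II.1, under which each time slice has its own time-independent Hamiltonian. The matrix exponential is again computed through a batched eigendecomposition, which is valid because the constructed Hamiltonians are real symmetric. The function name and the random test Hamiltonians are assumptions for illustration.

```python
import numpy as np


def evolve_probabilities(H_seq: np.ndarray, psi0: np.ndarray, length: float) -> np.ndarray:
    """Quantum evolution layer: (T, N, N) Hamiltonians -> (T, N) output powers.

    Each time slice is evolved independently (quasi-static assumption):
    U_t = exp(-i H_t L), psi_t = U_t psi0, output |psi_t|^2.
    """
    eigvals, eigvecs = np.linalg.eigh(H_seq)   # batched over the time axis
    phases = np.exp(-1j * eigvals * length)    # (T, N)
    U = (eigvecs * phases[:, None, :]) @ np.conj(np.transpose(eigvecs, (0, 2, 1)))
    psi = U @ psi0                             # (T, N)
    return np.abs(psi) ** 2


rng = np.random.default_rng(1)
A = rng.normal(size=(5, 4, 4))
H_seq = A + np.transpose(A, (0, 2, 1))         # 5 random real symmetric Hamiltonians
psi0 = np.array([1, 0, 0, 0], dtype=complex)   # photon injected into waveguide 1
P = evolve_probabilities(H_seq, psi0, length=1.0)
print(P.shape, P.sum(axis=1))  # (5, 4) and rows summing to 1
```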
A problem arises if we train the model with the structure described so far. Since only one initial state is used in the quantum layer, the learned Hamiltonian will be valid only for the evolution of this particular state. If we use the same Hamiltonian to evolve other initial states, we might not obtain the correct evolution, so the algorithm would need to learn a different Hamiltonian for each initial state. This is a major problem: quantum mechanics is a linear theory, so the Hamiltonian should not depend on the quantum state being evolved. We therefore have to constrain the Hamiltonian so that it works for all states. We propose to solve this problem by using several copies of the quantum layer, each parameterized by a different initial state, and connecting the inputs of all these layers to the same output of the preceding Hamiltonian layer. During training, the model is then forced to generate a Hamiltonian that correctly evolves each of the initial states. Since a unitary is completely characterized by its outputs when each of the computational basis states is used as input, we only need $N$ ‘parallel’ quantum structures, each generating $N$ outputs, where $N$ is the number of waveguides. The total number of outputs of this whole layer is therefore $N^2$.
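For a single Hamiltonian, evaluating all $N$ parallel copies of the quantum layer amounts to evolving every computational basis state, as in the following sketch (names hypothetical). Row $j$ of the result is the predicted power distribution for light entering waveguide $j$, so the layer produces $N^2$ numbers. Because $U$ is unitary, both the rows and the columns of the resulting matrix sum to one, which is a useful sanity check on any implementation.

```python
import numpy as np


def all_input_distributions(H: np.ndarray, length: float) -> np.ndarray:
    """Evolve every computational basis state with one shared Hamiltonian.

    Returns an (N, N) array whose row j is the output power distribution when
    light enters waveguide j, i.e. N copies of the quantum layer with N outputs each.
    """
    eigvals, eigvecs = np.linalg.eigh(H)
    U = (eigvecs * np.exp(-1j * eigvals * length)) @ eigvecs.conj().T
    # Column j of U is U|j>, so transposing |U|^2 gives one distribution per row.
    return (np.abs(U) ** 2).T


rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
H = A + A.T                     # one shared real symmetric Hamiltonian, N = 4
D = all_input_distributions(H, length=0.8)
print(D.shape, D.size)          # (4, 4) 16, i.e. N^2 outputs
```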
The final layer in the model is also a custom-defined layer; it models losses during power measurement. Physically, these losses arise from the coupling between the chip and the optical fibres connected to the photodetectors. The layer implements the calculation $\tilde{P}_k = c_k P_k / \sum_j c_j P_j$, where $\tilde{P}_k$ is the measured power at waveguide $k$, $P_k$ is the actual power at the output of the chip for waveguide $k$, and $c_k$ is the coupling coefficient of waveguide $k$. The denominator ensures that the measured powers are normalized (i.e. form a distribution). The coupling coefficients are learned during training, as discussed later. One of these coupling layers is cascaded after each quantum block in the quantum evolution layer, but all copies are identical (i.e. share the same parameters). This reflects the fact that the losses do not depend on which waveguide was used as input, only on the experimental hardware.
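Below is a minimal sketch of the coupling-loss layer, assuming it computes $\tilde{P}_k = c_k P_k / \sum_j c_j P_j$ with one coefficient vector shared by every quantum block. The function name and the example numbers are hypothetical.

```python
import numpy as np


def coupling_loss(ideal_powers, coupling):
    """Sketch of the coupling-loss layer: P~_k = c_k P_k / sum_j c_j P_j.

    ideal_powers: array (..., N); the last axis holds one output distribution,
                  so several quantum blocks can be processed at once.
    coupling:     array (N,) of coupling coefficients shared by all blocks.
    """
    weighted = np.asarray(ideal_powers, dtype=float) * np.asarray(coupling, dtype=float)
    return weighted / weighted.sum(axis=-1, keepdims=True)


ideal = np.array([[0.5, 0.5],
                  [0.2, 0.8]])          # two quantum blocks (two input waveguides)
measured = coupling_loss(ideal, [1.0, 0.5])
print(measured)
```

Broadcasting the same `coupling` vector over every row mirrors the parameter sharing between the copies of the layer.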
III.2 Training and Testing
Training the model, i.e. learning all of its unknown parameters from examples, proceeds in two stages. The first stage learns all the zero-voltage parameters, namely the drifting Hamiltonian and the coupling-loss coefficients. These parameters are static and do not depend on the input voltages. For this step, we detach the GRU and NN layers from the model. The input of the model is connected directly to the Hamiltonian construction layer and is fixed to all zeros. The output is the lossy power distribution. This is obtained experimentally by fixing the physical voltage on the chip to zero, using one of the waveguides as input, and measuring the power across each waveguide. The procedure is repeated for every input waveguide. Since the distribution in this case is static, we get a total of $N^2$ readings. With this training pair (zero voltage as input, the $N^2$ readings as output), the model is trained by backpropagation using RMSprop TH (12), and all the unknown parameters are learned. We use the mean square error (MSE) as both the loss function and the performance metric, because the task is to predict a waveform and MSE is one of the most commonly used metrics for quantifying the similarity between two waveforms. Since the output carries no phase information, the full quantum state cannot be reconstructed, so quantum measures such as fidelity cannot be evaluated.
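The zero-voltage training stage can be illustrated on a toy problem. In this sketch, a hypothetical two-waveguide chip with drift coupling $\kappa_0$ produces the $N^2 = 4$ static readings, and $\kappa_0$ is recovered by minimizing the MSE. To keep it one-dimensional, the coupling coefficients are assumed known, and a plain grid search is swapped in for RMSprop; all numbers are illustrative.

```python
import numpy as np


def zero_voltage_readings(kappa0, coupling):
    """Hypothetical 2-waveguide chip at zero voltage.

    Row k holds the normalized measured powers when light enters waveguide k,
    so the matrix contains the N*N = 4 static readings.
    """
    h = np.array([[0.0, kappa0], [kappa0, 0.0]])   # drift Hamiltonian
    w, v = np.linalg.eigh(h)
    u = v @ np.diag(np.exp(-1j * w)) @ v.conj().T   # U = exp(-iH)
    ideal = (np.abs(u) ** 2).T                      # row k = |column k of U|^2
    lossy = ideal * np.asarray(coupling)            # coupling losses
    return lossy / lossy.sum(axis=1, keepdims=True)


def mse(a, b):
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))


# "Measured" data generated from hypothetical ground-truth parameters.
true_kappa0, coupling = 0.7, np.array([1.0, 0.8])
target = zero_voltage_readings(true_kappa0, coupling)

# Stand-in for RMSprop: a 1-D grid search over the drift coupling kappa0.
grid = np.linspace(0.0, np.pi / 2, 1571)
losses = [mse(zero_voltage_readings(k, coupling), target) for k in grid]
best_kappa0 = grid[int(np.argmin(losses))]
print(round(best_kappa0, 3))
```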
The second stage of training captures the dynamic behavior of the chip (i.e. how the waveguide power distribution changes in time as a function of the time-varying input voltage). In this stage the full model is used, and the input is connected to the GRU layer. All parameters learned in the first stage are fixed and do not change. Backpropagation is used to train the remaining unknown parameters on pairs of voltage waveforms (input) and the corresponding measured power-distribution waveforms (output), with MSE as the loss function. After this stage, all the learned parameters are fixed and the model can be used in the testing phase.
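The recurrent part of the model can be sketched in NumPy as a GRU cell followed by a dense layer applied at every time step, mapping a voltage sequence to Hamiltonian parameters. This is an illustrative forward pass only, not the authors' Keras model: the weights are random placeholders for trained parameters, the hidden size of 16 and the 4-electrode, 3-waveguide geometry are hypothetical, and the gate equations follow one common GRU convention (frameworks such as Keras differ in details like where the reset gate is applied).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(n_in, n_hidden, n_out, seed=0):
    """Random (untrained) weights; stand-ins for parameters learned by backprop."""
    rng = np.random.default_rng(seed)
    shapes = {"Wz": (n_hidden, n_in), "Uz": (n_hidden, n_hidden), "bz": (n_hidden,),
              "Wr": (n_hidden, n_in), "Ur": (n_hidden, n_hidden), "br": (n_hidden,),
              "Wh": (n_hidden, n_in), "Uh": (n_hidden, n_hidden), "bh": (n_hidden,),
              "Wo": (n_out, n_hidden), "bo": (n_out,)}
    return {name: 0.1 * rng.standard_normal(s) for name, s in shapes.items()}


def gru_dense_forward(voltages, p):
    """Map a voltage sequence (T, n_electrodes) to Hamiltonian parameters (T, n_out)."""
    h = np.zeros(p["Uz"].shape[0])
    outputs = []
    for x in voltages:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])           # update gate
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])           # reset gate
        h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
        h = (1.0 - z) * h + z * h_tilde                            # new hidden state
        outputs.append(p["Wo"] @ h + p["bo"])                      # dense (NN) layer
    return np.array(outputs)


n_waveguides, n_electrodes, steps = 3, 4, 50
n_out = n_waveguides * (n_waveguides + 1) // 2    # upper-triangular entries
params = init_params(n_electrodes, 16, n_out)
v = np.zeros((steps, n_electrodes))
v[10:30, 1:-1] = 2.0                               # one synchronized square pulse
ham_params = gru_dense_forward(v, params)
zero_run = gru_dense_forward(np.zeros_like(v), params)
print(ham_params.shape)
```

Because the recurrence is causal, the outputs before the pulse starts match those of an all-zero input; the hidden state is what lets the model represent the delayed, distorted response of the chip after the pulse.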
In the testing phase, the model is given new inputs that were not in the training set, and the predicted output is compared with the actual output. A good model is one that generalizes well to new inputs. The end goal of this architecture is a graybox model of the chip capable of predicting the output distribution for any control voltage. In practice, this is a hard requirement, because machine learning models usually generalize only to inputs that share some similarity with their training examples. In our case, the voltage waveform shape should be the same in the training and testing datasets (e.g. fixing the pulse shape to be square, Gaussian, raised cosine, etc.). Once the shape is fixed, the waveform parameters (such as amplitude, phase shift, etc.) of each example can be arbitrary. If we want the model to predict the output for other waveform shapes, the training set has to include those shapes as well. In this paper, we restrict all voltage waveforms to arbitrary synchronized square pulses: for each example, the pulses on all electrodes start at the same time instant and have the same width, but can have different amplitudes. These parameters differ across the examples in the datasets.
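A sketch of how one example of synchronized square pulses might be generated is given below. It uses the -5 to +5 V amplitude range and the grounded first and last electrodes described in the implementation section; the number of electrodes, the number of samples, and the uniform sampling of start time and width are hypothetical choices.

```python
import numpy as np


def random_synchronized_pulses(n_electrodes, n_samples, rng, v_amp=5.0):
    """One example of synchronized square pulses, shape (n_samples, n_electrodes).

    All electrodes share the same start time and width; amplitudes differ.
    The first and last electrodes are kept at 0 V.
    """
    start = int(rng.integers(0, n_samples - 1))            # pulse start index
    width = int(rng.integers(1, n_samples - start + 1))    # pulse width in samples
    amplitudes = rng.uniform(-v_amp, v_amp, size=n_electrodes)
    amplitudes[0] = amplitudes[-1] = 0.0                   # grounded end electrodes
    pulses = np.zeros((n_samples, n_electrodes))
    pulses[start:start + width, :] = amplitudes            # same window for all
    return pulses


rng = np.random.default_rng(1)
dataset = [random_synchronized_pulses(4, 100, rng) for _ in range(10)]
print(dataset[0].shape)
```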
A major advantage of this architecture is that the output of each layer can be monitored during testing, and each output corresponds to a physically significant quantity. The output of the NN layer is a prediction of the interaction Hamiltonian as a function of the input voltages and time. The output of the “Hamiltonian Construction” layer is a prediction of the total Hamiltonian matrix. The “Quantum Evolution” layers predict the ideal power distribution for each initial state, while the output of the last layer is a prediction of the measured power distribution. This shows the relevance of this deep learning structure: had we used a single LSTM-based blackbox instead of the proposed graybox, we would only have been able to predict the measured power distribution, not the other quantities.
III.3 Controller Architecture
The second major task after characterizing the chip is finding the control voltages needed to obtain a desired power distribution, resulting from the evolution under a target Hamiltonian. The architecture of the proposed controller is shown in Figure 4. The first layer is a GRU layer followed by a fully-connected layer, similar to those used in the model architecture. However, the input is now the desired target Hamiltonian, and the output represents the vector of control voltages. Since at least one of the electrodes must be connected to ground, we force the first electrode to zero. We also force the last electrode to zero, which is an arbitrary choice. This leaves the voltages on the remaining electrodes to be predicted. For efficiency, we input only the upper triangular part of the Hamiltonian, flattened into a vector of length $N(N+1)/2$.
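Flattening a sequence of target Hamiltonians into controller inputs can be sketched as follows, assuming the upper triangle (diagonal included) is read in row-major order. The function name and the example Hamiltonian are hypothetical.

```python
import numpy as np


def controller_input(hamiltonians):
    """Flatten a target Hamiltonian sequence (T, N, N) to (T, N(N+1)/2)."""
    h = np.asarray(hamiltonians)
    rows, cols = np.triu_indices(h.shape[-1])   # upper triangle incl. diagonal
    return h[:, rows, cols]


# Hypothetical target: the same symmetric 3 x 3 Hamiltonian held for 4 time steps.
h0 = np.array([[1.0, 0.5, 0.0],
               [0.5, 2.0, 0.3],
               [0.0, 0.3, 3.0]])
x = controller_input(np.repeat(h0[None], 4, axis=0))
print(x.shape)
```

Since the Hamiltonian is symmetric, the dropped lower triangle carries no extra information, which is where the saving comes from.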
An important constraint is that the voltage across any two adjacent electrodes must not exceed $V_{\max}$ in absolute value. Therefore, all output neurons use a scaled hyperbolic tangent activation of the form $f(x) = \frac{V_{\max}}{2}\tanh(x)$. This ensures that the output at each electrode lies in $[-V_{\max}/2, V_{\max}/2]$, so the potential difference across any two adjacent electrodes is limited to $[-V_{\max}, V_{\max}]$.
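The effect of the scaled activation can be checked numerically. This sketch assumes $V_{\max} = 10$ V, a value chosen only to be consistent with the ±5 V pulse amplitudes used in the simulations, and pads the predicted middle electrodes with the two grounded end electrodes; the function name is hypothetical.

```python
import numpy as np

V_MAX = 10.0  # assumed limit on |V_i - V_{i+1}|; consistent with +/-5 V pulses


def electrode_voltages(raw_middle):
    """Scaled tanh on the predicted electrodes, then zero-pad both ends."""
    middle = (V_MAX / 2.0) * np.tanh(raw_middle)     # each in [-V_MAX/2, V_MAX/2]
    zeros = np.zeros(middle.shape[:-1] + (1,))
    return np.concatenate([zeros, middle, zeros], axis=-1)


rng = np.random.default_rng(0)
raw = 10.0 * rng.standard_normal((200, 3))   # deliberately large pre-activations
v = electrode_voltages(raw)
print(np.abs(np.diff(v, axis=-1)).max() <= V_MAX)
```

Bounding each electrode to half the range is a simple way to bound every adjacent difference without coupling the output neurons to one another.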
Next, we cascade a copy of the previously trained model without the coupling-loss layers. We drop those layers because the power loss is due to the measurement process, not to the operation of the chip. For instance, if two chips were cascaded with perfect coupling, we would want to predict the control voltages that make the first chip produce some desired state at its output, and measurement losses would play no role for the first chip. All the trained parameters of the model are fixed and do not change during the training of the controller. Connecting the pre-trained model forces the whole controller structure to generate the ideal target power distribution, so all the distortions that appear in the power distribution are handled automatically by the controller: to minimize the MSE, the algorithm must produce voltage waveforms that undo the distortion effects. In effect, the algorithm learns an inverse model of the equivalent circuit of the chip while simultaneously ensuring that the final quantum state is correct. In this sense, the structure performs both classical control (undoing the distortions) and quantum control (obtaining the target quantum state). The desired control voltages can then be estimated by probing the output of the controller's NN layer.
Note that the controller is not required to generalize to every possible target Hamiltonian/target distribution pair. Whenever we want to realize some sequence of operations on the chip, we retrain the controller and probe the output of its NN layer. In this sense, we are using backpropagation as a direct optimization procedure rather than as a learning procedure. Additionally, the controller input is a sequence representing the Hamiltonian at each time step, so we can obtain control voltages that change the behavior of the chip dynamically while it operates.
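The idea of using gradient descent as a direct optimizer through a frozen model can be shown on a toy problem. Here a hypothetical "pre-trained model" is a two-waveguide chip whose coupling depends linearly on one control voltage; only the controller output is updated, through the same scaled-tanh bound, until the model predicts full transfer. A central finite difference stands in for backpropagation, and all constants are illustrative.

```python
import numpy as np

# Hypothetical frozen "pre-trained model": two waveguides whose coupling
# depends linearly on a single control voltage.
KAPPA0, GAIN, V_HALF = 0.5, 0.3, 5.0


def frozen_model(voltage):
    kappa = KAPPA0 + GAIN * voltage
    h = np.array([[0.0, kappa], [kappa, 0.0]])
    w, v = np.linalg.eigh(h)
    u = v @ np.diag(np.exp(-1j * w)) @ v.conj().T   # U = exp(-iH)
    return np.abs(u[:, 0]) ** 2                      # powers, input in waveguide 1


def loss(raw, target):
    voltage = V_HALF * np.tanh(raw)                  # bounded controller output
    return float(np.mean((frozen_model(voltage) - target) ** 2))


def optimize_control(target, steps=2000, lr=0.5, eps=1e-6):
    """Gradient descent on the controller output only; the model stays fixed."""
    raw = 0.0
    for _ in range(steps):
        # Central finite difference stands in for backpropagation.
        grad = (loss(raw + eps, target) - loss(raw - eps, target)) / (2 * eps)
        raw -= lr * grad
    return V_HALF * np.tanh(raw), loss(raw, target)


voltage, final_loss = optimize_control(np.array([0.0, 1.0]))   # full transfer
print(round(voltage, 3), final_loss)
```

Full transfer requires $\kappa = \pi/2$, i.e. a voltage of about $(\pi/2 - 0.5)/0.3 \approx 3.57$ V, which lies inside the ±5 V range; a target needing a larger coupling would be unreachable, echoing the point below that not every Hamiltonian can be realized.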
Finally, not every Hamiltonian can be realized with the chip model: some Hamiltonians may require voltages beyond the maximum allowed range. Which quantum gates can actually be implemented on this chip under these constraints remains an open question, which is outside the scope of this paper.
III.4 Fully-quantum model
The architectures described so far are not fully quantum, in the sense that the Hamiltonian is assumed to be real and only powers can be measured at the output (corresponding to the squared magnitudes of the probability amplitudes). However, the proposed method can be extended to the fully quantum case if we perform Mach-Zehnder-type measurements, as discussed previously. The overall architecture is quite similar, with the following modifications:
- The neural layer after the GRU is set to produce $N^2$ outputs instead of $N(N+1)/2$, to account for the imaginary parts of the Hamiltonian matrix elements.
- The Hamiltonian layer reshapes the output of the neural layer into an $N \times N$ matrix, where the strictly lower triangular part represents the imaginary parts of the Hamiltonian and the upper triangular part represents the real parts. Multiplying the lower triangular part by $i$ and adding the whole matrix to its Hermitian conjugate yields an $N \times N$ Hermitian matrix. The zero-voltage Hamiltonian is handled similarly to allow for complex-valued entries.
- The quantum layer outputs the Mach-Zehnder interferometer power measurements of the final state instead of the output powers, which doubles the number of outputs compared with the power-only model. We do not need to explicitly calculate the amplitudes and phases from the interferometer measurements for training; the interferometer measurements are used directly. Training follows the same procedure as before.
- For simplicity, we removed the coupling layer, since the focus of this application is on exploring the possibility of learning a fully quantum system. In general, however, it can be included.
- We still use MSE as the loss function and performance metric, because the output is still a waveform (although it now represents interference measurements). However, since there is now complete information to reconstruct the state and the evolution matrix, other metrics such as fidelity can be used for performance evaluation.
- The controller architecture is the same; the only difference is that the input of the first layer is the real and imaginary parts of the target unitaries rather than the Hamiltonians. This performed better than using the Hamiltonians as input. A possible reason is that infinitely many Hamiltonians (whose eigenvalues differ by integer multiples of $2\pi$) give rise to the same unitary, so the GRU might have trouble finding one of these equivalent Hamiltonians, whereas a unitary input has no such redundancy. In the classical application this did not seem to cause problems, because there was more freedom: the optimization is over the power distribution only. The quantum application is more restrictive, since the optimization is over the phase information as well.
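The complex Hamiltonian construction in the fully-quantum setting can be sketched as follows. The function name is hypothetical, and the layout (upper triangle including the diagonal for real parts, strictly lower triangle for imaginary parts) follows the description in the list above.

```python
import numpy as np


def build_hermitian(params, drift):
    """Sketch of the fully-quantum Hamiltonian layer.

    params: N*N real outputs of the neural layer. After reshaping to N x N,
            the upper triangle (incl. diagonal) holds real parts and the
            strictly lower triangle holds imaginary parts.
    drift:  N x N Hermitian zero-voltage Hamiltonian.
    """
    params = np.asarray(params, dtype=float)
    n = int(round(np.sqrt(params.size)))
    m = params.reshape(n, n).astype(complex)
    lower = np.tril_indices(n, k=-1)
    m[lower] *= 1j                      # imaginary parts in the lower triangle
    return m + m.conj().T + np.asarray(drift, dtype=complex)


h = build_hermitian(np.arange(9.0), np.zeros((3, 3)))
print(h[0, 1])
```

For $j < k$ the resulting entry is $m_{jk} - i\,m_{kj}$, so the $N^2$ real outputs map one-to-one onto the $N^2$ real degrees of freedom of an $N \times N$ Hermitian matrix.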
IV Simulation results
This section discusses the implementation details of our method and the results of the numerical simulations. A discussion on the significance of the results is given afterwards.
IV.1 Implementation
To implement the proposed architecture, we used the TensorFlow Python package AAB+ (15) and its high-level API package Keras C+ (15). The Python implementation of our algorithm is publicly available at https://github.com/akramyoussry/GRUBI.
To train and test the model, we created a dataset of control voltages in the form of random pulses, together with the corresponding waveguide output power distributions for the different input waveguides. We generated a total of 4000 examples: 3500 for training and 500 for testing. The pulse amplitudes range from -5 to +5 V, and the time domain is limited to the interval with a sampling time of . In each example, the voltages on the first and last electrodes are fixed at zero, while the pulses are applied to the remaining electrodes. The pulses must be synchronized across the electrodes, starting and ending at the same time; their durations and amplitudes, however, are chosen randomly from one example to another. In an experimental setting, one would generate these pulses, apply them physically to the chip, measure the output power distribution, and finally train the model. In this paper, however, we restrict the study to computer simulations. We therefore built a simulator of the chip that generates the waveguide power distribution for a given set of control voltages, using the Hamiltonian model described by the tridiagonal real-valued matrix

$$H = \begin{pmatrix} \beta_1 & \kappa_{1,2} & 0 & \cdots & 0 \\ \kappa_{1,2} & \beta_2 & \kappa_{2,3} & \ddots & \vdots \\ 0 & \kappa_{2,3} & \beta_3 & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & \kappa_{N-1,N} \\ 0 & \cdots & 0 & \kappa_{N-1,N} & \beta_N \end{pmatrix},$$

where $\beta_k$ is the propagation constant along waveguide $k$, and $\kappa_{k,k+1}$ is the coupling coefficient between waveguides $k$ and $k+1$. The propagation constant is given by
[TABLE]
where is the wavelength, is the intrinsic refractive index of the waveguide, and is a dynamical proportionality constant that determines how much the propagation constant changes when the voltage across the waveguide changes. The coupling coefficient is given by
[TABLE]
where is the intrinsic coupling between two adjacent waveguides, is the potential difference across the substrate between the two waveguides and , and are the voltages across waveguides and , and and are dynamical proportionality constants that determine how much the coupling between two waveguides changes when the voltages across them change. These relations assume that the Hamiltonian depends linearly on the voltages and that coupling occurs only between neighboring waveguides. The simulator takes into account the non-ideal effects due to the equivalent-circuit behavior of the chip by simulating distortions of the voltage pulses. It also simulates coupling losses. For the results presented in this paper, the simulation parameters were as follows. , , , , , , , , and .
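Two ingredients of such a simulator can be sketched in NumPy: assembling the tridiagonal Hamiltonian from given propagation constants $\beta_k$ and couplings $\kappa_{k,k+1}$, and distorting a voltage pulse. The voltage-to-$\beta$/$\kappa$ relations above are not reproduced here, and the first-order low-pass (RC-like) filter is a hypothetical stand-in for the chip's equivalent-circuit response, not the distortion model used in the paper. All numbers are illustrative.

```python
import numpy as np


def tridiagonal_hamiltonian(beta, kappa):
    """Real tridiagonal H: beta_k on the diagonal, kappa_{k,k+1} off-diagonal."""
    beta = np.asarray(beta, dtype=float)
    kappa = np.asarray(kappa, dtype=float)
    return np.diag(beta) + np.diag(kappa, k=1) + np.diag(kappa, k=-1)


def rc_distort(signal, dt, tau):
    """Hypothetical first-order low-pass (RC-like) distortion of a voltage pulse."""
    alpha = dt / (tau + dt)
    out = np.empty(len(signal))
    state = 0.0
    for t, x in enumerate(signal):
        state += alpha * (x - state)    # discrete exponential smoothing
        out[t] = state
    return out


h = tridiagonal_hamiltonian([1.0, 1.1, 1.2], [0.5, 0.4])
step = np.ones(200)
distorted = rc_distort(step, dt=0.01, tau=0.1)
print(h)
print(distorted[:3])
```

A filter like this turns sharp square pulses into slowly rising and decaying waveforms, which is the kind of hidden distortion the GRU-based model has to learn without being told the circuit model.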
IV.2 Results
For the task of modeling the chip, the MSE obtained after iterations was about for the training dataset. Figure 5a shows the MSE versus the number of iterations. For the testing dataset, the MSE evaluated is . Supplementary Figures 8, 9, and 10 show randomly selected examples from the testing dataset, including the control voltages, the simulated measured waveguide power distribution, and the predicted power distribution.
To test the control part, we defined as an example a sequence of target unitaries in the time interval , given by
[TABLE]
where the unitaries are defined in Table 1. The Hamiltonian for each time interval is then obtained by taking the matrix logarithm $H = i \log U$.
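A NumPy-only sketch of this matrix logarithm is given below, assuming the convention $U = e^{-iH}$ so that $H = i \log U$. It takes the principal branch of the eigenphases; shifting every eigenvalue by $2\pi$ yields a different Hamiltonian with the same unitary, which is the redundancy mentioned for the fully-quantum controller. Function names are hypothetical.

```python
import numpy as np


def hamiltonian_from_unitary(u):
    """H = i log U on the principal branch, via the eigendecomposition of U."""
    eigvals, eigvecs = np.linalg.eig(np.asarray(u, dtype=complex))
    phases = np.angle(eigvals)                   # principal branch, (-pi, pi]
    h = eigvecs @ np.diag(-phases) @ np.linalg.inv(eigvecs)
    return (h + h.conj().T) / 2                  # remove round-off asymmetry


def unitary_from_hamiltonian(h):
    w, v = np.linalg.eigh(h)
    return v @ np.diag(np.exp(-1j * w)) @ v.conj().T   # U = exp(-iH)


pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)
hadamard = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
h_x = hamiltonian_from_unitary(pauli_x)
# Shifting all eigenvalues by 2*pi gives a different H with the same unitary.
h_x_shifted = h_x + 2 * np.pi * np.eye(2)
print(np.round(h_x, 4))
```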
After training the controller model for iterations, the MSE was . The MSE versus the number of iterations is plotted in Figure 6a. The resulting control voltages are shown in Supplementary Figure 11, and the resulting predicted ideal power distribution in Supplementary Figure 12.
For the second application, the fully-quantum setting, we use the same dataset of pulses, but the model output is now the interferometer power measurements. The number of iterations is , which is more than for the other model, to account for the doubled output size. Figure 5b shows the training performance in this case. The MSE evaluated on the testing dataset is , while it was for the training set. This indicates that the model is able to fit the training dataset as well as generalize to the testing dataset. Supplementary Figures 13 and 14 show the predicted waveforms for the same control pulses as in Supplementary Figures 8 and 9. Since the phase is now also measured, we have a complete quantum description of the output state and can therefore construct the evolution unitary. A commonly used measure of the closeness of two quantum gates $U$ and $V$ of dimension $d$ is the gate infidelity, defined as
[TABLE]
Infidelity is thus a number between 0 and 1, with 0 representing complete overlap (i.e. identical matrices). Supplementary Figure 15 shows the infidelity between the predicted and actual unitaries as a function of time for these two examples. Finally, to evaluate the control algorithm in this setting, we used as an example the following sequence for
[TABLE]
The history of the MSE of the controller during training is shown in Figure 6b. The resulting infidelity between the desired sequence of quantum gates and the controlled quantum gates is shown in Figure 7, while the control voltages are shown in Supplementary Figure 16.
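A gate-infidelity check of this kind can be sketched in a few lines. The definition used here, $1 - |\mathrm{Tr}(U^\dagger V)|^2/d^2$, is a common choice that lies in $[0, 1]$ and ignores global phases; it is an assumption for illustration and may differ in detail from the paper's equation.

```python
import numpy as np


def gate_infidelity(u, v):
    """Assumed definition: 1 - |Tr(U^dagger V)|^2 / d^2 (global-phase invariant)."""
    u = np.asarray(u, dtype=complex)
    v = np.asarray(v, dtype=complex)
    d = u.shape[0]
    overlap = np.trace(u.conj().T @ v)
    return float(1.0 - np.abs(overlap) ** 2 / d ** 2)


identity = np.eye(2, dtype=complex)
pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)
print(gate_infidelity(pauli_x, np.exp(0.3j) * pauli_x))   # close to 0: same gate up to phase
print(gate_infidelity(identity, pauli_x))                  # orthogonal gates
```

Evaluating this function at every time step of a target gate sequence gives an infidelity-versus-time curve of the kind plotted for the controller.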
IV.3 Discussion
The presented results show that the proposed architecture models the chip accurately under all the constraints mentioned earlier. Quantitatively, the loss (MSE) decreases on average with the number of training iterations, reaching a small value in the order of . However, this alone is not sufficient to fully assess the behavior of the proposed algorithm. The waveform plots in Supplementary Figures 8, 9, and 10 show qualitatively the accuracy of the model: the difference between the predicted and simulated power distributions is almost negligible. More importantly, since the model was not trained on the testing set, this indicates that the proposed structure can generalize, which is important for the modeling task. The architecture does not yield an explicit mathematical expression for the Hamiltonian, but because it generalizes, it can be used directly to estimate the Hamiltonian for given control voltages. Quantitatively, the MSE evaluated on the testing set is also in the order of , with little degradation relative to the training set.
The qualitative results also show that the architecture is able to handle all the challenges described in Section II.2. We were able to model the distortions caused by the equivalent circuit without the need to explicitly define a particular circuit model or how the Hamiltonian depends on the circuit response. This also saves us from having to characterize these parasitic effects experimentally, which is difficult as discussed previously.
For the control task, the proposed method was also very successful in obtaining the required control voltages, as shown in Supplementary Figure 12. The distortions that were present in the power distributions are no longer there, and the required functionality is achieved at the same time. The control voltages also stayed within the desired operating range. However, for the gate implementing perfect transfer between waveguides 1 and 2, the algorithm could not achieve full transfer. We believe this is related to the fact that not all gates can be implemented, which is a subject of future work. A final point is that all the examples in the training set were limited to the time range . However, the target control sequence spans a wider range , and we still succeed in the task. This is a result of using GRU layers and shows that the whole model generalizes well.
The proposed modifications of the architecture for fully-quantum models were also very successful. This is evident from the low MSE values for both the training and testing datasets, with only a small difference between them, and is supported qualitatively by the plots of the power waveforms and of the infidelity versus time. The controller architecture also performs well. The tested example shows that some basic quantum gates can be implemented: the identity, Pauli X, a rotation about the X-axis by angle $\pi/2$ (equivalent to a Hadamard gate up to phase shifts), and a rotation about the Z-axis. At each time instant, a photon traveling through the chip experiences a different quantum gate. The gate infidelities at all time instants, apart from the transition moments, are low (the worst case was ). The gates act on a qubit spatially encoded between the first and last waveguides. Moreover, our proposed controller architecture has a major advantage: its input is the target sequence of quantum gates rather than a single gate. In general, the control voltages required to realize a particular gate can depend on the history of gates realized so far, due to the drifting problem described earlier. In other words, the same gate could require different control pulses at different times during the operation of the chip. Our method deals with this issue automatically, whereas the standard quantum control literature deals with a single target quantum gate only KRK+ (05); CCM (11); MATW (15).
V Conclusion
In this paper, we proposed a deep learning structure suitable for modeling a reconfigurable integrated waveguide array chip. The architecture addresses three major problems faced when characterizing the chip experimentally: the uncertainty in the Hamiltonian model, the presence of undesired macroscopic dynamics causing distortions, and losses due to imperfect measurements. The proposed architecture follows a graybox approach, in which the Hamiltonian as a function of the control voltages is treated as a blackbox, with a GRU network as its main component, while the waveguide power distribution as a function of the Hamiltonian is treated as a whitebox, since the laws of quantum mechanics are known. We also proposed a complementary deep learning structure to obtain the control voltages required to achieve a target sequence of gates. The qualitative and quantitative results showed very promising performance for both tasks.
There are many possible extensions of the presented work. On the theoretical side, it would be interesting to determine the set of gates that can be implemented on this chip given the constraints introduced in the model, such as the limited control voltages. We would also like to validate the numerical results of this paper experimentally on the physical chip, which is currently in progress. Another interesting extension is to use fidelity rather than MSE as the cost function for training and see whether it yields better results. Finally, it would be worth extending the methods introduced in this paper to model and control other quantum systems.
Acknowledgments:
AY is supported by an Australian Government Research Training Program Scholarship, and acknowledges RMIT University for hosting him during his visit. A.P. acknowledges funding from the Australian Research Council Centre for Quantum Computation and Communication Technology CE170100012; Australian Research Council Discovery Early Career Researcher Award, Project No. DE140101700; RMIT University Vice-Chancellor's Senior Research Fellowship and a Google Faculty Research Award. MT and CF acknowledge Australian Research Council Discovery Early Career Researcher Awards, projects No. DE160100821 and DE170100421, respectively. This research is also supported in part by the ARCLab facility at UTS.
Appendix A Supplementary figures
- AAB+ (15) Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vij
- AAB+ (19) Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando GSL Brandao, David A Buell, et al. Quantum supremacy using a programmable superconducting processor. Nature, 574(7779):505–510, 2019. doi:10.1038/s41586-019-1666-5.
- ACH+ (18) Scott Aaronson, Xinyi Chen, Elad Hazan, Satyen Kale, and Ashwin Nayak. Online learning of quantum states. In Advances in Neural Information Processing Systems, pages 8962–8972, 2018.
- ADZ (93) Y. Aharonov, L. Davidovich, and N. Zagury. Quantum random walks. Phys. Rev. A, 48:1687–1690, Aug 1993. doi:10.1103/PhysRevA.48.1687.
- BDS+ (18) Marin Bukov, Alexandre G. R. Day, Dries Sels, Phillip Weinberg, Anatoli Polkovnikov, and Pankaj Mehta. Reinforcement learning in different phases of quantum control. Phys. Rev. X, 8:031086, Sep 2018. doi:10.1103/PhysRevX.8.031086.
- BLMS (09) Yaron Bromberg, Yoav Lahini, Roberto Morandotti, and Yaron Silberberg. Quantum and classical correlations in waveguide lattices. Physical Review Letters, 102(25):253904, 2009. doi:10.1103/PhysRevLett.102.253904.
- BOTB (18) Paul Baireuther, Thomas E O’Brien, Brian Tarasinski, and Carlo WJ Beenakker. Machine-learning-assisted correction of correlated qubit errors in a topological code. Quantum, 2:48, 2018. doi:10.22331/q-2018-01-29-48.
- C+ (15) François Chollet et al. Keras. https://keras.io, 2015.
