Implementing a distance-based classifier with a quantum interference circuit

Maria Schuld; Mark Fingerhuth; Francesco Petruccione

arXiv:1703.10793·quant-ph·January 17, 2018


TL;DR

This paper presents a simple quantum interference circuit for distance-based classification and gives a proof-of-principle demonstration on IBM Quantum hardware, showing that pattern recognition is possible with minimal quantum resources.

Contribution

It introduces a novel, resource-efficient quantum classifier based on an interference circuit, in contrast to approaches that implement classical models on large-scale quantum computers.

Findings

01. Classifies benchmark tasks effectively with simple quantum circuits

02. Demonstrates feasibility on IBM Quantum hardware

03. Shows promising performance with numerical simulations



Tables

Table 1: Classification results for the two-dimensional input vectors $\tilde{\mathbf{x}}^{\prime}$ and $\tilde{\mathbf{x}}^{\prime\prime}$ from the Iris flower dataset. Experimental results are always shown above their corresponding simulation result (marked with a triangle) and theoretical prediction (marked with an asterisk).

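Input vector	Result	$p_{acc}$	$p(|c\rangle=|0\rangle)$	$p(|c\rangle=|1\rangle)$
$\tilde{\mathbf{x}}^{\prime}$	experiment	0.455	0.516	0.484
$\tilde{\mathbf{x}}^{\prime}$	simulation ▷	0.731	0.629	0.371
$\tilde{\mathbf{x}}^{\prime}$	theory *	0.729	0.629	0.371
$\tilde{\mathbf{x}}^{\prime\prime}$	experiment	0.494	0.589	0.411
$\tilde{\mathbf{x}}^{\prime\prime}$	simulation ▷	0.911	0.548	0.452
$\tilde{\mathbf{x}}^{\prime\prime}$	theory *	0.913	0.547	0.453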

Table 2: Test error for the quantum classifier on different datasets for 1000 random separations into test and training set. Using feature maps leads to zero classification error for classes 2 & 3 of the Iris dataset as well as for the circles dataset.

Dataset	test error	variance	$p_{acc}$
Iris class 1&2	$0.00$	$0.000$	$0.50$
Iris class 1&3	$0.00$	$0.000$	$0.50$
Iris class 2&3	$0.07$	$0.003$	$0.50$
Iris class 2&3, feat map	$0.00$	$0.000$	$0.50$
Circles	$0.62$	$0.006$	$0.50$
Circles, feat map	$0.00$	$0.000$	$0.55$

Table 3: Raw experimental results for the classification of the input vectors $\tilde{\mathbf{x}}^{\prime}$ and $\tilde{\mathbf{x}}^{\prime\prime}$. The table shows the occurrence counts for each four-qubit quantum state, $|0000\rangle\rightarrow|0\rangle, |0001\rangle\rightarrow|1\rangle, \ldots$, in 8192 runs.

Input vector	$|0\rangle$	$|1\rangle$	$|2\rangle$	$|3\rangle$	$|4\rangle$	$|5\rangle$	$|6\rangle$	$|7\rangle$	$|8\rangle$	$|9\rangle$	$|10\rangle$	$|11\rangle$	$|12\rangle$	$|13\rangle$	$|14\rangle$	$|15\rangle$
$\tilde{\mathbf{x}}^{\prime}$	773	1400	223	172	210	205	1013	578	823	1175	117	113	95	114	476	705
$\tilde{\mathbf{x}}^{\prime\prime}$	948	1117	166	145	155	128	680	626	1139	1136	131	122	127	149	694	729

Equations

\tilde{y}=\mathrm{sgn}\left(\sum_{m=1}^{M}y^{m}\Big[1-\frac{1}{4M}|\tilde{\mathbf{x}}-\mathbf{x}^{m}|^{2}\Big]\right).

|\mathcal{D}\rangle=\frac{1}{\sqrt{2MC}}\sum_{m=1}^{M}|m\rangle\Big(|0\rangle|\psi_{\tilde{\mathbf{x}}}\rangle+|1\rangle|\psi_{\mathbf{x}^{m}}\rangle\Big)|y^{m}\rangle.

\frac{1}{2\sqrt{M}}\sum_{m=1}^{M}|m\rangle\Big(|0\rangle|\psi_{\tilde{\mathbf{x}}+\mathbf{x}^{m}}\rangle+|1\rangle|\psi_{\tilde{\mathbf{x}}-\mathbf{x}^{m}}\rangle\Big)|y^{m}\rangle,

\frac{1}{2\sqrt{M\mathrm{p}_{\mathrm{acc}}}}\sum_{m=1}^{M}\sum_{i=0}^{N-1}|m\rangle\left(\tilde{x}_{i}+x_{i}^{m}\right)|i\rangle|y^{m}\rangle.

\mathrm{p}(\tilde{y}=0)=\frac{1}{4M\mathrm{p}_{\mathrm{acc}}}\sum_{m|y^{m}=0}|\tilde{\mathbf{x}}+\mathbf{x}^{m}|^{2},



Full text


Maria Schuld


Quantum Research Group, School of Chemistry and Physics, University of KwaZulu-Natal, Durban 4000, South Africa

Mark Fingerhuth


Quantum Research Group, School of Chemistry and Physics, University of KwaZulu-Natal, Durban 4000, South Africa

Maastricht Science Programme, University of Maastricht, 6200 MD Maastricht, The Netherlands

Francesco Petruccione


Quantum Research Group, School of Chemistry and Physics, University of KwaZulu-Natal, Durban 4000, South Africa

National Institute for Theoretical Physics, KwaZulu-Natal, Durban 4000, South Africa

Abstract

Lately, much attention has been given to quantum algorithms that solve pattern recognition tasks in machine learning. Many of these quantum machine learning algorithms try to implement classical models on large-scale universal quantum computers that have access to non-trivial subroutines such as Hamiltonian simulation, amplitude amplification and phase estimation. We approach the problem from the opposite direction and analyse a distance-based classifier that is realised by a simple quantum interference circuit. After state preparation, the circuit only consists of a Hadamard gate as well as two single-qubit measurements, and computes the distance between data points in quantum parallel. We demonstrate the proof-of-principle using the IBM Quantum Experience and analyse the performance of the classifier with numerical simulations, showing that it classifies surprisingly well on simple benchmark tasks.

Quantum machine learning, quantum algorithm, supervised classification, state preparation, IBM Quantum Experience

PACS: 03.67.Ac, 07.05.Mh

Quantum machine learning, an emerging discipline combining quantum computing with intelligent data mining, has seen a growing number of proposals for quantum algorithms that solve the problem of supervised pattern recognition rebentrost14 ; kapoor16 ; benedetti16b ; denchev12 ; schuld16lr . In supervised learning, a dataset of labelled inputs or feature vectors is given, and the task is to predict the label of a new feature vector. For example, the inputs could be images of people, while the label is the name of the person shown in the picture. Image recognition software is then supposed to recognise which person is shown in a previously unseen image. A central question of quantum machine learning is whether and how a quantum computer could enhance methods known from machine learning dunjko16 . A large share of the suggested quantum machine learning algorithms are based on the assumption of a large-scale universal quantum computer that can implement nontrivial circuits. This is particularly true for distance-based machine learning models: Quantum algorithms for $k$-nearest neighbour and clustering have been based on extensions of amplitude amplification wiebe14 ; aimeur07 , while quantum kernel methods such as support vector machines rebentrost14 and Gaussian processes zhao15 rely on the rather technical routines for quantum matrix inversion harrow09 or density matrix exponentiation lloyd14 . Experimental implementations are consequently limited to demonstrations that require a vast reduction in the complexity of the quantum circuits cai15 ; li15 . Most importantly, quantum machine learning remains an enterprise that merely mimics methods from classical machine learning, which have been tailor-made for classical computation.

The aim of this Letter is to propose a change in perspective: We start with the simplest quantum circuit and show that it can be used as a likewise simple model of a classifier. Instead of choosing a textbook machine learning algorithm and asking how to run it on a quantum computer, we turn the question around and ask what classifier can be realised by a minimal quantum circuit. The basic idea is to use quantum interference to evaluate the distance measure of a kernel classifier in quantum parallel. A similar idea has been investigated by some of the authors in ref. schuld14neigh , but based on a less powerful information encoding strategy and a more complex circuit.

If an efficient state preparation routine is known, the algorithm explored here exhibits the same logarithmic scaling in the dimension and number of the input data that has been claimed by other authors zhao15 ; rebentrost14 , but only requires a relatively simple setup that can easily be implemented on the small-scale quantum computing devices available today mohseni17 . Evidently, since only a single-qubit gate is used, this “speedup” does not necessarily stem from quantum resources. However, beyond the conceptual argument we want to make, we envision the circuit to be of interest in situations where quantum states generated by quantum simulations, for example in quantum chemistry, have to be classified coherently. This case is sometimes referred to as ‘quantum data’.

In order to demonstrate the circuit, a simplified supervised pattern recognition task based on the famous Iris flower dataset is solved with the 5-qubit quantum computer provided by the IBM Quantum Experience ibmquantumcomputer . Since at the time of writing the interface only allowed the implementation of $80$ quantum gates, we additionally use numerical simulations to show that the classifier performs well on simple benchmark tasks.

We consider the task of supervised binary pattern classification which can be formalised as follows: Given a training dataset $\mathcal{D}=\{(\mathbf{x}^{1},y^{1}),...,(\mathbf{x}^{M},y^{M})\}$ of inputs $\mathbf{x}^{m}\in\mathbb{R}^{N}$ with their respective target labels $y^{m}\in\{-1,1\}$ for $m=1,...,M$ , as well as a new unlabeled input $\tilde{\mathbf{x}}\in\mathbb{R}^{N}$ , find the label $\tilde{y}\in\{-1,1\}$ that corresponds to the new input. The classifier effectively implemented by the quantum interference circuit together with a thresholding function is given by:

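$$\tilde{y}=\mathrm{sgn}\left(\sum_{m=1}^{M}y^{m}\Big[1-\frac{1}{4M}|\tilde{\mathbf{x}}-\mathbf{x}^{m}|^{2}\Big]\right). \tag{1}$$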
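Before turning to the quantum circuit, it may help to see eq. (1) evaluated classically. The following is a minimal Python sketch, not the authors' code; the function names `kernel` and `classify` are hypothetical, and mapping a zero score to the label $+1$ is an arbitrary tie-breaking choice of this sketch.

```python
def kernel(x_new, x_m, M):
    """Distance measure of eq. (1): 1 - |x_new - x_m|^2 / (4M)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_new, x_m))
    return 1.0 - sq_dist / (4 * M)


def classify(x_new, train_x, train_y):
    """Classical evaluation of eq. (1) for labels in {-1, +1}."""
    M = len(train_x)
    score = sum(y * kernel(x_new, x_m, M) for x_m, y in zip(train_x, train_y))
    # sgn(0) does not give a binary label; this sketch maps ties to +1.
    return 1 if score >= 0 else -1


# Two unit-length training points, one per class.
train_x = [[1.0, 0.0], [0.0, 1.0]]
train_y = [-1, 1]
print(classify([0.8, 0.6], train_x, train_y))  # prints -1 (closer to [1, 0])
```

With uniform weights, each training point votes for its own label with a strength that shrinks as its squared distance to the new input grows.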

The distance measure $\kappa(\mathbf{x},\mathbf{x}^{\prime})=1-\frac{1}{4M}|\mathbf{x}-\mathbf{x}^{\prime}|^{2}$ can be interpreted as a kernel (and is in fact very similar to an Epanechnikov kernel epanechnikov69 ). The model in eq. (1) therefore has the standard form of a kernelised binary classifier, $\tilde{y}=\mathrm{sgn}\left(\sum_{m}w_{m}y^{m}\kappa(\tilde{\mathbf{x}},\mathbf{x}^{m})\right)$ with uniform weights $w_{m}=1$ aizerman64 . Such a model can be derived from a perceptron in which the original weights are expressed by an expansion of the training data as motivated by the representer theorem scholkopf01 , and inner products between inputs are replaced by another kernel function via the “kernel trick”. The model relates to $k$-nearest-neighbour when setting $k\rightarrow M$ and weighting the neighbours by the distance measure.
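A short worked step, assuming unit-length inputs as the text does later, rewrites the distance measure in terms of the inner product $\mathbf{x}^{T}\mathbf{x}^{\prime}$:

$$
\begin{aligned}
|\mathbf{x}-\mathbf{x}^{\prime}|^{2} &= \mathbf{x}^{T}\mathbf{x}-2\,\mathbf{x}^{T}\mathbf{x}^{\prime}+\mathbf{x}^{\prime T}\mathbf{x}^{\prime} = 2-2\,\mathbf{x}^{T}\mathbf{x}^{\prime},\\
\kappa(\mathbf{x},\mathbf{x}^{\prime}) &= 1-\frac{2-2\,\mathbf{x}^{T}\mathbf{x}^{\prime}}{4M} = 1-\frac{1}{2M}+\frac{1}{2M}\,\mathbf{x}^{T}\mathbf{x}^{\prime}.
\end{aligned}
$$

For normalised data, $\kappa$ is therefore an affine function of the linear kernel $\mathbf{x}^{T}\mathbf{x}^{\prime}$ and takes values in $[1-\frac{1}{M},1]$, so it is never negative.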

The quantum machine learning algorithm that implements the classifier from eq. (1) is based on the idea of encoding the input features into the amplitudes of a quantum system and manipulating them through quantum gates, a strategy responsible for most claims of exponential speedups in quantum-enhanced machine learning. We will refer to this approach as ‘amplitude encoding’ to distinguish it from the more common practice of encoding one bit of information into a qubit. Consider a classical vector $\mathbf{x}\in\mathbb{R}^{N}$, where without loss of generality $N$ is assumed to be a power of two, $N=2^{n}$ (which can be achieved by padding the vector with zero entries). Furthermore, assume that $\mathbf{x}$ is normalised to unit length, $\mathbf{x}^{T}\mathbf{x}=1$. Amplitude encoding associates $\mathbf{x}=(x_{0},...,x_{N-1})^{T}$ with the $2^{n}$ amplitudes describing the state of an $n$-qubit quantum system, $|\psi_{\mathbf{x}}\rangle=\sum_{i=0}^{N-1}x_{i}|i\rangle$. Here, $|i\rangle$ is an index register that flags the $i$th entry of the classical vector with the $i$th computational basis state.
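The padding and normalisation steps can be sketched in a few lines of Python. This is an illustration under the stated assumptions, not a state preparation routine: it only computes the classical amplitude vector $(x_0,\dots,x_{N-1})$ of $|\psi_{\mathbf{x}}\rangle$, and the name `amplitude_encode` is hypothetical.

```python
import math


def amplitude_encode(x):
    """Return (n, amplitudes) for the n-qubit state |psi_x> = sum_i x_i |i>.

    The vector is zero-padded to length N = 2**n and rescaled to unit
    length, so amplitudes[i] is the coefficient of basis state |i>.
    """
    if not x:
        raise ValueError("x must be non-empty")
    n = (len(x) - 1).bit_length()  # smallest n with 2**n >= len(x)
    padded = list(x) + [0.0] * (2 ** n - len(x))
    norm = math.sqrt(sum(v * v for v in padded))
    if norm == 0:
        raise ValueError("the zero vector cannot be amplitude-encoded")
    return n, [v / norm for v in padded]


n, amps = amplitude_encode([3.0, 4.0, 0.0])
print(n, amps)  # 2 [0.6, 0.8, 0.0, 0.0]
```

The register size grows only as $n=\lceil\log_2 N\rceil$, which is where the logarithmic scaling in the input dimension comes from.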

If one can find an efficient quantum algorithm (i.e., with resources growing polynomially in the number of qubits $n$ ), one manipulates the $2^{n}$ amplitudes ‘super-efficiently’ (i.e., with resources growing logarithmically in the dimension of the Hilbert space, $\mathcal{O}(\log N)$ ). A ‘super-efficient’ algorithm can only maintain its speed if encoding the data into a quantum state also requires resources at most polynomial in the number of qubits. There are cases for which this is known to be possible grover02 ; soklakov06 . A proposal frequently referred to is a Quantum Random Access Memory giovannetti08 ; rebentrost14 ; zhao15 that loads the bit strings representing the $x_{i}$ in parallel into a qubit register and performs a conditional rotation and measurement of an ancilla to write the values into the amplitudes.

The chance of success of this postselective measurement is only high if the $x_{i}$ are uniformly close to one.
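To make this concrete, here is a toy model of the postselection step. It assumes a common textbook form of the scheme (an assumption, not necessarily the exact construction of the cited proposals): a uniform superposition $\frac{1}{\sqrt{N}}\sum_i|i\rangle$, an ancilla rotated to $\sqrt{1-x_i^2}|0\rangle+x_i|1\rangle$ conditioned on each entry with $|x_i|\le 1$, and postselection of the ancilla in $|1\rangle$. The success probability is then $\frac{1}{N}\sum_i x_i^2$; the function name `qram_postselect` is hypothetical.

```python
import math


def qram_postselect(x):
    """Toy model of amplitude loading by conditional rotation + postselection.

    Returns (p_success, state): the probability of finding the ancilla in |1>
    and the normalised amplitudes sum_i x_i |i> / |x| that remain afterwards.
    """
    if any(abs(v) > 1 for v in x):
        raise ValueError("entries must satisfy |x_i| <= 1")
    N = len(x)
    # Amplitude of |i>|1>_ancilla after the conditional rotation.
    branch = [v / math.sqrt(N) for v in x]
    p_success = sum(a * a for a in branch)
    if p_success == 0:
        raise ValueError("all-zero input: postselection never succeeds")
    state = [a / math.sqrt(p_success) for a in branch]
    return p_success, state


# A unit vector spread over N = 4 entries succeeds only with probability 1/N.
print(qram_postselect([0.5, 0.5, 0.5, 0.5])[0])  # 0.25
# Entries uniformly close to one make postselection likely.
print(qram_postselect([0.9, 1.0, 0.95, 1.0])[0])
```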

Using a suitable state preparation scheme, the quantum classification circuit takes a quantum system of $n$ qubits in state

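$$|\mathcal{D}\rangle=\frac{1}{\sqrt{2MC}}\sum_{m=1}^{M}|m\rangle\Big(|0\rangle|\psi_{\tilde{\mathbf{x}}}\rangle+|1\rangle|\psi_{\mathbf{x}^{m}}\rangle\Big)|y^{m}\rangle. \tag{2}$$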

Here, $|m\rangle$ is an index register running from $m=1,...,M$ and flagging the $m$th training input. The second register is a single ancilla qubit whose excited state is entangled with the third register encoding the $m$th training state, $|\psi_{\mathbf{x}^{m}}\rangle=\sum_{i=0}^{N-1}x^{m}_{i}|i\rangle$, while the ground state is entangled with the third register encoding the new input, $|\psi_{\tilde{\mathbf{x}}}\rangle=\sum_{i=0}^{N-1}\tilde{x}_{i}|i\rangle$. The fourth register is a single qubit, which is in state $|0\rangle$ if $y^{m}=-1$ and in state $|1\rangle$ if $y^{m}=1$. Effectively, this creates an amplitude vector which contains the training inputs as well as $M$ copies of the new input. The normalisation constant $C$ depends on the preprocessing of the data. We will assume in the following that the feature vectors are normalised and hence $C=1$.
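A small classical simulation can make the layout of the four registers explicit. The sketch below (plain Python, with the hypothetical function name `prepare_state`) lists the amplitudes of eq. (2) for unit-length inputs, so $C=1$. It uses 0-based training indices, whereas the text counts $m=1,\dots,M$, and it follows eq. (2) in putting the new input on the ancilla branch $|0\rangle$.

```python
import math


def prepare_state(x_new, train_x, train_y):
    """Amplitudes of |D> from eq. (2), assuming unit-length inputs (C = 1).

    Keys are basis labels (m, a, i, c): training index m (0-based), ancilla a,
    data index i and class qubit c (0 encodes y = -1, 1 encodes y = +1).
    """
    M = len(train_x)
    pref = 1 / math.sqrt(2 * M)
    amps = {}
    for m, (x_m, y_m) in enumerate(zip(train_x, train_y)):
        c = 0 if y_m == -1 else 1
        for i, (xt_i, xm_i) in enumerate(zip(x_new, x_m)):
            amps[(m, 0, i, c)] = pref * xt_i  # ancilla |0>: copy of the new input
            amps[(m, 1, i, c)] = pref * xm_i  # ancilla |1>: m-th training input
    return amps


amps = prepare_state([0.8, 0.6], [[1.0, 0.0], [0.0, 1.0]], [-1, 1])
print(sum(a * a for a in amps.values()))  # total probability, close to 1
```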

After state preparation, the quantum circuit only consists of three operations. First, a Hadamard gate on the ancilla interferes the copies of the new input and the training inputs,

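$$\frac{1}{2\sqrt{M}}\sum_{m=1}^{M}|m\rangle\Big(|0\rangle|\psi_{\tilde{\mathbf{x}}+\mathbf{x}^{m}}\rangle+|1\rangle|\psi_{\tilde{\mathbf{x}}-\mathbf{x}^{m}}\rangle\Big)|y^{m}\rangle, \tag{3}$$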

where $|\psi_{\tilde{\mathbf{x}}\pm\mathbf{x}^{m}}\rangle=|\psi_{\tilde{\mathbf{x}}}\rangle\pm|\psi_{\mathbf{x}^{m}}\rangle$. The second operation is a conditional measurement selecting the branch with the ancilla in state $|0\rangle$. This postselection succeeds with probability $\mathrm{p}_{\mathrm{acc}}=\frac{1}{4M}\sum_{m}|\tilde{\mathbf{x}}+\mathbf{x}^{m}|^{2}$. It is more likely to succeed if the collective Euclidean distance of the training set to the new input is small. We will show below that if the data is standardised, postselection usually succeeds with a probability of around $0.5$. If the conditional measurement is successful, the result is given by

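$$\frac{1}{2\sqrt{M\mathrm{p}_{\mathrm{acc}}}}\sum_{m=1}^{M}\sum_{i=0}^{N-1}|m\rangle\left(\tilde{x}_{i}+x_{i}^{m}\right)|i\rangle|y^{m}\rangle. \tag{4}$$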

The amplitudes weight the class qubit $|y^{m}\rangle$ according to the distance of the $m$th data point to the new input. In this state, the probability of measuring the class qubit $|y^{m}\rangle$ in state $|0\rangle$,

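$$\mathrm{p}(\tilde{y}=0)=\frac{1}{4M\mathrm{p}_{\mathrm{acc}}}\sum_{m|y^{m}=0}|\tilde{\mathbf{x}}+\mathbf{x}^{m}|^{2}, \tag{5}$$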

reflects the probability of predicting class $-1$ for the new input. For normalised feature vectors, $|\tilde{\mathbf{x}}+\mathbf{x}^{m}|^{2}=4-|\tilde{\mathbf{x}}-\mathbf{x}^{m}|^{2}$, so that $\frac{1}{4M}\sum_{m}|\tilde{\mathbf{x}}+\mathbf{x}^{m}|^{2}=1-\frac{1}{4M}\sum_{m}|\tilde{\mathbf{x}}-\mathbf{x}^{m}|^{2}$ and each training point contributes to $\mathrm{p}(\tilde{y}=0)$ with a weight that decreases with its distance to the new input. Choosing the class with the higher probability therefore implements the classifier from eq. (1), exactly so when both classes contain equally many training points. The Supplementary Material shows that the number of measurements needed to estimate $\mathrm{p}(\tilde{y}=0)$ to error $\epsilon$ with reasonably high confidence grows with $\mathcal{O}(\epsilon^{-1})$.
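The three operations after state preparation can be checked with a classical simulation. The following plain-Python sketch (hypothetical function names; unit-length inputs, so $C=1$) applies the Hadamard to the ancilla amplitudes of eq. (2), keeps the $|0\rangle$ branch, and returns $\mathrm{p}_{\mathrm{acc}}$ together with the probability of the class qubit being $|0\rangle$. For a training set with equally many examples of each class, as in the two-point example, picking the more likely class gives the same label as eq. (1).

```python
import math


def interfere(x_new, train_x):
    """Apply a Hadamard to the ancilla of |D> (eq. 2 with C = 1).

    Returns (branch0, branch1), where branch_a[m][i] is the amplitude of
    |m>|a>|i>; the class qubit is left out because H does not act on it.
    """
    M = len(train_x)
    pref = 1 / math.sqrt(2 * M)  # normalisation of eq. (2)
    h = 1 / math.sqrt(2)         # magnitude of the Hadamard matrix elements
    branch0, branch1 = [], []
    for x_m in train_x:
        a0 = [pref * v for v in x_new]  # ancilla |0>: new input
        a1 = [pref * v for v in x_m]    # ancilla |1>: training input
        # H|0> = (|0> + |1>)/sqrt2 and H|1> = (|0> - |1>)/sqrt2.
        branch0.append([h * (u + w) for u, w in zip(a0, a1)])
        branch1.append([h * (u - w) for u, w in zip(a0, a1)])
    return branch0, branch1


def classify_by_interference(x_new, train_x, train_y):
    """Postselect the ancilla in |0> and read the class qubit.

    Returns (p_acc, p0, label); p0 is the probability of class qubit |0>,
    i.e. of predicting y = -1. Ties p0 == 0.5 map to +1 (arbitrary choice).
    """
    branch0, _ = interfere(x_new, train_x)
    # Probability mass of each training index m inside the accepted branch.
    weights = [sum(a * a for a in row) for row in branch0]
    p_acc = sum(weights)
    p0 = sum(w for w, y in zip(weights, train_y) if y == -1) / p_acc
    label = -1 if p0 > 0.5 else 1
    return p_acc, p0, label


train_x = [[1.0, 0.0], [0.0, 1.0]]
train_y = [-1, 1]
p_acc, p0, label = classify_by_interference([0.8, 0.6], train_x, train_y)
print(round(p_acc, 3), round(p0, 3), label)  # 0.85 0.529 -1
```

Here $\mathrm{p}_{\mathrm{acc}}=0.85$ agrees with $\frac{1}{4M}\sum_m|\tilde{\mathbf{x}}+\mathbf{x}^{m}|^{2}=\frac{3.6+3.2}{8}$.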

As a demonstration we implement the interference circuit with the IBM Quantum Experience (IBMQE) ibmquantumcomputer using the Iris dataset fisher36 . Data preprocessing consists of two steps (see fig. 2): We first standardise the dataset to have zero mean and unit variance. This is common practice in machine learning to compensate for scaling effects, and in our case ensures that the data does not populate only a small subspace of the input space, which in higher dimensions leads to indistinguishably small distances between data points. Second, we normalise each feature vector to unit length. This strategy is popular in machine learning, for example with support vector machines, to consider only the angle between data points. (As an intuition, if we want to classify flowers, some items may have grown bigger than others due to better local conditions, but it is the proportion of the sepal and petal length that is important for the class distinction.) This preprocessing strategy allows us to fulfill the conditions of ‘super-efficient’ preprocessing in refs. soklakov06 ; giovannetti08
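The two preprocessing steps can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' pipeline: it assumes the population standard deviation, raises an error on constant features or on points that land exactly at the mean, and the function names are hypothetical.

```python
import math


def standardise(data):
    """Shift and scale each feature (column) to zero mean and unit variance."""
    n_samples, n_features = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n_samples for j in range(n_features)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / n_samples)
        for j in range(n_features)
    ]
    if any(s == 0 for s in stds):
        raise ValueError("constant feature cannot be standardised")
    return [[(row[j] - means[j]) / stds[j] for j in range(n_features)] for row in data]


def normalise(vec):
    """Rescale a feature vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        raise ValueError("zero vector cannot be normalised")
    return [v / norm for v in vec]


def preprocess(data):
    """Standardise the dataset, then normalise every feature vector."""
    return [normalise(row) for row in standardise(data)]


data = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
for row in preprocess(data):
    print([round(v, 3) for v in row])
```

The order matters: normalising strictly positive measurements, such as raw flower dimensions, without standardising first would keep every vector inside the positive orthant of the input space.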

Bibliography


[1] Patrick Rebentrost, Masoud Mohseni, and Seth Lloyd. Quantum support vector machine for big data classification. Physical Review Letters, 113:130503, Sep 2014.
[2] Ashish Kapoor, Nathan Wiebe, and Krysta Svore. Quantum Perceptron Models. In Advances in Neural Information Processing Systems, pages 3999–4007, Barcelona, Spain, 2016. NIPS.
[3] Marcello Benedetti, John Realpe-Gómez, Rupak Biswas, and Alejandro Perdomo-Ortiz. Quantum-assisted learning of graphical models with arbitrary pairwise connectivity. arXiv preprint arXiv:1609.02542, 2016.
[4] Vasil Denchev, Nan Ding, Hartmut Neven, and S. V. N. Vishwanathan. Robust classification with adiabatic quantum optimization. In Proceedings of the 29th International Conference on Machine Learning, pages 863–870, ICML, Edinburgh, Scotland, UK, 2012.
[5] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. Prediction by linear regression on a quantum computer. Physical Review A, 94(2):022342, 2016.
[6] Vedran Dunjko, Jacob M. Taylor, and Hans J. Briegel. Quantum-enhanced machine learning. Physical Review Letters, 117(13):130501, 2016.
[7] Nathan Wiebe, Ashish Kapoor, and Krysta Svore. Quantum nearest-neighbor algorithms for machine learning. Quantum Information & Computation, 15:0318–0358, 2015.
[8] Esma Aïmeur, Gilles Brassard, and Sébastien Gambs. Quantum clustering algorithms. In Proceedings of the 24th International Conference on Machine Learning, pages 1–8, Corvallis, Oregon, USA, 2007. ACM.