Relaxation of the EM Algorithm via Quantum Annealing for Gaussian Mixture Models
Hideyuki Miyahara, Koji Tsumura, and Yuki Sughiyama

TL;DR
This paper introduces DQAEM, a novel algorithm that combines quantum annealing with EM to improve the optimization of Gaussian mixture models, reducing trapping in local optima and enhancing stability.
Contribution
It presents the DQAEM algorithm, which integrates quantum annealing into EM, proves a theorem supporting its stability, and demonstrates improved performance on Gaussian mixture models.
Findings
- DQAEM outperforms traditional EM in avoiding local optima.
- Theoretical proof of stability for DQAEM.
- Numerical simulations confirm efficiency improvements.
Abstract
We propose a modified expectation-maximization algorithm by introducing the concept of quantum annealing, which we call the deterministic quantum annealing expectation-maximization (DQAEM) algorithm. The expectation-maximization (EM) algorithm is an established method for computing maximum likelihood estimates and has been used in many practical applications. However, EM is known to depend heavily on initial values, and its estimates are sometimes trapped in local optima. Quantum annealing (QA), an optimization approach motivated by quantum mechanics, has been proposed to overcome such local optima. By employing QA, we formulate DQAEM and present a theorem that supports its stability. Finally, we present numerical simulations that confirm its efficiency.
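Table I: Success and failure ratios of EM (rows) and DQAEM (columns) run from the same initial values

| EM \ DQAEM | Success | Fail | Total |
|---|---|---|---|
| Success | 55.9 % | 0.7 % | 56.6 % |
| Fail | 41.5 % | 1.9 % | 43.4 % |
| Total | 97.4 % | 2.6 % | 100.0 % |

Table II: Success ratios of DQAEM, EM, and DSAEM

| DQAEM | EM | DSAEM |
|---|---|---|
| 97.4 % | 56.6 % | 77.8 % |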
This work was supported in part by Grant-in-Aid for Scientific Research (B) (25289127), Japan Society for the Promotion of Science. H. Miyahara and K. Tsumura are with the Department of Information Physics and Computing, Graduate School of Information Science and Technology, The University of Tokyo, Japan. Y. Sughiyama is with the Institute of Industrial Science, The University of Tokyo, Japan.
I Introduction
Combinatorial optimization is a fundamental issue in both science and engineering. Although some such problems can be solved efficiently by well-known algorithms [1, 2], others, such as the traveling salesman problem, are NP-hard and intrinsically difficult to solve.
One effective approach to NP-hard problems is simulated annealing (SA), proposed by Kirkpatrick et al. [3, 4]. SA is a generic optimization approach in which random numbers that mimic thermal fluctuations are used to climb over potential barriers in objective functions. Furthermore, Geman and Geman [5] guaranteed its global convergence in a certain sense (for a sufficiently slow cooling schedule). Later, a quantum extension of SA, called quantum annealing (QA), was proposed in physics [6, 7, 8] and has been studied intensively [9, 10, 11, 12, 13, 14, 15, 16, 17]. In QA, quantum fluctuations are used instead of thermal fluctuations to overcome potential barriers in objective functions, and QA has been reported to be more effective than SA for some problems [12]. In particular, owing to quantum fluctuations, QA performs better than SA when the objective function is steeply multimodal.
Such combinatorial optimization problems also appear in machine learning, which has recently attracted much interest [18, 19]. For example, some classes of data clustering problems are known to be NP-hard [20]. A common approach to data clustering is as follows: assuming that the data points are generated by Gaussian mixture models (GMMs), we estimate the parameters of the GMMs by the expectation-maximization (EM) algorithm [21]. However, parameter estimation sometimes fails because EM depends on initial values and suffers from local optima. To relax this problem, Ueda and Nakano proposed the deterministic simulated annealing expectation-maximization (DSAEM) algorithm (called the deterministic annealing expectation-maximization algorithm in Ref. [22]), which succeeds in relaxing the difficulty caused by multimodality in EM. This algorithm is based on deterministic simulated annealing (DSA) (called deterministic annealing in Ref. [23]), which was proposed by Rose et al. [23, 24]. The essence of these approaches is to smooth the objective function by introducing thermal fluctuations deterministically, without random numbers; this considerably mitigates the non-convexity of the optimization without increasing the numerical cost.
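The deterministic smoothing behind DSAEM can be illustrated with a tempered E step. The following minimal sketch assumes the common deterministic-annealing form, in which each component term $\pi_k g(x \mid \mu_k, \Sigma_k)$ (defined in Sec. II-B) is raised to an inverse temperature $\beta \in [0, 1]$ before normalization; the one-dimensional model and the function names are illustrative assumptions, not code from Ref. [22].

```python
import math


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def tempered_responsibilities(x, weights, means, variances, beta):
    """Tempered E step: p_beta(k | x) is proportional to [pi_k g(x | mu_k, var_k)] ** beta.

    beta = 1 recovers the ordinary EM posterior; smaller beta flattens it.
    No random numbers are drawn, so the smoothing is deterministic.
    """
    logits = [beta * (math.log(w) + log_gauss(x, m, v))
              for w, m, v in zip(weights, means, variances)]
    top = max(logits)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(z - top) for z in logits]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


for beta in (1.0, 0.5, 0.1):
    print(beta, tempered_responsibilities(0.8, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0], beta))
```

Lowering $\beta$ pulls the responsibilities toward the uniform distribution, which is how the objective is smoothed in the early iterations.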
As explained above, QA is considered more effective than SA under some conditions [12], and thus a quantum version of DSA is expected to outperform DSA. In this paper, we propose a deterministic quantum annealing expectation-maximization (DQAEM) algorithm for Gaussian mixture models, because quantum fluctuations are expected to relax the problem of local optima in parameter estimation. In our previous paper [25], we proposed DQAEM for continuous latent variables and found that it outperformed EM. However, its applicability is limited because the latent variables are assumed to be continuous, whereas most difficulties in parameter estimation arise in the optimization of discrete latent variables, as in Gaussian mixture models. Thus, in this paper, we develop DQAEM for discrete latent variables and apply it to GMMs. After formulating the algorithm, we present a theorem that guarantees its stability. Finally, to illustrate its efficiency compared to EM, we show numerical simulations in which DQAEM is applied to GMMs for data clustering.
This paper is organized as follows. In Sec. II, we review GMMs and EM to prepare for DQAEM. In Sec. III, the main section of this paper, we describe the formulation of DQAEM in detail and present a theorem on its convergence. In Sec. IV, we present numerical simulations and discuss the efficiency of DQAEM. In Sec. V, we conclude this paper.
II Review of the expectation-maximization (EM) algorithm and Gaussian mixture models (GMMs)
In this section, we review EM to prepare for introducing DQAEM, and we consider the estimation problem of GMMs to formulate DQAEM, because GMMs are among the simplest models with discrete latent variables.
II-A Maximum likelihood estimation (MLE) and the expectation-maximization (EM) algorithm
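The aim of this subsection is to describe EM, on which DQAEM is based. First, we briefly review maximum likelihood estimation (MLE). Suppose we have $N$ data points $x_1, \ldots, x_N$ that are independent and identically distributed according to $p(x \mid \theta)$, where $\theta$ is a parameter. Moreover, we define $p(x, \sigma \mid \theta)$ as the probability density function of the complete data with the unobservable variable $\sigma$; namely, $p(x \mid \theta) = \sum_{\sigma \in \Omega} p(x, \sigma \mid \theta)$, where $\Omega$ represents the domain of $\sigma$. Then the log likelihood function is given by

$$ L(\theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta) = \sum_{i=1}^{N} \log \sum_{\sigma_i \in \Omega} p(x_i, \sigma_i \mid \theta). \qquad (1) $$

Note that $\sigma_i \in \Omega$ in (1) is the latent variable paired with $x_i$, and $i$ is the index of each observed data point. MLE estimates the parameter $\theta$ of the model distribution as the value that maximizes the log likelihood function $L(\theta)$.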
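In general, maximizing the log likelihood function $L(\theta)$ with respect to $\theta$ is difficult because it is often a non-convex optimization problem, so we replace it with a lower bound. Using Jensen's inequality, we have

$$ L(\theta) \geq \sum_{i=1}^{N} \sum_{\sigma_i \in \Omega} p(\sigma_i \mid x_i, \theta') \log \frac{p(x_i, \sigma_i \mid \theta)}{p(\sigma_i \mid x_i, \theta')} = Q(\theta; \theta') - \sum_{i=1}^{N} \sum_{\sigma_i \in \Omega} p(\sigma_i \mid x_i, \theta') \log p(\sigma_i \mid x_i, \theta'), $$

$$ Q(\theta; \theta') = \sum_{i=1}^{N} \sum_{\sigma_i \in \Omega} p(\sigma_i \mid x_i, \theta') \log p(x_i, \sigma_i \mid \theta), \qquad (2) $$

where $\theta'$ is an arbitrary parameter and $p(\sigma_i \mid x_i, \theta')$ is the conditional probability of the latent variable; the bound is tight at $\theta = \theta'$. The procedure of EM then consists of the following two steps. The first, called the E step, computes the conditional probability by

$$ p(\sigma_i \mid x_i, \theta') = \frac{p(x_i, \sigma_i \mid \theta')}{\sum_{\sigma \in \Omega} p(x_i, \sigma \mid \theta')}. $$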
Here we have used Bayes' rule. The second, called the M step, maximizes the function (2) with respect to $\theta$ instead of $L(\theta)$. Denoting the tentative estimate at the $t$-th iteration by $\theta^{(t)}$, the estimate is updated by

$$ \theta^{(t+1)} = \arg\max_{\theta} Q(\theta; \theta^{(t)}). $$
At the end of this subsection, we summarize EM in Algo. 1.
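The two steps can be sketched for a one-dimensional GMM (introduced in Sec. II-B), for which the M step has a closed form. This is an independent minimal sketch, not the authors' Algo. 1; the data, initial values, and function names are illustrative assumptions. The recorded log likelihood should never decrease, which is the classical EM guarantee.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def e_step(data, weights, means, variances):
    """E step: p(sigma_k | x_i, theta') by Bayes' rule, plus the log likelihood L(theta')."""
    resp, loglik = [], 0.0
    for x in data:
        log_joint = [math.log(w) + log_gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
        log_evidence = log_sum_exp(log_joint)  # log p(x_i | theta')
        loglik += log_evidence
        resp.append([math.exp(lj - log_evidence) for lj in log_joint])
    return resp, loglik


def m_step(data, resp):
    """M step: closed-form maximizer of Q(theta; theta') for a one-dimensional GMM."""
    n, size = len(data), len(resp[0])
    weights, means, variances = [], [], []
    for k in range(size):
        nk = sum(r[k] for r in resp)
        mean = sum(r[k] * x for r, x in zip(resp, data)) / nk
        weights.append(nk / n)
        means.append(mean)
        variances.append(sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / nk)
    return weights, means, variances


def em(data, weights, means, variances, iters=100):
    """Alternate E and M steps, recording L(theta^(t)) before each update."""
    history = []
    for _ in range(iters):
        resp, loglik = e_step(data, weights, means, variances)
        history.append(loglik)
        weights, means, variances = m_step(data, resp)
    return (weights, means, variances), history


rng = random.Random(0)
true_means = [-5.0, 0.0, 5.0]
data = [rng.gauss(mu, 1.0) for mu in true_means for _ in range(100)]
(w, m, v), hist = em(data, [1 / 3] * 3, [-4.0, 0.5, 4.0], [1.0] * 3)
print([round(mu, 2) for mu in m], round(hist[-1], 2))
```

The sketch has no safeguard against a variance collapsing onto a single point; with poor initial values it can also settle in a local optimum, which is the failure mode DQAEM targets.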
II-B Gaussian mixture models (GMMs)
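Here we introduce GMMs and their quantum mechanical representation, following the notation in Refs. [18, 19]. Let $x$ and $\sigma$ denote the continuous observable variable and the discrete unobservable variable, respectively. We assume that $\Omega$, the domain of $\sigma$, is given by $\Omega = \{\sigma_1, \ldots, \sigma_K\}$, where

$$ \sigma_k = (0, \ldots, 0, \underbrace{1}_{k\text{-th}}, 0, \ldots, 0)^{\top} \in \{0, 1\}^{K} $$

for $k = 1, \ldots, K$, so that the number of elements in $\Omega$ is $K$. Specifically, $\sigma^k = 1$ and $\sigma^l = 0$ for $l \neq k$ when $\sigma$ is the $k$-th element $\sigma_k$ of $\Omega$.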
Using the above notation, the probability density function of GMMs is given by
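$$ p(x \mid \theta) = \sum_{k=1}^{K} \pi_k\, g(x \mid \mu_k, \Sigma_k), \qquad (3) $$

where $\pi = (\pi_1, \ldots, \pi_K)$ with $\pi_k \geq 0$ satisfies $\sum_{k=1}^{K} \pi_k = 1$, $g(x \mid \mu_k, \Sigma_k)$ is a Gaussian function with mean $\mu_k$ and covariance $\Sigma_k$ for $k = 1, \ldots, K$, and $\theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$. The joint probability density function for GMMs is therefore given by

$$ p(x, \sigma \mid \theta) = \prod_{k=1}^{K} \left[ \pi_k\, g(x \mid \mu_k, \Sigma_k) \right]^{\sigma^k}, \qquad (4) $$

where $\sigma^k$ is the $k$-th element of $\sigma$.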
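To introduce quantum fluctuations, we need to rewrite the above equations in the Hamiltonian formulation. Taking the negative logarithm of (4), the Hamiltonian for GMMs can be written as

$$ H(x, \sigma; \theta) = -\log p(x, \sigma \mid \theta) = \sum_{k=1}^{K} \sigma^k\, h_k(x; \theta), \qquad (5) $$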
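where $h_k(x; \theta) = -\log\left[ \pi_k\, g(x \mid \mu_k, \Sigma_k) \right]$ for $k = 1, \ldots, K$. Here, we introduce ket vectors, bra vectors, and "spin" operators to rewrite (5) in the manner of quantum mechanics. First, we define the ket vector $|\sigma_k\rangle$ by $|\sigma_k\rangle = \sigma_k$ and the "spin" operator $\hat{\sigma}^k$ by

$$ \hat{\sigma}^k = |\sigma_k\rangle \langle \sigma_k|, $$

respectively, where the bra vector $\langle \sigma_k|$ satisfies the orthonormal condition $\langle \sigma_k | \sigma_l \rangle = \delta_{kl}$. Replacing $\sigma^k$ in (5) with $\hat{\sigma}^k$, we have the Hamiltonian operator

$$ \hat{H}(x; \theta) = \sum_{k=1}^{K} h_k(x; \theta)\, \hat{\sigma}^k, \qquad (6) $$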
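and this satisfies

$$ \langle \sigma_k | \hat{H}(x; \theta) | \sigma_l \rangle = H(x, \sigma_k; \theta)\, \delta_{kl}, $$

where $\delta_{kl}$ is the Kronecker delta. We use this formulation to describe DQAEM in the following section. Note that a similar expression is presented in Ref. [26].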
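To make the Hamiltonian picture concrete, the sketch below builds $h_k(x; \theta)$ and the diagonal operator (6) as a plain matrix for a one-dimensional GMM, then checks that the Boltzmann weights $e^{-h_k} / \sum_l e^{-h_l}$ reproduce the posterior given by Bayes' rule and that $\log \mathrm{Tr}\, e^{-\hat{H}}$ equals $\log p(x \mid \theta)$. The one-dimensional model, the parameter values, and the function names are illustrative assumptions.

```python
import math


def h_values(x, weights, means, variances):
    """h_k(x; theta) = -log[pi_k g(x | mu_k, var_k)] for a one-dimensional GMM."""
    return [-math.log(w) + 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
            for w, m, v in zip(weights, means, variances)]


def hamiltonian_matrix(h):
    """Matrix of sum_k h_k |sigma_k><sigma_k| in the basis {|sigma_k>}: diagonal, as in (6)."""
    size = len(h)
    return [[h[k] if k == l else 0.0 for l in range(size)] for k in range(size)]


def boltzmann_diagonal(H):
    """Diagonal of exp(-H) / Tr exp(-H) and log Tr exp(-H); valid because H is diagonal."""
    diag = [H[k][k] for k in range(len(H))]
    low = min(diag)  # shift for numerical stability; it cancels in the ratio
    unnormalized = [math.exp(-(d - low)) for d in diag]
    z = sum(unnormalized)
    return [u / z for u in unnormalized], math.log(z) - low


weights, means, variances = [0.2, 0.5, 0.3], [-2.0, 0.0, 3.0], [1.0, 0.5, 2.0]
x = 0.7
probs, log_z = boltzmann_diagonal(hamiltonian_matrix(h_values(x, weights, means, variances)))
print(probs, log_z)
```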
III Deterministic quantum annealing expectation-maximization algorithm (DQAEM)
First, we formulate DQAEM by using the quantum representation described in the previous section. Then we discuss its stability by showing the monotonicity of the free energy during the algorithm.
III-A Formulation
In this subsection, we formulate DQAEM by employing the concept of quantum annealing [8] (see also the Appendix). First, we rewrite EM in the quantum representation. The log likelihood function (1) is rewritten as

$$ L(\theta) = \sum_{i=1}^{N} \log \mathrm{Tr}\left[ e^{-\hat{H}(x_i; \theta)} \right]. \qquad (7) $$
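Note that $\mathrm{Tr}\left[ e^{-\hat{H}(x_i; \theta)} \right] = \sum_{k=1}^{K} e^{-h_k(x_i; \theta)} = p(x_i \mid \theta)$. As explained in Sec. II-A, the function (2) is maximized in the M step of EM. Similarly to (7), the quantum representation of the function (2) is given by

$$ Q(\theta; \theta') = -\sum_{i=1}^{N} \mathrm{Tr}\left[ \hat{\rho}(x_i; \theta')\, \hat{H}(x_i; \theta) \right], $$

where

$$ \hat{\rho}(x_i; \theta') = \frac{e^{-\hat{H}(x_i; \theta')}}{Z(x_i; \theta')} $$

and $\hat{H}(x_i; \theta)$ is given in Eq. (6). Furthermore, the conditional probability is computed using Bayes' rule. That is,

$$ p(\sigma_k \mid x_i, \theta') = \langle \sigma_k | \hat{\rho}(x_i; \theta') | \sigma_k \rangle = \frac{e^{-h_k(x_i; \theta')}}{Z(x_i; \theta')}. \qquad (8) $$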
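Here, the normalization factor, which is called the partition function in physics, has the form

$$ Z(x_i; \theta') = \mathrm{Tr}\left[ e^{-\hat{H}(x_i; \theta')} \right] = \sum_{k=1}^{K} e^{-h_k(x_i; \theta')}. $$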
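Now we formulate DQAEM. To introduce quantum fluctuations, we add $\Gamma \hat{H}^{\mathrm{qu}}$ to the original Hamiltonian $\hat{H}(x_i; \theta)$, where $\Gamma \geq 0$ controls the strength of the fluctuations and $\hat{H}^{\mathrm{qu}}$ satisfies $[\hat{H}(x_i; \theta), \hat{H}^{\mathrm{qu}}] \neq 0$ for $i = 1, \ldots, N$. Then (8) is converted to

$$ \hat{\rho}_{\beta,\Gamma}(x_i; \theta') = \frac{e^{-\beta \hat{H}_{\Gamma}(x_i; \theta')}}{Z_{\beta,\Gamma}(x_i; \theta')}, \qquad \hat{H}_{\Gamma}(x_i; \theta) = \hat{H}(x_i; \theta) + \Gamma \hat{H}^{\mathrm{qu}}, \qquad (9) $$

where $\beta > 0$ is the inverse temperature.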
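In MLE, the log likelihood function (7) is maximized. In DQAEM, on the other hand, the objective function, which is called the free energy, is given by

$$ F_{\beta,\Gamma}(\theta) = -\frac{1}{\beta} \sum_{i=1}^{N} \log Z_{\beta,\Gamma}(x_i; \theta), \qquad (10) $$

where

$$ Z_{\beta,\Gamma}(x_i; \theta) = \mathrm{Tr}\left[ e^{-\beta \hat{H}_{\Gamma}(x_i; \theta)} \right]. $$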
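Taking into account that $\hat{H}_{\Gamma}(x_i; \theta) = \hat{H}(x_i; \theta)$ at $\Gamma = 0$ and comparing with Eq. (7), we obtain the relation between the free energy and the log likelihood function as

$$ F_{\beta=1,\Gamma=0}(\theta) = -L(\theta). $$

Thus the negative free energy at $\beta = 1$ and $\Gamma = 0$ is the log likelihood function.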
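Next, to formulate DQAEM, we define the function $U_{\beta,\Gamma}(\theta; \theta')$, which corresponds to the function $Q(\theta; \theta')$ in EM. Using (9), the function $U_{\beta,\Gamma}$ has the form

$$ U_{\beta,\Gamma}(\theta; \theta') = \sum_{i=1}^{N} \mathrm{Tr}\left[ \hat{\rho}_{\beta,\Gamma}(x_i; \theta')\, \hat{H}_{\Gamma}(x_i; \theta) \right] = \sum_{i=1}^{N} \sum_{k=1}^{K} p_{\beta,\Gamma}(\sigma_k \mid x_i, \theta')\, h_k(x_i; \theta) + \Gamma \sum_{i=1}^{N} \mathrm{Tr}\left[ \hat{\rho}_{\beta,\Gamma}(x_i; \theta')\, \hat{H}^{\mathrm{qu}} \right], \qquad (11) $$

where

$$ p_{\beta,\Gamma}(\sigma_k \mid x_i, \theta') = \langle \sigma_k | \hat{\rho}_{\beta,\Gamma}(x_i; \theta') | \sigma_k \rangle. \qquad (12) $$

Since the last term in (11) does not depend on $\theta$, only the diagonal elements (12) enter the minimization over $\theta$.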
DQAEM then consists of the following two steps. The first, called the E step of DQAEM, computes the conditional probability (12). The second, called the M step of DQAEM, updates the parameter by minimizing the function (11); that is,

$$ \theta^{(t+1)} = \arg\min_{\theta} U_{\beta,\Gamma}(\theta; \theta^{(t)}). $$

Furthermore, $\Gamma$ is decreased during the iterations. We summarize DQAEM in Algo. 2.
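The E step of DQAEM differs from that of EM only in how the weights are obtained: they are the diagonal elements (12) of a normalized matrix exponential rather than a normalized vector of exponentials. The sketch below takes $\beta = 1$ by default and assumes a hypothetical $\hat{H}^{\mathrm{qu}}$ with zeros on the diagonal and $-1$ elsewhere (not necessarily the matrix used in Sec. IV-A); the small scaling-and-squaring `expm` routine is written out so that the example needs only the standard library.

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(a, terms=20):
    """exp(a) via scaling and squaring with a truncated Taylor series (small matrices only)."""
    n = len(a)
    norm = max(sum(abs(v) for v in row) for row in a)
    squarings = 0
    while norm > 0.5:
        norm /= 2.0
        squarings += 1
    b = [[v / 2.0 ** squarings for v in row] for row in a]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, b)]
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def quantum_responsibilities(h, gamma, beta=1.0):
    """Diagonal of rho = exp(-beta (H + gamma H_qu)) / Z, i.e. p_{beta,Gamma}(sigma_k | x, theta').

    H = diag(h_1, ..., h_K). H_qu is a HYPOTHETICAL choice: 0 on the diagonal, -1 elsewhere.
    """
    size = len(h)
    H = [[h[k] if k == l else -gamma for l in range(size)] for k in range(size)]
    # Gershgorin lower bound on the spectrum; shifting by it keeps every exponent <= 0.
    shift = min(h) - gamma * (size - 1)
    rho = expm([[-beta * (H[k][l] - (shift if k == l else 0.0)) for l in range(size)]
                for k in range(size)])
    z = sum(rho[k][k] for k in range(size))  # the shift cancels in this normalization
    return [rho[k][k] / z for k in range(size)]


h = [8.0, 3.0, 2.0]  # example values of h_k(x_i; theta') for one data point
for gamma in (0.0, 1.0, 10.0):
    print(gamma, [round(p, 3) for p in quantum_responsibilities(h, gamma)])
```

With $\Gamma = 0$ the output coincides with the classical posterior (8); for this particular $\hat{H}^{\mathrm{qu}}$, increasing $\Gamma$ pulls the weights toward the uniform distribution, smoothing the E step in the same spirit as the thermal smoothing of DSAEM.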
III-B Convergence theorem
Having proposed DQAEM in the previous subsection, we now present a theorem that guarantees its stability over the iterations.
Theorem 1
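Let $\theta^{(t+1)} = \arg\min_{\theta} U_{\beta,\Gamma}(\theta; \theta^{(t)})$. Then $F_{\beta,\Gamma}(\theta^{(t+1)}) \leq F_{\beta,\Gamma}(\theta^{(t)})$ holds. Moreover, the equality holds if and only if $U_{\beta,\Gamma}(\theta^{(t+1)}; \theta^{(t)}) = U_{\beta,\Gamma}(\theta^{(t)}; \theta^{(t)})$ and $\hat{\rho}_{\beta,\Gamma}(x_i; \theta^{(t+1)}) = \hat{\rho}_{\beta,\Gamma}(x_i; \theta^{(t)})$ for $i = 1, \ldots, N$.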
This theorem ensures that the free energy does not increase during the iterations, and hence DQAEM converges to the global optimum or at least to a local optimum. We note that the global convergence of EM is discussed by Dempster et al. [21] and Wu [27], and their arguments also apply to DQAEM.
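Theorem 1 can be checked numerically. The sketch below runs DQAEM at fixed $\beta = 1$ and $\Gamma = 2$ on synthetic one-dimensional data, using the closed-form minimizer of (11) (the EM updates weighted by the diagonal of $\hat{\rho}_{\beta,\Gamma}$), and records the free energy (10) at every iteration. The data, the initial values, and the hypothetical $\hat{H}^{\mathrm{qu}}$ (zeros on the diagonal, $-1$ elsewhere) are assumptions chosen for illustration.

```python
import math
import random


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(a, terms=20):
    """exp(a) via scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(v) for v in row) for row in a)
    squarings = 0
    while norm > 0.5:
        norm /= 2.0
        squarings += 1
    b = [[v / 2.0 ** squarings for v in row] for row in a]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, b)]
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def gibbs_stats(x, theta, gamma, beta):
    """Diagonal of rho_{beta,Gamma}(x; theta) and log Z_{beta,Gamma}(x; theta) for a 1D GMM.

    H_qu is HYPOTHETICAL: 0 on the diagonal and -1 elsewhere.
    """
    weights, means, variances = theta
    size = len(weights)
    h = [-math.log(w) + 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
         for w, m, v in zip(weights, means, variances)]
    shift = min(h) - gamma * (size - 1)  # Gershgorin lower bound on the spectrum of H
    e = expm([[-beta * ((h[k] - shift) if k == l else -gamma) for l in range(size)]
              for k in range(size)])
    z_shifted = sum(e[k][k] for k in range(size))
    return [e[k][k] / z_shifted for k in range(size)], math.log(z_shifted) - beta * shift


def m_step(data, resp):
    """Minimizer of U_{beta,Gamma}(theta; theta'): EM updates weighted by diag(rho)."""
    n, size = len(data), len(resp[0])
    weights, means, variances = [], [], []
    for k in range(size):
        nk = sum(r[k] for r in resp)
        mean = sum(r[k] * x for r, x in zip(resp, data)) / nk
        weights.append(nk / n)
        means.append(mean)
        variances.append(sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / nk)
    return weights, means, variances


def dqaem_fixed_gamma(data, theta, gamma, beta=1.0, iters=25):
    """E and M steps at fixed (beta, Gamma); returns theta and F_{beta,Gamma}(theta^(t))."""
    free_energies = []
    for _ in range(iters):
        stats = [gibbs_stats(x, theta, gamma, beta) for x in data]
        free_energies.append(-sum(log_z for _, log_z in stats) / beta)
        theta = m_step(data, [diag for diag, _ in stats])
    return theta, free_energies


rng = random.Random(1)
data = [rng.gauss(mu, 1.0) for mu in (-4.0, 0.0, 4.0) for _ in range(60)]
theta0 = ([1 / 3] * 3, [-1.0, 0.0, 1.0], [4.0] * 3)
theta, F = dqaem_fixed_gamma(data, theta0, gamma=2.0)
print(round(F[0], 3), round(F[-1], 3))
```

The recorded free energies should be non-increasing, and with $\Gamma = 0$ and $\beta = 1$ the same code returns the negative log likelihood, consistent with the relation between $F_{\beta,\Gamma}$ and $L$ stated in Sec. III-A.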
IV Numerical simulations
In this section, we carry out numerical simulations to confirm the performance of DQAEM. In the first subsection, we present the setup of numerical simulations, and, in the following subsection, we provide numerical results.
IV-A Mathematical setup
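We estimate the parameters of GMMs by using both DQAEM and EM. Suppose $N$ data points are independently sampled from a GMM with $K$ components, where the GMM is given by (4). In EM, the updating equations for $\theta$ are determined by setting the derivative of the function (2) with respect to $\theta$ to zero (with a Lagrange multiplier for the constraint $\sum_{k} \pi_k = 1$). The parameter of GMMs at the $(t+1)$-th iteration is then given by

$$ \pi_k^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} p(\sigma_k \mid x_i, \theta^{(t)}), \qquad (13) $$

$$ \mu_k^{(t+1)} = \frac{\sum_{i=1}^{N} p(\sigma_k \mid x_i, \theta^{(t)})\, x_i}{\sum_{i=1}^{N} p(\sigma_k \mid x_i, \theta^{(t)})}, \qquad (14) $$

$$ \Sigma_k^{(t+1)} = \frac{\sum_{i=1}^{N} p(\sigma_k \mid x_i, \theta^{(t)})\, (x_i - \mu_k^{(t+1)})(x_i - \mu_k^{(t+1)})^{\top}}{\sum_{i=1}^{N} p(\sigma_k \mid x_i, \theta^{(t)})}, \qquad (15) $$

where $\theta^{(t)}$ is the tentative estimated parameter at the $t$-th iteration.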
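In DQAEM, the updating equations for $\theta$ are determined by the derivative of the function $U_{\beta,\Gamma}$ in (11) with respect to $\theta$, and $p(\sigma_k \mid x_i, \theta^{(t)})$ in (13), (14), and (15) is replaced by $p_{\beta,\Gamma}(\sigma_k \mid x_i, \theta^{(t)})$. That is, the updating equations for DQAEM are given by

$$ \pi_k^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} p_{\beta,\Gamma}(\sigma_k \mid x_i, \theta^{(t)}), $$

$$ \mu_k^{(t+1)} = \frac{\sum_{i=1}^{N} p_{\beta,\Gamma}(\sigma_k \mid x_i, \theta^{(t)})\, x_i}{\sum_{i=1}^{N} p_{\beta,\Gamma}(\sigma_k \mid x_i, \theta^{(t)})}, $$

$$ \Sigma_k^{(t+1)} = \frac{\sum_{i=1}^{N} p_{\beta,\Gamma}(\sigma_k \mid x_i, \theta^{(t)})\, (x_i - \mu_k^{(t+1)})(x_i - \mu_k^{(t+1)})^{\top}}{\sum_{i=1}^{N} p_{\beta,\Gamma}(\sigma_k \mid x_i, \theta^{(t)})}. $$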
Note that the quantum effects on parameter estimation come from $p_{\beta,\Gamma}(\sigma_k \mid x_i, \theta^{(t)})$. The annealing parameter $\Gamma$ is decreased from its initial value to 0 over the iterations.
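The annealing of $\Gamma$, and of the annealing parameter of DSAEM used for comparison in Sec. IV-B, can be sketched as a pair of exponential schedules. The parametrization below, including the decay `rate` and pinning the final iteration to $\Gamma = 0$ and a DSAEM parameter of 1 (where both methods reduce to EM), is a hypothetical choice; the initial values and rates used in the paper are not assumed here.

```python
def exponential_schedules(gamma0, dsaem0, rate, num_iters):
    """Hypothetical schedules: Gamma_t = gamma0 * rate**t (DQAEM, decays to 0) and
    a_t = 1 + (dsaem0 - 1) * rate**t (DSAEM annealing parameter, approaches 1).

    The last iteration is pinned to Gamma = 0 and a = 1, where both methods reduce to EM.
    """
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must lie in (0, 1)")
    gammas = [gamma0 * rate ** t for t in range(num_iters)]
    dsaem = [1.0 + (dsaem0 - 1.0) * rate ** t for t in range(num_iters)]
    gammas[-1], dsaem[-1] = 0.0, 1.0
    return gammas, dsaem


gammas, betas = exponential_schedules(gamma0=5.0, dsaem0=0.5, rate=0.8, num_iters=40)
print(gammas[:3], betas[:3])
```

In an annealed run, iteration $t$ would call the DQAEM E step with `gammas[t]` and the DSAEM E step with `betas[t]`.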
In this section, we assume that, in matrix notation, $\hat{H}^{\mathrm{qu}}$ is given by
[TABLE]
Clearly, $[\hat{H}(x_i; \theta), \hat{H}^{\mathrm{qu}}] \neq 0$ is satisfied. Note that the Hamiltonian is a $K \times K$ matrix, so its size is determined by the assumed number of mixture components $K$.
IV-B Numerical results
In this subsection, using the data set shown in Fig. 1(a), we compare DQAEM, EM, and DSAEM, the last of which was proposed in Ref. [22]. This data set is generated by a GMM that consists of three two-dimensional Gaussian functions with distinct means. Here we set $\beta = 1$ in DQAEM so that only the effect of quantum fluctuations is examined. We also set an initial value for the annealing parameter in DSAEM; note that, in DSAEM, the annealing parameter is given by temperature. Furthermore, we exponentially vary the annealing parameter of DSAEM and $\Gamma$ of DQAEM to 1 and 0, respectively. Figure 1(b) plots the transitions of the log likelihood function of EM and of the negative free energies of DSAEM and DQAEM as red, orange, and blue lines, respectively. The green line in Fig. 1(b) indicates the optimal value of the log likelihood function in these numerical simulations. Each of DQAEM, EM, and DSAEM reaches either the optimal estimate or a suboptimal estimate depending on the initial values.
To show visually how DQAEM and EM behave in parameter estimation, we illustrate the estimated Gaussian functions in Fig. 2(a) for a case where the log likelihood function attains the optimal value, and in Fig. 2(b) for one of the cases where it is lower than the optimal value. The estimate in Fig. 2(b) clearly fails to cluster the data.
However, the success ratios of DQAEM, EM, and DSAEM differ considerably. To compare the ratios of success and failure for DQAEM and EM, we ran DQAEM and EM repeatedly from the same initial values and summarize the results in Table I.
Here, we define a run of DQAEM or EM as a "success" when the squared errors between the estimated means of the three Gaussian functions and the true means are less than a fixed multiple of the covariances of the three Gaussian functions. Table I shows that DQAEM succeeds with a ratio of 97.4 % while EM succeeds with a ratio of 56.6 %, so DQAEM is superior to EM. Table II shows the success ratios of DQAEM, EM, and DSAEM in parameter estimation (97.4 %, 56.6 %, and 77.8 %, respectively), which indicates that DQAEM is also superior to DSAEM.
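The success criterion and the paired comparison behind Table I can be sketched as follows. The multiplier `factor`, the use of a single isotropic variance in place of the covariances, and matching estimated to true components over all permutations are assumptions made for illustration; the function names are hypothetical.

```python
from itertools import permutations


def squared_error(a, b):
    """Squared Euclidean distance between two mean vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def is_success(est_means, true_means, variance, factor):
    """Hypothetical criterion: under the best relabelling of the estimated components,
    every squared error ||mu_hat - mu||^2 is below factor * variance."""
    return any(
        all(squared_error(est_means[j], true_means[k]) < factor * variance
            for k, j in enumerate(perm))
        for perm in permutations(range(len(true_means)))
    )


def paired_table(outcomes):
    """Percentages of (EM success, DQAEM success) pairs, laid out like Table I."""
    n = len(outcomes)
    table = {}
    for em_ok in (True, False):
        for dq_ok in (True, False):
            table[(em_ok, dq_ok)] = 100.0 * outcomes.count((em_ok, dq_ok)) / n
    return table


true_means = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
print(is_success([(3.1, 0.0), (0.0, 2.9), (0.1, 0.1)], true_means, variance=1.0, factor=0.1))
```

Each run started from a shared initial value contributes one `(em_ok, dq_ok)` pair, so the row and column sums of the resulting table give the marginal success ratios reported in Table I.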
V Conclusion
In this paper, we have proposed the deterministic quantum annealing expectation-maximization (DQAEM) algorithm for Gaussian mixture models (GMMs) to relax the problem of local optima in the expectation-maximization (EM) algorithm by introducing quantum fluctuations into EM. Although we have limited our attention to GMMs to simplify the discussion, the derivation presented in this paper can be straightforwardly applied to any model with discrete latent variables. After formulating DQAEM, we have presented a theorem that guarantees its convergence. We have then given numerical simulations that show its efficiency compared to EM and DSAEM. We expect that combining DQAEM and DSAEM will give better performance than DQAEM alone. Finally, one of our future works is a Bayesian extension of this work; that is, we plan to propose a deterministic quantum annealing variational Bayes inference.
Appendix A Quantum annealing
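Here, we briefly introduce quantum annealing (QA) to prepare for DQAEM. First, we consider the minimization problem of the Ising model, that is,

$$ \min_{\sigma_1, \ldots, \sigma_n} H^{\mathrm{cl}}(\sigma_1, \ldots, \sigma_n), \qquad H^{\mathrm{cl}}(\sigma_1, \ldots, \sigma_n) = -\sum_{i<j} J_{ij}\, \sigma_i \sigma_j, \qquad (16) $$

where

$$ \sigma_i \in \{-1, +1\} $$

for each $i = 1, \ldots, n$, and $J_{ij}$ is the coupling constant between the spins at site $i$ and site $j$. Note that this problem can describe many combinatorial problems such as the traveling salesman problem and the max-cut problem [28].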
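In QA, we quantize the Ising model (16) by applying magnetic fields along the $x$ axis, and we solve the Schrödinger equation of this system while decreasing the magnetic fields. The Hamiltonian of this system is given by

$$ \hat{H}(t) = \hat{H}^{\mathrm{cl}} + \Gamma(t)\, \hat{H}^{\mathrm{qu}}, $$

where

$$ \hat{H}^{\mathrm{cl}} = -\sum_{i<j} J_{ij}\, \hat{\sigma}_i^{z} \hat{\sigma}_j^{z}, \qquad \hat{H}^{\mathrm{qu}} = -\sum_{i=1}^{n} \hat{\sigma}_i^{x}, $$

using

$$ \hat{\sigma}_i^{\alpha} = I \otimes \cdots \otimes I \otimes \underbrace{\sigma^{\alpha}}_{i\text{-th}} \otimes I \otimes \cdots \otimes I \quad (\alpha = x, z), \qquad \sigma^{x} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma^{z} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, $$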
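and $\Gamma(t)$ represents the strength of the magnetic fields, with $I$ the $2 \times 2$ identity matrix. This is called the transverse Ising model. The Schrödinger equation that we solve in QA is then given by

$$ i\hbar \frac{d}{dt} |\psi(t)\rangle = \hat{H}(t)\, |\psi(t)\rangle, \qquad (17) $$

where $i$ is the imaginary unit, $\hbar$ is the Dirac constant, and $|\psi(t)\rangle$ is the ket vector. The magnetic field $\Gamma(t)$ is set to be large at the beginning of QA, so $|\psi(0)\rangle$ is initially equal or close to the ground state of $\hat{H}^{\mathrm{qu}}$. While solving (17), we gradually decrease $\Gamma(t)$ and finally make it go to zero. Therefore, if $\Gamma(t)$ is decreased sufficiently slowly, $|\psi(t)\rangle$ at the end of QA gives a solution for the original Hamiltonian (16). The efficiency of QA is discussed in Refs. [8, 11, 12].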
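The transverse Ising Hamiltonian can be built explicitly for a few spins. Instead of integrating (17), the sketch below computes the instantaneous ground state of $\hat{H}^{\mathrm{cl}} + \Gamma \hat{H}^{\mathrm{qu}}$ by power iteration; this is the state that ideal, sufficiently slow (adiabatic) QA follows. Tracking exact ground states is a simplification for illustration, and the couplings and function names are assumptions.

```python
import math

SIGMA_X = [[0.0, 1.0], [1.0, 0.0]]
SIGMA_Z = [[1.0, 0.0], [0.0, -1.0]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a[i][j] * b[k][q] for j in range(len(a[0])) for q in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def site_op(op, site, n):
    """I x ... x op x ... x I with op acting on spin `site` (0-based) of n spins."""
    result = [[1.0]]
    for s in range(n):
        result = kron(result, op if s == site else IDENTITY)
    return result


def transverse_ising(J, gamma):
    """H = -sum_{i<j} J_ij Z_i Z_j - gamma * sum_i X_i as a dense 2^n x 2^n matrix."""
    n = len(J)
    dim = 2 ** n
    H = [[0.0] * dim for _ in range(dim)]
    z_ops = [site_op(SIGMA_Z, i, n) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            for r in range(dim):  # Z_i Z_j is diagonal, so only diagonal entries change
                H[r][r] -= J[i][j] * z_ops[i][r][r] * z_ops[j][r][r]
        x_op = site_op(SIGMA_X, i, n)
        for r in range(dim):
            for c in range(dim):
                H[r][c] -= gamma * x_op[r][c]
    return H


def ground_state(H, iters=500):
    """Power iteration on (c I - H), with c >= the largest eigenvalue of H (Gershgorin bound)."""
    dim = len(H)
    c = max(H[r][r] + sum(abs(H[r][s]) for s in range(dim) if s != r) for r in range(dim))
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        w = [c * v[r] - sum(H[r][s] * v[s] for s in range(dim)) for r in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


J = [[0.0, 1.0], [1.0, 0.0]]  # two ferromagnetically coupled spins
for gamma in (20.0, 1.0, 0.05):
    v = ground_state(transverse_ising(J, gamma))
    # Basis index 0 is |+1,+1> and 3 is |-1,-1>: the minimizers of H_cl = -sigma_1 sigma_2.
    print(gamma, round(v[0] ** 2 + v[3] ** 2, 4))
```

For large $\Gamma$ the ground state is close to the uniform superposition (probability about 0.5 on the two classical minimizers), and as $\Gamma$ decreases it concentrates on the minimizers of (16).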
- [1] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numerische Mathematik, vol. 1, no. 1, pp. 269–271, 1959.
- [2] J. B. Kruskal, “On the shortest spanning subtree of a graph and the traveling salesman problem,” Proceedings of the American Mathematical Society, vol. 7, no. 1, pp. 48–50, 1956. [Online]. Available: http://www.jstor.org/stable/2033241
- [3] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, vol. 220, no. 4598, pp. 671–680, 1983. [Online]. Available: http://www.sciencemag.org/content/220/4598/671.abstract
- [4] S. Kirkpatrick, “Optimization by simulated annealing: Quantitative studies,” Journal of Statistical Physics, vol. 34, no. 5-6, pp. 975–986, 1984. [Online]. Available: http://dx.doi.org/10.1007/BF01009452
- [5] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, no. 6, pp. 721–741, Nov. 1984.
- [6] B. Apolloni, C. Carvalho, and D. de Falco, “Quantum stochastic optimization,” Stochastic Processes and their Applications, vol. 33, no. 2, pp. 233–244, 1989. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0304414989900409
- [7] A. Finnila, M. Gomez, C. Sebenik, C. Stenson, and J. Doll, “Quantum annealing: A new method for minimizing multidimensional functions,” Chemical Physics Letters, vol. 219, no. 5–6, pp. 343–348, 1994. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0009261494001170
- [8] T. Kadowaki and H. Nishimori, “Quantum annealing in the transverse Ising model,” Phys. Rev. E, vol. 58, pp. 5355–5363, Nov. 1998. [Online]. Available: http://link.aps.org/doi/10.1103/PhysRevE.58.5355
