Newton correction methods for computing real eigenpairs of symmetric tensors
Ariel Jaffe, Roi Weiss, and Boaz Nadler

TL;DR
This paper introduces a Newton-based iterative method for efficiently computing real eigenpairs of symmetric tensors, demonstrating quadratic convergence and superior performance over existing methods.
Contribution
It proposes a novel Newton correction method with proven convergence properties and empirical advantages for finding real eigenpairs of symmetric tensors.
Findings
Method converges quadratically near eigenpairs.
Finds more eigenpairs than previous methods.
Typically finds all real eigenpairs with multiple initializations.
Abstract
Real eigenpairs of symmetric tensors play an important role in multiple applications. In this paper we propose and analyze a fast iterative Newton-based method to compute real eigenpairs of symmetric tensors. We derive sufficient conditions for a real eigenpair to be a stable fixed point for our method, and prove that given a sufficiently close initial guess, the convergence rate is quadratic. Empirically, our method converges to a significantly larger number of eigenpairs compared to previously proposed iterative methods, and with enough random initializations typically finds all real eigenpairs. In particular, for a generic symmetric tensor, the sufficient conditions for local convergence of our Newton-based method hold simultaneously for all its real eigenpairs.
| full rank | rank deficient | ||
|---|---|---|---|
| Isolated | Quadratic convergence | Slow convergence | No guarantees |
| Non-isolated | — | No guarantees | No guarantees |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\NewEnviron
salign
[TABLE]
Newton correction methods for computing real eigenpairs of symmetric tensors
Ariel Jaffe
Weizmann Institute of Science, Israel
Roi Weiss
Weizmann Institute of Science, Israel
Boaz Nadler
Weizmann Institute of Science, Israel
Abstract
Real eigenpairs of symmetric tensors play an important role in multiple applications. In this paper we propose and analyze a fast iterative Newton-based method to compute real eigenpairs of symmetric tensors. We derive sufficient conditions for a real eigenpair to be a stable fixed point for our method, and prove that given a sufficiently close initial guess, the convergence rate is quadratic. Empirically, our method converges to a significantly larger number of eigenpairs compared to previously proposed iterative methods, and with enough random initializations typically finds all real eigenpairs. In particular, for a generic symmetric tensor, the sufficient conditions for local convergence of our Newton-based method hold simultaneously for all its real eigenpairs.
Key words. tensor eigenvectors; tensor eigenvalues; symmetric tensor; higher-order power method; Newton-based methods; Newton correction method.
Introduction
Eigenpairs of symmetric tensors have received much attention in recent years due to their applicability in a wide range of disciplines. Introduced by Lim [21] and Qi [26], tensor eigenpairs were used for example in the analysis of hypergraphs [20], high-order Markov chains [24], establishing the positive-definiteness of multivariate forms [25], diffusion tensor imaging [28, 27], and data analysis [2, 1].
The focus of this paper is on fast iterative methods to compute the real eigenpairs of symmetric tensors. These methods were recently applied by the authors and collaborators in [16] for learning a binary latent variable model by computing the eigenpairs of a third order moment tensor of the observed data.
There are major differences between tensor eigenpairs, whose formal definition is reviewed in Section 2, and their well studied matrix counterparts. Whereas any symmetric matrix has exactly real eigenvalues with corresponding orthogonal eigenvectors, the situation for tensors is fundamentally different. The eigenvectors of a symmetric tensor are not necessarily orthogonal and some may in fact be complex-valued. Furthermore, the number of eigenvalues is in general significantly larger than . For symmetric real tensors of order and dimensionality , [5] proved that the number of complex eigenvalues is at most . For generic tensors, this is the exact number of complex eigenvalues. As a lower bound, it is known that for odd-order tensors at least one real eigenvalue exists [5], while for symmetric even-ordered tensors at least real eigenvalues exist [6]. Recently, [4] analyzed the expected number of real eigenvalues of a random Gaussian symmetric tensor.
From a computational perspective, while all matrix eigenpairs can be computed efficiently, Hillar and Lim [15] showed that enumerating all eigenpairs of a symmetric tensor is in general #P. Nonetheless, Cui et al. [8] derived a method to compute all eigenpairs sequentially, based on a hierarchy of semidefinite relaxations. Chen et al. [7] proposed a homotopy continuation method for the same purpose. While these algorithms are guaranteed to find all isolated eigenpairs, they are computationally demanding even for moderate tensor dimensions. For example, computing all real eigenpairs of a random tensor using the zeig procedure in the TenEig package of [7] takes several hours on a standard PC.
In recent years, several iterative methods were developed to compute at least some tensor eigenpairs. Some methods were specifically designed to compute the largest eigenvalue [24, 22, 14, 11]. Han [13] proposed a method based on unconstrained optimization to compute both the maximal and minimal eigenvalues of an even-order tensor. As described in Section 3, adaptations of the popular power method to tensors were suggested in [9, 18, 19, 31]. While these iterative methods are computationally fast, in general they converge to only a strict subset of all eigenpairs.
In this work we present a different iterative approach to compute real eigenpairs of symmetric tensors. As detailed in Section 4, our approach is based on adapting the matrix Newton correction method (NCM) to the tensor case. We derive sufficient conditions for local convergence of NCM and prove that its convergence rate is quadratic. Our analysis reveals that NCM may fail to converge to eigenvectors with eigenvalue zero and has small attraction region for eigenvalues close to zero. To overcome this limitation, we next derive a variant, denoted the orthogonal Newton correction method (O–NCM), which enjoys improved run-time and convergence guarantees. We observe that for a generic symmetric tensor, the sufficient conditions for either NCM or O–NCM to converge to all its eigenpairs hold with probability one. In Section 5 we illustrate that these sufficient conditions are not necessary.
In Section 6 we present numerical simulations that support our theoretical analysis. For random tensors of modest size, multiple random initializations of NCM or O–NCM can find all eigenpairs significantly faster than other methods. For example, on a random tensor, our methods typically found all eigenpairs within a few seconds. We conclude with a summary and discussion in Section 7.
Notation
We denote vectors by lowercase boldface letters, as in , matrices by uppercase letters, as in , and higher-order tensors by caligraphic letters, as in . We denote . is the identity matrix whose dimension depends on the context and is the unit sphere.
The symmetric tensor eigen-problem
Let be a tensor of order and dimension , with entries , where . We assume that is symmetric, namely, for all permutations of the indices . A tensor can be viewed as a multi-linear operator: for matrices with , the tensor-mode product, denoted by , yields a new tensor whose entry is
[TABLE]
Tensor eigenpairs
Several definitions of tensor eigenpairs appear in the literature. Here we consider the one introduced as -eigenpairs in [26] and -eigenpairs in [21].
Definition 1
A pair is a real eigenpair of if
[TABLE]
Note that if is an eigenpair, then is an eigenpair as well. Following common practice, we treat these two pairs as belonging to the same equivalence class [5].
Definition 1 can be equivalently stated using the following -degree homogeneous polynomial in ,
[TABLE]
As shown in [21], the real eigenpairs of correspond to the critical points of when constrained to the unit sphere . Formally, define the Lagrangian
[TABLE]
A constrained critical point of satisfies the Karush-Kuhn-Tucker conditions,
[TABLE]
where is such that . This is precisely Equation (2). For future use, we denote the gradient of at an arbitrary point by
[TABLE]
Similarly, we denote the Hessian matrix by
[TABLE]
As will become clear in Section 4, the spectral structure of plays a fundamental role in the convergence of our proposed Newton-based methods to .
Power methods for computing tensor eigenpairs
To motivate our approach, it is first instructive to briefly review previous iterative methods, specifically the symmetric higher-order power method (HOPM) [9] and the shifted-HOPM [18, 19]. The HOPM was derived as a way to compute the best rank-1 approximation of a symmetric tensor under the squared error loss,
[TABLE]
Although the above problem is non-convex and has no closed form solution, it was shown in [9] that it is equivalent to finding the vector with that maximizes the objective function in (3). To compute , the following generalization of the matrix power method to high-order tensors was proposed. Starting from a (random) initial guess , HOPM iterates
[TABLE]
It was shown in [17, Theorem 4] that for an even-order tensor, if its associated function is convex or concave, then HOPM is guaranteed to converge to a local optimum of in . For general symmetric tensors, however, HOPM has no convergence guarantees, and may indeed fail to converge, see [17] for a specific example.
To overcome the limitations of HOPM, [18, 19] proposed the shifted function
[TABLE]
Since on the unit sphere , the critical points of and are identical. Instead of (8), the shifted-HOPM iterates
[TABLE]
Importantly, the value of can be tuned so that from any starting point , the shifted-HOPM is guaranteed to converge to a critical point of . [19] further devised an adaptive shifted-HOPM, whereby the value of is updated at each iteration so that is locally convex or concave around . This avoids the possible slowdown of the shifted-HOPM with a fixed value of , while maintaining its convergence guarantees.
Convergence, attraction regions, and stable eigenpairs
The adaptive shifted-HOPM converges only to some eigenpairs of a tensor. These may be characterized as follows. For any , let be a matrix with orthonormal columns that span the subspace orthogonal to . Define the projected Hessian matrix,
[TABLE]
where is the Hessian matrix in (6). In [18], an eigenvector was termed positive-stable if is positive-definite and negative-stable if is negative-definite. Otherwise, is termed unstable. [18] showed that the shifted HOPM does not converge to unstable eigenvectors but does converge to the stable ones. Further, the convergence is at a linear rate. To distinguish between eigenpairs that are stable for the (adaptive) shifted-HOPM and those that are stable for the Newton-based methods, we henceforth refer to the above as power-stable eigenpairs, power-unstable eigenpairs, etc.
As an example, the left panel of Figure 1 shows the value of over the unit sphere for a symmetric tensor with real eigenvectors. Three eigenvectors, depicted in red, are power-unstable, while the remaining four, depicted in black, are power-stable. The right panel of the same figure shows the results of the adaptive shifted-HOPM. The color indicates the eigenvector to which the method converged, starting from various locations on the unit sphere. The figure shows clear convergence regions around of the power-stable eigenpairs. The region around the fourth power-stable eigenpair appears on the back side of the sphere. In agreement with theory, the adaptive HOPM did not converge to any of the three power-unstable eigenpairs.
Newton-based methods for the tensor eigen-problem
Given the limitations of the aforementioned methods, our goal is to derive a fast iterative algorithm that under mild assumptions is able to converge to all real eigenpairs of a symmetric tensor. To this end, we develop a Newton-based method.
4.1 Newton correction method
Several variants of Newton’s method were derived for the symmetric matrix eigen-problem, see for example [30, Chapter 6]. Here we derive a Newton-based method for computing the eigenpairs of symmetric tensors. Recently, [12] considered a similar approach for finding some nonnegative eigenpairs of a nonnegative tensor.
Let be an eigenpair of a symmetric tensor of order and dimensionality . Given an approximation to , our goal is to obtain an improved approximation (see Figure 3, left). Denote the exact unknown correction by and recall . Since and , the eigen-problem in (2) can be written as
[TABLE]
Recalling the Hessian matrix in (6), we define the matrix by
[TABLE]
Setting apart the terms that are linear in , Eq. (11) takes the form
[TABLE]
where is given in (5). Here, accounts for all high order terms in ,
[TABLE]
By definition, the solution to (13) satisfies . However, solving (13) exactly for is as difficult as finding the eigenpair ) of the tensor we started from. Instead, we devise an iterative Newton correction method (NCM) that solves (13) only approximately. Given the approximation of at the iteration, NCM computes a new approximation by neglecting the high order terms in (13). This amounts to solving the system of linear equations
[TABLE]
Assuming is invertible, the unique solution to (15) is given by
[TABLE]
The new approximation for is then
[TABLE]
Given an initial guess , NCM iterates steps (16) and (17). Once a stopping condition is met, the pair is returned; see Algorithm 1.
The left panel of Figure 2 shows the convergence regions of NCM for the eigenpairs of the same tensor as in Figure 1. In this case, all eigenpairs are attracting points of NCM and can thus be found by running Algorithm 1 multiple times with different (random) initial guesses.
Convergence guarantees
Two questions regarding NCM are (i) to which eigenpairs of the method can converge to? and (ii) what is the convergence rate? To answer these questions we recall the definition of the Hessian matrix in (6). Given an eigenpair , its corresponding Hessian matrix is
[TABLE]
Note that is symmetric and has as an eigenvector with eigenvalue . Denote by the other eigenvalues of . By definition, these are the eigenvalues of the projected Hessian in (10).
Definition 2
For , an eigenpair is -Newton-stable if all eigenvalues of in absolute value are at least , namely, .
Note that is full rank iff is -Newton-stable for some . Similarly, is full rank iff is -Newton-stable for some and . We have the following convergence guarantee for NCM. The proof is given in Appendix A.
Theorem 1
Let be an eigenpair of a symmetric tensor . Suppose that is -Newton-stable and that . Then there exists an such that for any that satisfies , the sequence , computed by Algorithm 1 converges to at a quadratic rate.
4.2 Orthogonal Newton correction method
As discussed above, the NCM method may not converge to an eigenvector with eigenvalue . To remove this limitation, we now develop an orthogonal NCM variant. Given an approximation of , we first decompose it into its projection onto and a residual (see Figure 3, right),
[TABLE]
where and is the residual. Since and are orthogonal and , . For reasons to become clear shortly, we also introduce a correction to the eigenvalue , defined as
[TABLE]
When , we have , and . Since is an eigenpair,
[TABLE]
Inserting and into the above equation gives
[TABLE]
We set apart the terms that are linear in and to obtain
[TABLE]
where and were defined in (5) and (6) respectively and includes all the remaining higher order terms in ,
[TABLE]
Combining (21) with the orthogonality condition, , gives the following set of non-linear equations in ,
[TABLE]
By construction, the solution to (23) satisfies . Similarly to the NCM, we neglect the high order terms in the right hand side of (23) and solve the system of linear equations in the unknowns ,
[TABLE]
Due to the extra variable , (24) seems to be of dimension , as opposed to the dimensional system in (15). However, as we now show, the system in (24) can be equivalently solved in the dimensional subspace orthogonal to . More precisely, let be the projection matrix into the subspace orthogonal to and let have orthonormal columns such that . Recall the projected Hessian matrix in (10). The following lemma is an adaptation of [30, Theorem 6.2.2] to our setting. Its proof is given in Appendix B.
Lemma 1
A vector satisfies (24) if and only if satisfies
[TABLE]
Assuming is invertible, the solution to (25) is So given the approximation to , O–NCM computes
[TABLE]
and the new approximation is . Given an initial , O–NCM iterates these steps until a stopping condition is met; see Algorithm 2.
The right panel of Figure 2 shows the convergence regions of O–NCM for the various eigenpairs of the same tensor in Figure 1. Similarly to NCM, in this case, all eigenpairs are attracting points of O–NCM, but with slightly different regions.
Convergence guarantees
We have the following convergence guarantee for O–NCM. It is similar to that of NCM in Theorem 1, but with the condition removed. The proof is given in Appendix C.
Theorem 2
Let be a -Newton-stable eigenpair of a symmetric tensor . There exists an such that for any that satisfies , the sequence , computed by Algorithm 2 converges to at a quadratic rate.
Remark 1
Recall that any eigenpair to which the shifted-HOPM method converges to has a Hessian matrix which is either positive definite or negative definite. In either case, this Hessian matrix has full rank, and thus by Theorem 2, is a stable fixed point of O–NCM. In other words, O–NCM typically converges to many more tensor eigenpairs than the shifted power method. However, the adaptive shifted-HOPM is guaranteed to converge from any initial point, whereas no such global convergence guarantee is currently available for the Newton-based methods.
Convergence to eigenpairs with a rank deficient Hessian
The sufficient condition in Theorem 2 for O–NCM to converge to an eigenpair rests on the smallest absolute eigenvalue of the projected Hessian matrix . In particular, if the eigenpair is -Newton-stable for some , an attraction neighborhood around exists. We now illustrate that this sufficient condition for convergence is by no means necessary. To this end, we analyze a simple example. Denote . For any , define the following cubic -dimensional symmetric tensor,
[TABLE]
When , is orthogonal, having the maximally possible number of real eigenpairs. All these are Newton-stable and among them are the power-stable eigenpairs . Assume is odd and denote . Let be the number of real eigenpairs of . The following proposition is proved in Appendix D.
Proposition 1
Define thresholds, , .
- (i)
The number of real eigenpairs of of Eq. (27) is
[TABLE]
- (ii)
For , out of the real eigenpairs of are not Newton-stable.
We illustrate the above properties for with . In this case, and there are two thresholds, and . Figure 4 shows the number of real eigenpairs (left), and the different eigenvalues of (right) as computed by O–NCM. As expected, at and , the number of real eigenvalues decreases.
Next, we examine the convergence of O–NCM on with . According to Proposition 1, , and the number of real eigenpairs is , three of which are not Newton-stable. Figure 5 shows the attraction regions around two of the eigevectors of , as well as the full unit sphere. On the left, the eigenvector is Newton-stable. As expected, O–NCM converged to this eigenvector from any point in its neighborhood. In contrast, the eigenvector on the right is not Newton-stable. In this case, there is a positive probability of converging to a different eigenvector even when the initial guess is arbitrary close. Nonetheless, O–NCM converged to this eigenvector from some directions, even though the sufficient condition in Theorem 2 does not hold.
In this example, all eigenpairs of are isolated, namely each one of them is the unique eigenpair in a small neighborhood around it. In addition, the eigenvectors which are not Newton-stable have a projected Hessian matrix that is rank deficient but non-zero. In Appendix E, we illustrate the behavior of O–NCM near eigenvectors that are either non-isolated, or have a projected Hessian matrix equals to zero. In these cases, O–NCM may not converge.
Simulation results
In this section we study numerically the performance of NCM and O–NCM for computing the real eigenpairs of symmetric tensors, as compared to the homotopy method [7] and the adaptive shifted-HOPM [19].111Matlab code for the NCM methods can be found at https://github.com/arJaffe/NCM, for the homotopy method at http://users.math.msu.edu/users/chenlipi/TenEig.html, and for the shifted HOPM at http://www.sandia.gov/~tgkolda/TensorToolbox/index-2.6.html. In the experiments we consider random Gaussian symmetric tensors, whose entries are all i.i.d. up to the symmetry constraints. All experiments were done on a PC with an Intel i-3820 processor, RAM and MATLAB version R2016a.
Finding all real eigenpairs
We examine the time needed to compute all eigenpairs for random tensors of order and various dimensions . Similar results are obtained for other values of . For each tensor, we first ran the homotopy method to obtain all its eigenpairs. Next, we ran NCM and O–NCM, initialized repeatedly with random points on the unit sphere, until all eigenpairs were found. Note that without running the homotopy method first, we would have no criterion to decide whether we actually found all tensor’s eigenpairs. The process was sequential, where a new run was initialized only after the previous one ended. This process, however, can be easily parallelized. We stopped the NCM iterations when or if a maximal number of iterations was reached. In the latter case we declared that NCM failed to converge. For both NCM and O–NCM, and for all tensor dimensions we considered, only of all random initializations failed to converge within the maximal number of iterations.
Figure 6 (left) shows the number of real eigenpairs as averaged over 10 independent tensors for each value of . Figure 6 (right) shows on a logarithmic scale the average time it took to compute all real eigenpairs via the homotopy, NCM and O–NCM, for the same tensors. These results show that both NCM and O–NCM recovered all eigenpairs faster than the homotopy method by approximately two orders of magnitude. Moreover, O–NCM did so much faster than NCM.
Small eigenvalues
To understand the gap in the runtime of NCM and O–NCM shown in Figure 6, we next examine the dependence of both methods on the eigenpair to which they converge. As suggested by Theorems 1 and 2, we expect O–NCM to have larger attraction regions for small eigenvalues as compared to NCM. Figure 7 (left) shows on a log-log scale the relative number of times the two methods converged to each eigenvalue as a function of its absolute value for a typical random tensor of order and dimension . These counts correspond to a total of random initializations uniformly distributed on the unit sphere. As one can see, the probability for NCM to converge to an eigenpair decreases sharply when its eigenvalue becomes small, while for O–NCM this probability seems to be independent of . This difference is the source of the gap in the runtime of the two methods for finding all eigenpairs. For completeness, the eigenvalues found by the adaptive S-HOPM are also presented.
Convergence rates
Figure 7 (right) shows the median runtime till convergence of the NCM and the shifted HOPM. The stopping condition for all methods was set to . The experiment was done on random tensors of fourth order with various dimensions. For each tensor, we initialized all methods with random starting points. To avoid the influence of any particular implementation, we normalized the results with the runtime of both methods for . As illustrated in Fig. 7, the runtime increase of the NCM or O–NCM is significantly slower than the corresponding increase in the adaptive shifted-HOPM. However, each NCM/O–NCM iteration may be slower, as it requires matrix inversion.
Discussion and summary
In this paper we developed and analyzed a Newton-based iterative approach to compute real eigenpairs of symmetric tensors. We now briefly discuss three important issues: its runtime, its ability to find all tensor eigenpairs, and its optimization point of view.
Runtime
The computational complexity of each NCM or O–NCM iteration is dominated by two operations: computing the Hessian matrix in time and solving a system of linear equations in time. The latter step may be significantly sped up by applying various preconditioning techniques, as done in other iterative methods that solve systems of linear equations [10]. For sparse tensors, the computation of the Hessian can be accelerated as well, see [29].
Optimization point of view
Following a constructive comment by one of the referees, we note that NCM can be seen as an adaptation of the Gauss–Newton method [3]. Recall that if and only if is an eigenvector of , with a corresponding eigenvalue . Hence, our goal is to find the global minima of the realizable nonlinear least-squares problem
[TABLE]
Given the current estimate , Gauss–Newton first linearizes at ,
[TABLE]
where is the Jacobian matrix of , given in (12). Then, instead of (28), the following approximate linear least-squares problem is solved,
[TABLE]
which is exactly the NCM correction in (15).
Besides NCM, other nonlinear optimization methods can be used to solve (28), such as the Levenberg-Marquardt algorithm [23] and other trust-region and line search algorithms. These methods, among other things, introduce an additional regularization term to better control the direction in which the method proceeds at each iteration, similarly to the role played by the (adaptive) shifted-HOPM as compared to HOPM. Specifically, instead of (15), one solves the following linear system with an appropriate regularization matrix ,
[TABLE]
While NCM currently has no global convergence guarantees, an appropriate (adaptive) choice of can lead to global convergence guarantees, including to eigenvectors having a zero Hessian. Further studying the role of regularization for the tensor eigen-problem is an interesting direction for future research.
However, in addition to the global minima, may have local minima which should be avoided. Interestingly, NCM elegantly avoids such local minima as the following example illustrates. In Figure 8 (left), we plot as a function of for the symmetric tensor of Example 1 in [17]. Its eigenvectors are depicted by circles while a local minimum of is depicted by a square symbol. In Figure 8 (right) we show the attraction regions for NCM starting from various locations on . As one can see, NCM does not converge to and in fact is highly unstable around this point; close initial points in this neighborhood may converge to arbitrarily far eigenvectors.
To see why this is so, note that since is a local minimum, for an initial point near , NCM may get closer and closer to at the first few iterations. However, the facts that is bounded away from and implies that is in the null space of . As gets closer to , becomes close to singular. The result is an overshoot, a sharp increase in , taking far away from and .
Finding all eigenpairs of generic tensors
According to our theoretical analysis, NCM and O–NCM converge to eigenpairs whose Hessian matrix is full rank. An interesting question is whether these methods can thus converge to all real eigenpairs of a generic symmetric tensor [5]. Interpreting generic in the sense of algebraic geometry, an adaptation of [5, Theorem 1.2] to the symmetric tensor case implies the following (proof omitted).
Proposition 2
All real eigenpairs of a generic symmetric tensor are Newton-stable.
Hence, Theorems 1 and 2 imply that NCM and O–NCM are guaranteed to find all eigenpairs of a generic symmetric tensor given a sufficiently large number of random initializations.
Acknowledgments
We thank Lek–Heng Lim, Meirav Galun and Haim Avron for interesting discussions.
Appendix A Convergence of NCM
To prove Theorem 1 we shall make use of the following auxiliary lemma.
Lemma 2
Consider one update step of Algorithm 1, as in Equation (17), starting from an initial and ending with . Let . If , then
[TABLE]
Proof 1
By definition,
[TABLE]
Since , by the triangle inequality,
[TABLE]
Applying the triangle inequality to (30), combined with the assumption ,
[TABLE]
hence concluding the proof.
Proof 2** (Proof of Theorem 1)**
To prove quadratic convergence it suffices to show that there exists an and a constant such that from any initial point that satisfies ,
[TABLE]
*We start by analyzing at the first iteration . Let be the approximate correction of , given by the solution of (15). The new approximation of , given by Eq. (17), is . Assume for the moment that the initial guess is sufficiently close to so that . Then, by Lemma 2, *
[TABLE]
Hence, it suffices to bound . To this end, we view the exact system of non-linear equations (13), whose solution is , as a perturbation of the approximate system of linear equations (15), whose solution is . Consider the matrix of Eq. (12) evaluated at the eigenvector ,
[TABLE]
Note that is symmetric with eigenvalues . Since is -Newton-stable, for all . In addition, since by assumption, is full rank with smallest singular value
[TABLE]
By the continuity of in , there exists a such that for all with . In particular, if , then is invertible and the solution to (13) satisfies the following implicit equation in ,
[TABLE]
Similarly, the unique solution to the correction equation (15) is as in (16),
[TABLE]
Subtracting the last two equations gives
[TABLE]
To bound , first note that for any symmetric tensor there exists an such that for any , and ,
[TABLE]
Similar bounds hold for and according to their powers in . Bounding each term of in (14) separately by (34), there are less than terms involving and at most terms involving with . Assuming , implies
[TABLE]
Inserting this bound into (33),
[TABLE]
Note that if , then as required by Lemma 2. Under this condition, by Eq. (31), it follows that
[TABLE]
As an interim summary, if , then Eq. (36) holds. We conclude the proof for a general iteration by induction. For the first induction step to work, it required that if , then as well. By (36), this is satisfied for and the proof for a general holds similarly. The quadratic convergence of Algorithm 1 follows.
Appendix B Proof of Lemma 1
We show that a vector satisfies (24) if and only if it satisfies
[TABLE]
Lemma 1 then follows by recalling that and multiplying the first equation in (37) by from the left.
To prove the first direction, note that by the last row of (24), the solution to (24) is perpendicular to , so and . Multiplying the first “row” of (24) by from the left and noting that , we find that the left hand side is given by
[TABLE]
In addition, one can easily check that is perpendicular to , so the right hand side of the equality in (37) is and (37) follows.
To prove the other direction, suppose satisfies (37). So and . Define and write the left hand side of (37) as
[TABLE]
Since , it follows that satisfies (24) as required.
Appendix C Convergence of O–NCM
The proof of Theorem 2 is similar to that of Theorem 1, and makes use of the following auxiliary lemma.
Lemma 3
Consider one update step of Algorithm 2, as in Equation (26), starting from an initial and ending with . Let and . If and , then
[TABLE]
Proof 3
By definition,
[TABLE]
Since and , by the triangle inequality,
[TABLE]
Applying the triangle inequality to (40), combined with the assumption ,
[TABLE]
hence concluding the proof.
Proof 4** (Proof of Theorem 2)**
We show that there exists an and a constant , such that for any initial point that satisfies ,
[TABLE]
We start by analyzing at the first iteration . Let be the approximate correction of , given by the solution of (25). The new approximation of is . Since is orthogonal to , the denominator of satisfies
[TABLE]
*To bound the numerator of , assume for the moment that is sufficiently close to so that and . Then, by Lemma 3, *
[TABLE]
Hence, it suffices to bound . Define and note that since and are orthogonal, . Writing , we thus have
[TABLE]
Since ,
[TABLE]
To bound , we view the exact system of non-linear equations (23), whose solution is , as a perturbation of the approximate system of linear equations (25), whose solution is . By (23), solves the non-linear equation
[TABLE]
We multiply (46) by from the left and plugin (44) to obtain the set of non-linear equations in (and ),
[TABLE]
Since is -Newton-stable, the projected Hessian is full rank with smallest singular value
[TABLE]
By the continuity of in , there exists an such that for all with . In particular, if , then is invertible and the solution to (47) satisfies the following implicit equation in ,
[TABLE]
Similarly, the unique solution to (25) is
[TABLE]
Subtracting the last two equations gives
[TABLE]
We bound the norm of in (22) by
[TABLE]
To bound the terms in the sum, note that there exists an such that for any , and ,
[TABLE]
Bounding each term in the sum in (49) by (50), there are at most terms involving , and at most terms involving with . Assuming and recalling that by (42), ,
[TABLE]
For the first term in (49), recalling the definition of in (19),
[TABLE]
For the first term in (51), note that , and , hence
[TABLE]
Since is an eigenvector and , all terms in the sum in (51) with vanish,
[TABLE]
Bounding each term in the sum in (51) with by (50), there are at most terms involving , and at most terms involving with . Since , the first term in (49) is thus bounded by
[TABLE]
It follows that
[TABLE]
Inserting this bound into (48),
[TABLE]
Inserting this into (45),
[TABLE]
Note that if , then . By (42), as well. Thus, (52) implies as required by Lemma 3. Under this condition, (43) implies
[TABLE]
*Combining the last two bounds we obtain *
[TABLE]
*As an interim summary, if , then (53) holds. The rest of the proof follows by induction as in the proof of Theorem 1. *
Appendix D Proof of Proposition 1
First, we prove an auxiliary lemma concerning the structure of the eigenvectors of . Recall . For any subset define the following two -dimensional vectors,
[TABLE]
Lemma 4
There is a function such that all eigenvectors of are of the form
[TABLE]
Proof 5
Let be an eigenvector of with eigenvalue . To prove the lemma it suffices to show that the coefficients can attain at most two distinct values. Applying mode product to with ,
[TABLE]
where . Since is an eigenpair it satisfies,
[TABLE]
Multiplying both sides of (55) with gives,
[TABLE]
Subtracting Equations (56) with ,
[TABLE]
*We thus conclude that for any either or . It follows that the set contains up to distinct values satisfying (56). *
The first part of Proposition 1 determines the number of real eigenvectors for . Following lemma 4, let be proportional to some eigenvector of . By Eq. (54),
[TABLE]
Since is proportional to an eigenvector of ,
[TABLE]
or equivalently,
[TABLE]
One solution to Eq. (57) is , which corresponds to the eigenvector . For we replace with,
[TABLE]
The result is the following quadratic equation,
[TABLE]
The solutions to Eq. (58) determine, up to a normalizing factor, the eigenvectors (both real and complex) of . Due to the problem’s symmetry we may charaterize all real eigenvectors by computing the solutions to (58) only for subsets with . Consider the discriminant of the quadratic equation (58),
[TABLE]
For a given , the number of real solutions to (58) is
[TABLE]
Hence, the number of real eigenpairs decreases at specific thresholds. The smallest threshold corresponds to and is given by . When , there are real solutions to Eq. (58) for all subsets . So the total number of solutions is equal to times the number of distinct subsets,
[TABLE]
where we add one to account for , corresponding to . Note that this is also the bound on the number of eigenvectors of a generic cubic tensor, see [5]. When , . In this case is composed of two eigenvectors for all subsets and one eigenvector for each subset of size ,
[TABLE]
For , there are no real solutions of Eq. (58) for subsets of size . The number of real solutions is therefore,
[TABLE]
Repeating the argument for increasing values of we obtain as given in the proposition’s statement.
We now prove the second part of the proposition, stating that at the thresholds , of the eigenvectors are not Newton-stable. In this case and only one (real) solution to (58) exists for each with . Solving (58) for , we find that the eigenpairs with are
[TABLE]
We show that each such eigenpair is not Newton-stable. To do so, we prove that the projected Hessian is rank deficient. First we compute the Hessian . Abbreviate and note that . Then,
[TABLE]
Consider the vector . A simple calculation yields . Since is orthogonal to , is rank deficient and is not Newton-stable.
Appendix E Convergence to eigenvectors which are not Newton-stable
In this section we present a detailed empirical study of the convergence properties of O–NCM. As discussed in Section 4, the main property that governs the convergence of O–NCM to an eigenpair is the spectral structure of the projected Hessian at the eigenvector, . As shown in Theorem 2, when is full rank, O–NCM converges in a quadratic rate to given a sufficiently close initial point. When is isolated but , the convergence rate may be less than quadratic. When and/or is non-isolated, full convergence to is not always observed. These properties are summarized in table 1.
We illustrate these convergence properties via two examples.
- (a)
Consider the tensor with order and dimensionality of Example in [5], corresponding to the homogeneous polynomial
[TABLE]
This tensor has a total of real eigenpairs. Six of them correspond to a eigenvalue, two of which are not Newton-stable with . The rest are Newton-stable. Figure 9 shows the value of as a function of the iteration for one eigenvector that is Newton stable and one that is not. While the convergence to the stable eigenvector is quadratic, the convergence to the point which is not Newton-stable point is much slower.
- (b)
Consider the tensor of example in [20], corresponding to the homogeneous polynomial
[TABLE]
There are a total of isolated eigenvectors, including one that corresponds to an eigenvalue . The projected Hessian for this vector is equal to a zero matrix. As can be seen in Figure 9, in this case the O-NCM does not fully converge.
In addition, there are also infinitely many non-isolated eigenvectors corresponding to an eigenvalue . The projected Hessian of these eigenpairs is a rank deficient (though non zero) matrix. For example, any vector of the form
[TABLE]
is proportional to a non-isolated eigenvector. Note that the vectors corresponding to (61) form a 2 dimensional subspace. Since these vectors are non-isolated, in this case we measure instead of where is the projection matrix onto that subspace. As can be seen in Fig. 9 in this case the O-NCM does not converge.
Trivial eigenvectors
In some cases, the tensor fibers are spanned by a low dimension subspace. Any vector orthogonal to this subspace is an eigenvector corresponding to an eigenvalue , and a projected Hessian equal to a zero matrix. This is the case, for instance in example (b) where all fibers are orthogonal to . As we have seen, this may cause the O-NCM to slowdown, since the iterations do not converge to these points.
A simple pre-processing step is to find these eigenvectors by calculating the subspace of the tensor fibers, namely . As a second step, the O-NCM can easily be constrained to that subspace.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Animashree Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, and Matus Telgarsky. Tensor decompositions for learning latent variable models. J. Mach. Learn. Res. , 15:2773–2832, 2014.
- 2[2] Animashree Anandkumar, Daniel Hsu, and Sham M Kakade. A method of moments for mixture models and hidden markov models. In Conference on Learning Theory , pages 33–1, 2012.
- 3[3] Åke Björck. Numerical methods for least squares problems . SIAM, 1996.
- 4[4] Paul Breiding. The average number of critical rank-one-approximations to a symmetric tensor. ar Xiv preprint ar Xiv:1701.07312 , 2017.
- 5[5] Dustin Cartwright and Bernd Sturmfels. The number of eigenvalues of a tensor. Linear Algebra Appl. , 438(2):942–952, 2013.
- 6[6] K. C. Chang, Kelly Pearson, and Tan Zhang. On eigenvalue problems of real symmetric tensors. J. Math. Anal. Appl. , 350(1):416–422, 2009.
- 7[7] Liping Chen, Lixing Han, and Liangmin Zhou. Computing tensor eigenvalues via homotopy methods. SIAM J. Matrix Anal. Appl. , 37(1):290–319, 2016.
- 8[8] Chun-Feng Cui, Yu-Hong Dai, and Jiawang Nie. All real eigenvalues of symmetric tensors. SIAM J. Matrix Anal. Appl. , 35(4):1582–1601, 2014.
