Newton correction methods for computing real eigenpairs of symmetric tensors

Ariel Jaffe, Roi Weiss, and Boaz Nadler

arXiv:1706.02132·math.NA·March 6, 2018·SIAM J. Matrix Anal. Appl.

TL;DR

This paper introduces a Newton-based iterative method for efficiently computing real eigenpairs of symmetric tensors, demonstrating quadratic convergence and superior performance over existing methods.

Contribution

It proposes a novel Newton correction method with proven convergence properties and empirical advantages for finding real eigenpairs of symmetric tensors.

Findings

01

Method converges quadratically near eigenpairs.

02

Finds more eigenpairs than previous methods.

03

Typically finds all real eigenpairs with multiple initializations.

Abstract

Real eigenpairs of symmetric tensors play an important role in multiple applications. In this paper we propose and analyze a fast iterative Newton-based method to compute real eigenpairs of symmetric tensors. We derive sufficient conditions for a real eigenpair to be a stable fixed point for our method, and prove that given a sufficiently close initial guess, the convergence rate is quadratic. Empirically, our method converges to a significantly larger number of eigenpairs compared to previously proposed iterative methods, and with enough random initializations typically finds all real eigenpairs. In particular, for a generic symmetric tensor, the sufficient conditions for local convergence of our Newton-based method hold simultaneously for all its real eigenpairs.

Tables

Table 1: O–NCM convergence properties to an eigenvector $\bm{x}^{\ast}$.

	$H_{p} (𝒙^{*})$ full rank	$H_{p} (𝒙^{*})$ rank deficient	$H_{p} (𝒙^{*}) = 0$
Isolated	Quadratic convergence	Slow convergence	No guarantees
Non-isolated	—	No guarantees	No guarantees

Equations

$$[\mathcal{T}(W^{1},\dots,W^{m})]_{i_{1},\dots,i_{m}}=\sum_{j_{1},\dots,j_{m}\in[n]}W^{1}_{j_{1},i_{1}}\cdots W^{m}_{j_{m},i_{m}}\,t_{j_{1},\dots,j_{m}}.$$

$$\mathcal{T}(I,\bm{x}^{\ast},\dots,\bm{x}^{\ast})=\lambda^{\ast}\bm{x}^{\ast}\quad\text{and}\quad\|\bm{x}^{\ast}\|=1.\tag{2}$$

$$\mu(\bm{x})=\mathcal{T}(\bm{x},\dots,\bm{x})=\sum_{i_{1},\dots,i_{m}\in[n]}t_{i_{1},\dots,i_{m}}\,x_{i_{1}}\cdots x_{i_{m}}.\tag{3}$$

$$L(\bm{x},\lambda)=\mu(\bm{x})-\frac{m\lambda}{2}\left(\|\bm{x}\|^{2}-1\right),\qquad\lambda\in\mathbb{R}.$$

$$\frac{1}{m}\nabla_{\bm{x}^{\ast}}L(\bm{x}^{\ast},\lambda^{\ast})=\mathcal{T}(I,\bm{x}^{\ast},\dots,\bm{x}^{\ast})-\lambda^{\ast}\bm{x}^{\ast}=0,$$

$$\bm{g}(\bm{x})=\frac{1}{m}\nabla_{\bm{x}}L(\bm{x},\lambda)\Big|_{\lambda=\mu(\bm{x})}=\mathcal{T}(I,\bm{x},\dots,\bm{x})-\mu(\bm{x})\bm{x}.\tag{5}$$

$$H(\bm{x})=\frac{1}{m}\nabla_{\bm{x}}^{2}L(\bm{x},\lambda)\Big|_{\lambda=\mu(\bm{x})}=(m-1)\mathcal{T}(I,I,\bm{x},\dots,\bm{x})-\mu(\bm{x})I.\tag{6}$$

$$\operatorname*{argmin}_{\bm{x}\in\mathbb{R}^{n}}\|\mathcal{T}-\bm{x}\otimes\dots\otimes\bm{x}\|_{F}^{2}=\operatorname*{argmin}_{\bm{x}\in\mathbb{R}^{n}}\sum_{i_{1},\dots,i_{m}=1}^{n}\left(t_{i_{1},\dots,i_{m}}-x_{i_{1}}\cdots x_{i_{m}}\right)^{2}.$$

$$\bm{x}_{(k+1)}=\frac{\nabla_{\bm{x}}\mu(\bm{x}_{(k)})}{\|\nabla_{\bm{x}}\mu(\bm{x}_{(k)})\|}=\frac{\mathcal{T}(I,\bm{x}_{(k)},\dots,\bm{x}_{(k)})}{\|\mathcal{T}(I,\bm{x}_{(k)},\dots,\bm{x}_{(k)})\|}.\tag{8}$$

$$\mu_{\alpha}(\bm{x})=\mu(\bm{x})+\alpha\|\bm{x}\|^{m},\qquad\alpha\in\mathbb{R}.$$

$$\bm{x}_{(k+1)}=\frac{\nabla_{\bm{x}}\mu_{\alpha}(\bm{x}_{(k)})}{\|\nabla_{\bm{x}}\mu_{\alpha}(\bm{x}_{(k)})\|}=\frac{\mathcal{T}(I,\bm{x}_{(k)},\dots,\bm{x}_{(k)})+\alpha\bm{x}_{(k)}}{\|\mathcal{T}(I,\bm{x}_{(k)},\dots,\bm{x}_{(k)})+\alpha\bm{x}_{(k)}\|}.$$

$$H_{p}(\bm{x})=U_{\bm{x}}^{T}H(\bm{x})U_{\bm{x}}\in\mathbb{R}^{(n-1)\times(n-1)},\tag{10}$$

$$\mathcal{T}(I,\bm{x}+\bm{y}^{*},\dots,\bm{x}+\bm{y}^{*})=\mu(\bm{x}+\bm{y}^{*})\cdot(\bm{x}+\bm{y}^{*}).\tag{11}$$

$$A(\bm{x})=H(\bm{x})-m\,\bm{x}\,\mathcal{T}(I,\bm{x},\dots,\bm{x})^{T}.$$

$$A(\bm{x})\bm{y}^{*}=-\bm{g}(\bm{x})+\Delta(\bm{x},\bm{y}^{*}),\tag{13}$$

$$\Delta(\bm{x},\bm{y}^{*})$$

$$-\sum_{i=2}^{m-1}\binom{m-1}{i}\mathcal{T}(I,\underbrace{\bm{x},\dots,\bm{x}}_{m-i-1},\underbrace{\bm{y}^{*},\dots,\bm{y}^{*}}_{i}).$$

$$A(\bm{x}_{(k)})\bm{y}_{(k)}=-\bm{g}(\bm{x}_{(k)}).\tag{15}$$

$$\bm{y}_{(k)}=-A(\bm{x}_{(k)})^{-1}\bm{g}(\bm{x}_{(k)}).\tag{16}$$

$$\bm{x}_{(k+1)}=\frac{\bm{x}_{(k)}+\bm{y}_{(k)}}{\|\bm{x}_{(k)}+\bm{y}_{(k)}\|}.\tag{17}$$

$$H(\bm{x}^{\ast})=(m-1)\mathcal{T}(I,I,\bm{x}^{\ast},\dots,\bm{x}^{\ast})-\lambda^{\ast}I.$$

$$\bm{x}=\alpha\bm{x}^{\ast}-\bm{u}^{*},$$

$$\beta^{*}=\alpha^{m-2}\lambda^{\ast}-\mathcal{T}(\bm{x},\dots,\bm{x})=\alpha^{m-2}\lambda^{\ast}-\mu(\bm{x}).$$

$$\mathcal{T}(I,\alpha\bm{x}^{\ast},\dots,\alpha\bm{x}^{\ast})=\alpha^{m-1}\lambda^{\ast}\bm{x}^{\ast}.$$

$$\mathcal{T}(I,\bm{x}+\bm{u}^{*},\dots,\bm{x}+\bm{u}^{*})=(\mu(\bm{x})+\beta^{*})\cdot(\bm{x}+\bm{u}^{*}).$$

$$H(\bm{x})\bm{u}^{*}-\beta^{*}\bm{x}=-\bm{g}(\bm{x})+\tilde{\Delta}(\bm{x},\bm{u}^{*},\beta^{*}),\tag{21}$$

$$\tilde{\Delta}(\bm{x},\bm{u}^{*},\beta^{*})=\beta^{*}\bm{u}^{*}-\sum_{i=2}^{m-1}\binom{m-1}{i}\mathcal{T}(I,\underbrace{\bm{x},\dots,\bm{x}}_{m-i-1},\underbrace{\bm{u}^{*},\dots,\bm{u}^{*}}_{i}).$$

$$\begin{pmatrix}H(\bm{x})&-\bm{x}\\ \bm{x}^{T}&0\end{pmatrix}\begin{pmatrix}\bm{u}^{*}\\ \beta^{*}\end{pmatrix}=-\begin{pmatrix}\bm{g}(\bm{x})\\ 0\end{pmatrix}+\begin{pmatrix}\tilde{\Delta}(\bm{x},\bm{u}^{*},\beta^{*})\\ -\|\bm{u}^{*}\|^{2}\end{pmatrix}.\tag{23}$$

$$\begin{pmatrix}H(\bm{x})&-\bm{x}\\ \bm{x}^{T}&0\end{pmatrix}\begin{pmatrix}\bm{u}\\ \beta\end{pmatrix}=-\begin{pmatrix}\bm{g}(\bm{x})\\ 0\end{pmatrix}.\tag{24}$$

$$H_{p}(\bm{x})\bm{z}=-U_{\bm{x}}^{T}\bm{g}(\bm{x})\quad\text{and}\quad\bm{u}=U_{\bm{x}}\bm{z}.\tag{25}$$

Full text

Newton correction methods for computing real eigenpairs of symmetric tensors

Ariel Jaffe

Weizmann Institute of Science, Israel

Roi Weiss

Weizmann Institute of Science, Israel

Boaz Nadler

Weizmann Institute of Science, Israel

Key words. tensor eigenvectors; tensor eigenvalues; symmetric tensor; higher-order power method; Newton-based methods; Newton correction method.

Introduction

Eigenpairs of symmetric tensors have received much attention in recent years due to their applicability in a wide range of disciplines. Introduced by Lim [21] and Qi [26], tensor eigenpairs were used for example in the analysis of hypergraphs [20], high-order Markov chains [24], establishing the positive-definiteness of multivariate forms [25], diffusion tensor imaging [28, 27], and data analysis [2, 1].

The focus of this paper is on fast iterative methods to compute the real eigenpairs of symmetric tensors. These methods were recently applied by the authors and collaborators in [16] for learning a binary latent variable model by computing the eigenpairs of a third order moment tensor of the observed data.

There are major differences between tensor eigenpairs, whose formal definition is reviewed in Section 2, and their well studied matrix counterparts. Whereas any symmetric $n\times n$ matrix has exactly $n$ real eigenvalues with corresponding orthogonal eigenvectors, the situation for tensors is fundamentally different. The eigenvectors of a symmetric tensor are not necessarily orthogonal and some may in fact be complex-valued. Furthermore, the number of eigenvalues is in general significantly larger than $n$ . For symmetric real tensors of order $m$ and dimensionality $n$ , [5] proved that the number of complex eigenvalues is at most $((m-1)^{n}-1)/(m-2)$ . For generic tensors, this is the exact number of complex eigenvalues. As a lower bound, it is known that for odd-order tensors at least one real eigenvalue exists [5], while for symmetric even-ordered tensors at least $n$ real eigenvalues exist [6]. Recently, [4] analyzed the expected number of real eigenvalues of a random Gaussian symmetric tensor.
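To get a feel for the size of this bound, note that it is a geometric sum, which also shows that it reduces to the matrix count $n$ when $m=2$. The two evaluated cases below use tensor sizes that appear as examples in this paper; the arithmetic is a worked illustration added here, not a statement taken from [5].

$$\frac{(m-1)^{n}-1}{m-2}=\sum_{k=0}^{n-1}(m-1)^{k},\qquad m=3,\ n=3:\ \frac{2^{3}-1}{1}=7,\qquad m=4,\ n=8:\ \frac{3^{8}-1}{2}=3280.$$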

From a computational perspective, while all matrix eigenpairs can be computed efficiently, Hillar and Lim [15] showed that enumerating all eigenpairs of a symmetric tensor is in general #P-hard. Nonetheless, Cui et al. [8] derived a method to compute all eigenpairs sequentially, based on a hierarchy of semidefinite relaxations. Chen et al. [7] proposed a homotopy continuation method for the same purpose. While these algorithms are guaranteed to find all isolated eigenpairs, they are computationally demanding even for moderate tensor dimensions. For example, computing all real eigenpairs of a random $8\times 8\times 8\times 8$ tensor using the zeig procedure in the TenEig package of [7] takes several hours on a standard PC.

In recent years, several iterative methods were developed to compute at least some tensor eigenpairs. Some methods were specifically designed to compute the largest eigenvalue [24, 22, 14, 11]. Han [13] proposed a method based on unconstrained optimization to compute both the maximal and minimal eigenvalues of an even-order tensor. As described in Section 3, adaptations of the popular power method to tensors were suggested in [9, 18, 19, 31]. While these iterative methods are computationally fast, in general they converge to only a strict subset of all eigenpairs.

In this work we present a different iterative approach to compute real eigenpairs of symmetric tensors. As detailed in Section 4, our approach is based on adapting the matrix Newton correction method (NCM) to the tensor case. We derive sufficient conditions for local convergence of NCM and prove that its convergence rate is quadratic. Our analysis reveals that NCM may fail to converge to eigenvectors with eigenvalue zero and has small attraction region for eigenvalues close to zero. To overcome this limitation, we next derive a variant, denoted the orthogonal Newton correction method (O–NCM), which enjoys improved run-time and convergence guarantees. We observe that for a generic symmetric tensor, the sufficient conditions for either NCM or O–NCM to converge to all its eigenpairs hold with probability one. In Section 5 we illustrate that these sufficient conditions are not necessary.

In Section 6 we present numerical simulations that support our theoretical analysis. For random tensors of modest size, multiple random initializations of NCM or O–NCM can find all eigenpairs significantly faster than other methods. For example, on a random $8\times 8\times 8\times 8$ tensor, our methods typically found all eigenpairs within a few seconds. We conclude with a summary and discussion in Section 7.

Notation

We denote vectors by lowercase boldface letters, as in $\bm{x}$ , matrices by uppercase letters, as in $W$ , and higher-order tensors by calligraphic letters, as in $\mathcal{T}$ . We denote $[n]=\{1,\dots,n\}$ . $I$ is the identity matrix whose dimension depends on the context and $S_{n-1}=\{\bm{x}\in\mathbb{R}^{n}:{\|\bm{x}\|}=1\}$ is the unit sphere.

The symmetric tensor eigen-problem

Let $\mathcal{T}\in\mathbb{R}^{n\times\ldots\times n}$ be a tensor of order $m$ and dimension $n$ , with entries $t_{i_{1},\ldots,i_{m}}$ , where $i_{1},\ldots,i_{m}\in[n]$ . We assume that $\mathcal{T}$ is symmetric, namely, $t_{i_{1},\ldots,i_{m}}=t_{\pi(i_{1},\ldots,i_{m})}$ for all permutations $\pi$ of the $m$ indices $i_{1},\ldots,i_{m}$ . A tensor $\mathcal{T}$ can be viewed as a multi-linear operator: for matrices $W^{1},\dots,W^{m}$ with $W^{i}\in\mathbb{R}^{n\times d_{i}}$ , the tensor-mode product, denoted by $\mathcal{T}(W^{1},\ldots,W^{m})\in\mathbb{R}^{d_{1}\times\dots\times d_{m}}$ , yields a new tensor whose $(i_{1},\dots,i_{m})^{\text{th}}$ entry is

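$$[\mathcal{T}(W^{1},\dots,W^{m})]_{i_{1},\dots,i_{m}}=\sum_{j_{1},\dots,j_{m}\in[n]}W^{1}_{j_{1},i_{1}}\cdots W^{m}_{j_{m},i_{m}}\,t_{j_{1},\dots,j_{m}}.$$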

Tensor eigenpairs

Several definitions of tensor eigenpairs appear in the literature. Here we consider the one introduced as $Z$ -eigenpairs in [26] and $l^{2}$ -eigenpairs in [21].

Definition 1

A pair $(\bm{x}^{\ast},\lambda^{\ast})\in\mathbb{R}^{n}\times\mathbb{R}$ is a real eigenpair of $\mathcal{T}$ if

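$$\mathcal{T}(I,\bm{x}^{\ast},\dots,\bm{x}^{\ast})=\lambda^{\ast}\bm{x}^{\ast}\quad\text{and}\quad\|\bm{x}^{\ast}\|=1.\tag{2}$$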

Note that if $(\bm{x}^{\ast},\lambda^{\ast})$ is an eigenpair, then $(-\bm{x}^{\ast},(-1)^{m}\lambda^{\ast})$ is an eigenpair as well. Following common practice, we treat these two pairs as belonging to the same equivalence class [5].

Definition 1 can be equivalently stated using the following $m$ -degree homogeneous polynomial in $\bm{x}\in\mathbb{R}^{n}$ ,

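$$\mu(\bm{x})=\mathcal{T}(\bm{x},\dots,\bm{x})=\sum_{i_{1},\dots,i_{m}\in[n]}t_{i_{1},\dots,i_{m}}\,x_{i_{1}}\cdots x_{i_{m}}.\tag{3}$$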

As shown in [21], the real eigenpairs of $\mathcal{T}$ correspond to the critical points of $\mu(\bm{x})$ when constrained to the unit sphere $S_{n-1}$ . Formally, define the Lagrangian

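$$L(\bm{x},\lambda)=\mu(\bm{x})-\frac{m\lambda}{2}\left(\|\bm{x}\|^{2}-1\right),\qquad\lambda\in\mathbb{R}.$$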

A constrained critical point $(\bm{x}^{\ast},\lambda^{\ast})$ of $\mu$ satisfies the Karush-Kuhn-Tucker conditions,

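$$\frac{1}{m}\nabla_{\bm{x}^{\ast}}L(\bm{x}^{\ast},\lambda^{\ast})=\mathcal{T}(I,\bm{x}^{\ast},\dots,\bm{x}^{\ast})-\lambda^{\ast}\bm{x}^{\ast}=0,$$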

where $\lambda^{\ast}=\mu(\bm{x}^{\ast})$ is such that ${\|\bm{x}^{\ast}\|}=1$ . This is precisely Equation (2). For future use, we denote the gradient of $L(\bm{x},\lambda)$ at an arbitrary point $\bm{x}\in S_{n-1}$ by

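$$\bm{g}(\bm{x})=\frac{1}{m}\nabla_{\bm{x}}L(\bm{x},\lambda)\Big|_{\lambda=\mu(\bm{x})}=\mathcal{T}(I,\bm{x},\dots,\bm{x})-\mu(\bm{x})\bm{x}.\tag{5}$$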

Similarly, we denote the Hessian matrix by

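$$H(\bm{x})=\frac{1}{m}\nabla_{\bm{x}}^{2}L(\bm{x},\lambda)\Big|_{\lambda=\mu(\bm{x})}=(m-1)\mathcal{T}(I,I,\bm{x},\dots,\bm{x})-\mu(\bm{x})I.\tag{6}$$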

As will become clear in Section 4, the spectral structure of $H(\bm{x}^{\ast})$ plays a fundamental role in the convergence of our proposed Newton-based methods to $(\bm{x}^{\ast},\lambda^{\ast})$ .
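The quantities defined in this section are all contractions of $\mathcal{T}$ with copies of $\bm{x}$, so they are short to compute. The following NumPy sketch shows one way to evaluate $\mu(\bm{x})$, $\bm{g}(\bm{x})$ and $H(\bm{x})$ for a dense symmetric tensor. The function names (`contract`, `mu`, `grad_g`, `hessian`) and the small test tensor are assumptions made for illustration; they are not taken from the authors' Matlab code.

```python
import numpy as np

def contract(T, x, times):
    """Contract the last `times` modes of the tensor T with the vector x."""
    out = T
    for _ in range(times):
        out = out @ x  # matmul with a 1-D array sums over the last axis
    return out

def mu(T, x):
    """mu(x) = T(x, ..., x)."""
    return float(contract(T, x, T.ndim))

def grad_g(T, x):
    """g(x) = T(I, x, ..., x) - mu(x) x."""
    return contract(T, x, T.ndim - 1) - mu(T, x) * x

def hessian(T, x):
    """H(x) = (m - 1) T(I, I, x, ..., x) - mu(x) I."""
    m, n = T.ndim, T.shape[0]
    return (m - 1) * contract(T, x, m - 2) - mu(T, x) * np.eye(n)

# Illustrative tensor (an assumption of this sketch): sum_i e_i (x) e_i (x) e_i.
n = 3
T = np.zeros((n, n, n))
for i in range(n):
    T[i, i, i] = 1.0

# For this tensor, x = (1, 1, 0)/sqrt(2) is a unit-norm eigenvector.
x = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
lam = mu(T, x)                            # eigenvalue estimate mu(x)
residual = np.linalg.norm(grad_g(T, x))   # vanishes at an eigenpair
```

Because $\mathcal{T}$ is symmetric, it does not matter which modes are contracted; the sketch always contracts the trailing ones.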

Power methods for computing tensor eigenpairs

To motivate our approach, it is first instructive to briefly review previous iterative methods, specifically the symmetric higher-order power method (HOPM) [9] and the shifted-HOPM [18, 19]. The HOPM was derived as a way to compute the best rank-1 approximation of a symmetric tensor under the squared error loss,

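$$\operatorname*{argmin}_{\bm{x}\in\mathbb{R}^{n}}\|\mathcal{T}-\bm{x}\otimes\dots\otimes\bm{x}\|_{F}^{2}=\operatorname*{argmin}_{\bm{x}\in\mathbb{R}^{n}}\sum_{i_{1},\dots,i_{m}=1}^{n}\left(t_{i_{1},\dots,i_{m}}-x_{i_{1}}\cdots x_{i_{m}}\right)^{2}.$$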

Although the above problem is non-convex and has no closed form solution, it was shown in [9] that it is equivalent to finding the vector $\bm{x}^{\ast}$ with ${\|\bm{x}^{\ast}\|}=1$ that maximizes the objective function $\mu(\bm{x})$ in (3). To compute $\bm{x}^{\ast}$ , the following generalization of the matrix power method to high-order tensors was proposed. Starting from a (random) initial guess $\bm{x}_{(0)}\in S_{n-1}$ , HOPM iterates

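$$\bm{x}_{(k+1)}=\frac{\nabla_{\bm{x}}\mu(\bm{x}_{(k)})}{\|\nabla_{\bm{x}}\mu(\bm{x}_{(k)})\|}=\frac{\mathcal{T}(I,\bm{x}_{(k)},\dots,\bm{x}_{(k)})}{\|\mathcal{T}(I,\bm{x}_{(k)},\dots,\bm{x}_{(k)})\|}.\tag{8}$$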

It was shown in [17, Theorem 4] that for an even-order tensor, if its associated function $\mu(\bm{x})$ is convex or concave, then HOPM is guaranteed to converge to a local optimum of $\mu(\bm{x})$ in $S_{n-1}$ . For general symmetric tensors, however, HOPM has no convergence guarantees, and may indeed fail to converge, see [17] for a specific example.

To overcome the limitations of HOPM, [18, 19] proposed the shifted function

$$\mu_{\alpha}(\bm{x})=\mu(\bm{x})+\alpha\|\bm{x}\|^{m},\qquad\alpha\in\mathbb{R}.$$

Since on the unit sphere $\mu_{\alpha}(\bm{x})=\mu(\bm{x})+\alpha$ , the critical points of $\mu$ and $\mu_{\alpha}$ are identical. Instead of (8), the shifted-HOPM iterates

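$$\bm{x}_{(k+1)}=\frac{\nabla_{\bm{x}}\mu_{\alpha}(\bm{x}_{(k)})}{\|\nabla_{\bm{x}}\mu_{\alpha}(\bm{x}_{(k)})\|}=\frac{\mathcal{T}(I,\bm{x}_{(k)},\dots,\bm{x}_{(k)})+\alpha\bm{x}_{(k)}}{\|\mathcal{T}(I,\bm{x}_{(k)},\dots,\bm{x}_{(k)})+\alpha\bm{x}_{(k)}\|}.$$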

Importantly, the value of $\alpha$ can be tuned so that from any starting point $\bm{x}_{(0)}\in S_{n-1}$ , the shifted-HOPM is guaranteed to converge to a critical point of $\mu_{\alpha}$ . [19] further devised an adaptive shifted-HOPM, whereby the value of $\alpha_{(k)}$ is updated at each iteration so that $\mu_{\alpha_{(k)}}(\bm{x})$ is locally convex or concave around $\bm{x}_{(k)}$ . This avoids the possible slowdown of the shifted-HOPM with a fixed value of $\alpha$ , while maintaining its convergence guarantees.
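The fixed-shift iteration described above can be sketched in a few lines; setting `alpha=0` recovers plain HOPM. This is a minimal illustration of the update rule only, not the adaptive scheme of [19] and not the Tensor Toolbox implementation. The function name `shifted_hopm`, its stopping rule, and the test tensor are assumptions of this sketch.

```python
import numpy as np

def contract(T, x, times):
    """Contract the last `times` modes of the tensor T with the vector x."""
    out = T
    for _ in range(times):
        out = out @ x
    return out

def shifted_hopm(T, x0, alpha=0.0, tol=1e-12, max_iter=1000):
    """Iterate x <- (T(I,x,...,x) + alpha x) / ||T(I,x,...,x) + alpha x||.

    Returns (x, mu(x), number of iterations). The stopping rule (small
    change between iterates) is an assumption of this sketch.
    """
    m = T.ndim
    x = x0 / np.linalg.norm(x0)
    for k in range(1, max_iter + 1):
        v = contract(T, x, m - 1) + alpha * x
        x_new = v / np.linalg.norm(v)
        done = np.linalg.norm(x_new - x) < tol
        x = x_new
        if done:
            return x, float(contract(T, x, m)), k
    return x, float(contract(T, x, m)), max_iter

# Illustrative tensor (assumption): sum_i e_i (x) e_i (x) e_i with n = 3.
T = np.zeros((3, 3, 3))
for i in range(3):
    T[i, i, i] = 1.0

x0 = np.array([0.9, 0.3, 0.2])
x_plain, lam_plain, it_plain = shifted_hopm(T, x0, alpha=0.0)
x_shift, lam_shift, it_shift = shifted_hopm(T, x0, alpha=1.0)
```

On this example both runs approach the eigenvector $\bm{e}_{1}$; the shifted run typically needs more iterations, which is the slowdown with a fixed $\alpha$ mentioned above.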

Convergence, attraction regions, and stable eigenpairs

The adaptive shifted-HOPM converges only to some eigenpairs of a tensor. These may be characterized as follows. For any $\bm{x}\in S_{n-1}$ , let $U_{\bm{x}}\in\mathbb{R}^{n\times(n-1)}$ be a matrix with $n-1$ orthonormal columns that span the subspace orthogonal to $\bm{x}$ . Define the projected Hessian matrix,

$$H_{p}(\bm{x})=U_{\bm{x}}^{T}H(\bm{x})U_{\bm{x}}\in\mathbb{R}^{(n-1)\times(n-1)},\tag{10}$$

where $H(\bm{x})\in\mathbb{R}^{n\times n}$ is the Hessian matrix in (6). In [18], an eigenvector $\bm{x}^{\ast}$ was termed positive-stable if $H_{p}(\bm{x}^{\ast})$ is positive-definite and negative-stable if $H_{p}(\bm{x}^{\ast})$ is negative-definite. Otherwise, $\bm{x}^{\ast}$ is termed unstable. [18] showed that the shifted-HOPM does not converge to unstable eigenvectors but does converge to the stable ones. Further, the convergence is at a linear rate. To distinguish between eigenpairs that are stable for the (adaptive) shifted-HOPM and those that are stable for the Newton-based methods, we henceforth refer to the above as power-stable eigenpairs, power-unstable eigenpairs, etc.
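The stability classes above depend only on the signs of the eigenvalues of $H_{p}(\bm{x}^{\ast})$, which can be checked numerically. The sketch below builds $U_{\bm{x}}$ from a QR factorization (one of many valid choices, and an assumption of this sketch) and labels an eigenvector accordingly. The helper names and the tolerance `tol` are illustrative.

```python
import numpy as np

def contract(T, x, times):
    """Contract the last `times` modes of the tensor T with the vector x."""
    out = T
    for _ in range(times):
        out = out @ x
    return out

def hessian(T, x):
    """H(x) = (m - 1) T(I, I, x, ..., x) - mu(x) I."""
    m, n = T.ndim, T.shape[0]
    return (m - 1) * contract(T, x, m - 2) - float(contract(T, x, m)) * np.eye(n)

def orth_complement_basis(x):
    """Return U_x: n x (n-1) orthonormal columns orthogonal to x (via QR)."""
    Q, _ = np.linalg.qr(np.column_stack([x, np.eye(x.size)]))
    return Q[:, 1:]  # the first column of Q is +/- x / ||x||

def projected_hessian(T, x):
    U = orth_complement_basis(x)
    return U.T @ hessian(T, x) @ U

def classify(T, x, tol=1e-10):
    """Label an eigenvector by the spectrum of H_p and return min |eigenvalue|."""
    eigs = np.linalg.eigvalsh(projected_hessian(T, x))
    if np.all(eigs > tol):
        label = "positive-stable"
    elif np.all(eigs < -tol):
        label = "negative-stable"
    else:
        label = "unstable"
    return label, float(np.min(np.abs(eigs)))

# Illustrative tensor (assumption): sum_i e_i (x) e_i (x) e_i with n = 3.
T = np.zeros((3, 3, 3))
for i in range(3):
    T[i, i, i] = 1.0

label_e1, gap_e1 = classify(T, np.array([1.0, 0.0, 0.0]))
label_mix, gap_mix = classify(T, np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0))
```

The second returned value is the smallest absolute eigenvalue of $H_{p}$; an eigenvector can be unstable for the power method while this value is still well away from zero.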

As an example, the left panel of Figure 1 shows the value of $\mu(\bm{x})$ over the unit sphere for a $3\times 3\times 3$ symmetric tensor with $7$ real eigenvectors. Three eigenvectors, depicted in red, are power-unstable, while the remaining four, depicted in black, are power-stable. The right panel of the same figure shows the results of the adaptive shifted-HOPM. The color indicates the eigenvector to which the method converged, starting from various locations on the unit sphere. The figure shows clear convergence regions around $3$ of the power-stable eigenpairs. The region around the fourth power-stable eigenpair appears on the back side of the sphere. In agreement with theory, the adaptive shifted-HOPM did not converge to any of the three power-unstable eigenpairs.

Newton-based methods for the tensor eigen-problem

Given the limitations of the aforementioned methods, our goal is to derive a fast iterative algorithm that under mild assumptions is able to converge to all real eigenpairs of a symmetric tensor. To this end, we develop a Newton-based method.

4.1 Newton correction method

Several variants of Newton’s method were derived for the symmetric matrix eigen-problem, see for example [30, Chapter 6]. Here we derive a Newton-based method for computing the eigenpairs of symmetric tensors. Recently, [12] considered a similar approach for finding some nonnegative eigenpairs of a nonnegative tensor.

Let $(\bm{x}^{\ast},\lambda^{\ast})$ be an eigenpair of a symmetric tensor $\mathcal{T}$ of order $m$ and dimensionality $n$ . Given an approximation $\bm{x}$ to $\bm{x}^{\ast}$ , our goal is to obtain an improved approximation $\bm{x}^{\prime}$ (see Figure 3, left). Denote the exact unknown correction by $\bm{y}^{*}=\bm{x}^{\ast}-\bm{x}$ and recall $\mu(\bm{x})=\mathcal{T}(\bm{x},\ldots,\bm{x})$ . Since $\bm{x}^{\ast}=\bm{x}+\bm{y}^{*}$ and $\lambda^{\ast}=\mu(\bm{x}^{\ast})=\mu(\bm{x}+\bm{y}^{*})$ , the eigen-problem in (2) can be written as

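$$\mathcal{T}(I,\bm{x}+\bm{y}^{*},\dots,\bm{x}+\bm{y}^{*})=\mu(\bm{x}+\bm{y}^{*})\cdot(\bm{x}+\bm{y}^{*}).\tag{11}$$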

Recalling the Hessian matrix $H(\bm{x})$ in (6), we define the matrix $A(\bm{x})\in\mathbb{R}^{n\times n}$ by

$$A(\bm{x})=H(\bm{x})-m\,\bm{x}\,\mathcal{T}(I,\bm{x},\dots,\bm{x})^{T}.$$

Setting apart the terms that are linear in $\bm{y}^{*}$ , Eq. (11) takes the form

$$A(\bm{x})\bm{y}^{*}=-\bm{g}(\bm{x})+\Delta(\bm{x},\bm{y}^{*}),\tag{13}$$

where $\bm{g}(\bm{x})$ is given in (5). Here, $\Delta(\bm{x},\bm{y}^{*})$ accounts for all high order terms in $\bm{y}^{*}$ ,

[TABLE]

By definition, the solution $\bm{y}^{*}$ to (13) satisfies $\bm{x}+\bm{y}^{*}=\bm{x}^{\ast}$ . However, solving (13) exactly for $\bm{y}^{*}$ is as difficult as finding the eigenpair $(\bm{x}^{\ast},\lambda^{\ast})$ of the tensor $\mathcal{T}$ we started from. Instead, we devise an iterative Newton correction method (NCM) that solves (13) only approximately. Given the approximation $\bm{x}_{(k)}$ of $\bm{x}^{\ast}$ at the $k^{\text{th}}$ iteration, NCM computes a new approximation $\bm{x}_{(k+1)}$ by neglecting the high order terms $\Delta(\bm{x},\bm{y}^{*})$ in (13). This amounts to solving the system of $n$ linear equations

$$A(\bm{x}_{(k)})\bm{y}_{(k)}=-\bm{g}(\bm{x}_{(k)}).\tag{15}$$

Assuming $A(\bm{x}_{(k)})$ is invertible, the unique solution to (15) is given by

$$\bm{y}_{(k)}=-A(\bm{x}_{(k)})^{-1}\bm{g}(\bm{x}_{(k)}).\tag{16}$$

The new approximation $\bm{x}_{(k+1)}$ for $\bm{x}^{\ast}$ is then

$$\bm{x}_{(k+1)}=\frac{\bm{x}_{(k)}+\bm{y}_{(k)}}{\|\bm{x}_{(k)}+\bm{y}_{(k)}\|}.\tag{17}$$

Given an initial guess $\bm{x}_{(0)}\in S_{n-1}$ , NCM iterates steps (16) and (17). Once a stopping condition is met, the pair $(\bm{x}_{(k)},\mu(\bm{x}_{(k)}))$ is returned; see Algorithm 1.

The left panel of Figure 2 shows the convergence regions of NCM for the eigenpairs of the same tensor as in Figure 1. In this case, all eigenpairs are attracting points of NCM and can thus be found by running Algorithm 1 multiple times with different (random) initial guesses.
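The NCM iteration (solve the linear system, add the correction, renormalize) might be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not a transcription of Algorithm 1 or of the authors' Matlab code: the function name `ncm`, the return values, and the test tensor are hypothetical, and the stopping rule is the iterate-difference test with $\delta=10^{-10}$ and at most 200 iterations used in the simulations.

```python
import numpy as np

def contract(T, x, times):
    """Contract the last `times` modes of the tensor T with the vector x."""
    out = T
    for _ in range(times):
        out = out @ x
    return out

def ncm(T, x0, tol=1e-10, max_iter=200):
    """Newton correction method sketch. Returns (x, mu(x), iterations, converged)."""
    m, n = T.ndim, T.shape[0]
    x = x0 / np.linalg.norm(x0)
    for k in range(1, max_iter + 1):
        t1 = contract(T, x, m - 1)                  # T(I, x, ..., x)
        mu_x = float(t1 @ x)                        # mu(x)
        g = t1 - mu_x * x                           # g(x)
        H = (m - 1) * contract(T, x, m - 2) - mu_x * np.eye(n)
        A = H - m * np.outer(x, t1)                 # A(x) = H(x) - m x T(I,x,...,x)^T
        y = np.linalg.solve(A, -g)                  # linear system A y = -g
        x_new = (x + y) / np.linalg.norm(x + y)     # renormalize onto the sphere
        step = np.linalg.norm(x_new - x)
        x = x_new
        if step < tol:
            return x, float(contract(T, x, m)), k, True
    return x, float(contract(T, x, m)), max_iter, False

# Illustrative tensor (assumption): sum_i e_i (x) e_i (x) e_i with n = 3.
T = np.zeros((3, 3, 3))
for i in range(3):
    T[i, i, i] = 1.0

# Start near the eigenvector (1, 1, 0)/sqrt(2) of this tensor.
x, lam, iters, ok = ncm(T, np.array([1.0, 1.0, 0.1]))
```

`np.linalg.solve` is used instead of forming $A(\bm{x}_{(k)})^{-1}$ explicitly; it raises an error if $A(\bm{x}_{(k)})$ is exactly singular, which a caller may want to catch.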

Convergence guarantees

Two questions regarding NCM are (i) to which eigenpairs of $\mathcal{T}$ can the method converge? and (ii) what is the convergence rate? To answer these questions we recall the definition of the Hessian matrix in (6). Given an eigenpair $(\bm{x}^{\ast},\lambda^{\ast})$ , its corresponding Hessian matrix is

$$H(\bm{x}^{\ast})=(m-1)\mathcal{T}(I,I,\bm{x}^{\ast},\dots,\bm{x}^{\ast})-\lambda^{\ast}I.$$

Note that $H(\bm{x}^{\ast})$ is symmetric and has $\bm{x}^{\ast}$ as an eigenvector with eigenvalue $(m-2)\lambda^{\ast}$ . Denote by $\mu_{1}^{\ast},\ldots,\mu_{n-1}^{\ast}$ the other $n-1$ eigenvalues of $H(\bm{x}^{\ast})$ . By definition, these are the eigenvalues of the projected Hessian $H_{p}(\bm{x}^{\ast})$ in (10).
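The eigenvalue $(m-2)\lambda^{\ast}$ follows in one line from the expression for $H(\bm{x}^{\ast})$ and the eigenpair definition, using the fact that contracting one more mode of $\mathcal{T}(I,I,\bm{x}^{\ast},\dots,\bm{x}^{\ast})$ with $\bm{x}^{\ast}$ gives $\mathcal{T}(I,\bm{x}^{\ast},\dots,\bm{x}^{\ast})$. The step is spelled out here as a worked check:

$$H(\bm{x}^{\ast})\bm{x}^{\ast}=(m-1)\mathcal{T}(I,I,\bm{x}^{\ast},\dots,\bm{x}^{\ast})\bm{x}^{\ast}-\lambda^{\ast}\bm{x}^{\ast}=(m-1)\mathcal{T}(I,\bm{x}^{\ast},\dots,\bm{x}^{\ast})-\lambda^{\ast}\bm{x}^{\ast}=(m-1)\lambda^{\ast}\bm{x}^{\ast}-\lambda^{\ast}\bm{x}^{\ast}=(m-2)\lambda^{\ast}\bm{x}^{\ast}.$$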

Definition 2

For $\gamma>0$ , an eigenpair $(\bm{x}^{\ast},\lambda^{\ast})$ is $\gamma$ -Newton-stable if all eigenvalues of $H_{p}(\bm{x}^{\ast})$ in absolute value are at least $\gamma$ , namely, $\min_{i}|\mu_{i}^{*}|\geq\gamma$ .

Note that $H_{p}(\bm{x}^{\ast})$ is full rank iff $\bm{x}^{\ast}$ is $\gamma$ -Newton-stable for some $\gamma>0$ . Similarly, $H(\bm{x}^{\ast})$ is full rank iff $\bm{x}^{\ast}$ is $\gamma$ -Newton-stable for some $\gamma>0$ and $\lambda^{\ast}\neq 0$ . We have the following convergence guarantee for NCM. The proof is given in Appendix A.

Theorem 1

Let $(\bm{x}^{\ast},\lambda^{\ast})$ be an eigenpair of a symmetric tensor $\mathcal{T}$ . Suppose that $(\bm{x}^{\ast},\lambda^{\ast})$ is $\gamma$ -Newton-stable and that $\lambda^{\ast}\neq 0$ . Then there exists an $\varepsilon=\varepsilon(\gamma,\lambda^{\ast})>0$ such that for any $\bm{x}_{(0)}$ that satisfies ${\|\bm{x}_{(0)}-\bm{x}^{\ast}\|}<\varepsilon$ , the sequence $\bm{x}_{(0)},\bm{x}_{(1)},\dots$ , computed by Algorithm 1 converges to $\bm{x}^{\ast}$ at a quadratic rate.

4.2 Orthogonal Newton correction method

As discussed above, the NCM method may not converge to an eigenvector with eigenvalue $\lambda^{\ast}=0$ . To remove this limitation, we now develop an orthogonal NCM variant. Given an approximation $\bm{x}\in S_{n-1}$ of $\bm{x}^{\ast}$ , we first decompose it into its projection onto $\bm{x}^{\ast}$ and a residual (see Figure 3, right),

$$\bm{x}=\alpha\bm{x}^{\ast}-\bm{u}^{*},$$

where $\alpha=\bm{x}^{T}\bm{x}^{\ast}$ and $\bm{u}^{*}=\alpha\bm{x}^{\ast}-\bm{x}$ is the residual. Since $\bm{x}^{\ast}$ and $\bm{u}^{*}$ are orthogonal and $\bm{x}\in S_{n-1}$ , ${\|\bm{x}\|}^{2}=\alpha^{2}+{\|\bm{u}^{*}\|}^{2}=1$ . For reasons to become clear shortly, we also introduce a correction $\beta^{*}\equiv\beta^{*}(\bm{x},\bm{x}^{\ast})$ to the eigenvalue $\lambda^{\ast}$ , defined as

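$$\beta^{*}=\alpha^{m-2}\lambda^{\ast}-\mathcal{T}(\bm{x},\dots,\bm{x})=\alpha^{m-2}\lambda^{\ast}-\mu(\bm{x}).$$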

When $\bm{x}=\bm{x}^{\ast}$ , we have $\bm{u}^{*}=0$ , $\alpha=1$ and $\beta^{*}=0$ . Since $(\bm{x}^{\ast},\lambda^{\ast})$ is an eigenpair,

$$\mathcal{T}(I,\alpha\bm{x}^{\ast},\dots,\alpha\bm{x}^{\ast})=\alpha^{m-1}\lambda^{\ast}\bm{x}^{\ast}.$$

Inserting $\bm{x}^{\ast}=\frac{1}{\alpha}(\bm{x}+\bm{u}^{*})$ and $\beta^{*}$ into the above equation gives

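$$\mathcal{T}(I,\bm{x}+\bm{u}^{*},\dots,\bm{x}+\bm{u}^{*})=(\mu(\bm{x})+\beta^{*})\cdot(\bm{x}+\bm{u}^{*}).$$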

We set apart the terms that are linear in $\bm{u}^{*}$ and $\beta^{*}$ to obtain

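$$H(\bm{x})\bm{u}^{*}-\beta^{*}\bm{x}=-\bm{g}(\bm{x})+\tilde{\Delta}(\bm{x},\bm{u}^{*},\beta^{*}),\tag{21}$$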

where $\bm{g}$ and $H$ were defined in (5) and (6) respectively and $\tilde{\Delta}(\bm{x},\bm{u}^{*},\beta^{*})$ includes all the remaining higher order terms in $(\bm{u}^{*},\beta^{*})$ ,

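$$\tilde{\Delta}(\bm{x},\bm{u}^{*},\beta^{*})=\beta^{*}\bm{u}^{*}-\sum_{i=2}^{m-1}\binom{m-1}{i}\mathcal{T}(I,\underbrace{\bm{x},\dots,\bm{x}}_{m-i-1},\underbrace{\bm{u}^{*},\dots,\bm{u}^{*}}_{i}).$$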

Combining (21) with the orthogonality condition, $(\bm{x}^{\ast})^{T}\bm{u}^{*}\propto(\bm{x}+\bm{u}^{*})^{T}\bm{u}^{*}=0$ , gives the following set of non-linear equations in $(\bm{u}^{\ast},\beta^{*})$ ,

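$$\begin{pmatrix}H(\bm{x})&-\bm{x}\\ \bm{x}^{T}&0\end{pmatrix}\begin{pmatrix}\bm{u}^{*}\\ \beta^{*}\end{pmatrix}=-\begin{pmatrix}\bm{g}(\bm{x})\\ 0\end{pmatrix}+\begin{pmatrix}\tilde{\Delta}(\bm{x},\bm{u}^{*},\beta^{*})\\ -\|\bm{u}^{*}\|^{2}\end{pmatrix}.\tag{23}$$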

By construction, the solution $(\bm{u}^{\ast},\beta^{*})$ to (23) satisfies $(\bm{x}+\bm{u}^{\ast})/{\|\bm{x}+\bm{u}^{\ast}\|}=\bm{x}^{\ast}$ . Similarly to the NCM, we neglect the high order terms in the right hand side of (23) and solve the system of linear equations in the $n+1$ unknowns $(\bm{u},\beta)$ ,

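$$\begin{pmatrix}H(\bm{x})&-\bm{x}\\ \bm{x}^{T}&0\end{pmatrix}\begin{pmatrix}\bm{u}\\ \beta\end{pmatrix}=-\begin{pmatrix}\bm{g}(\bm{x})\\ 0\end{pmatrix}.\tag{24}$$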

Due to the extra variable $\beta$ , (24) seems to be of dimension $n+1$ , as opposed to the $n$ dimensional system in (15). However, as we now show, the system in (24) can be equivalently solved in the $n-1$ dimensional subspace orthogonal to $\bm{x}$ . More precisely, let $P_{\bm{x}}^{\perp}=(I-\bm{x}\bm{x}^{T})$ be the projection matrix onto the subspace orthogonal to $\bm{x}$ and let $U_{\bm{x}}\in\mathbb{R}^{n\times(n-1)}$ have orthonormal columns such that $P_{\bm{x}}^{\perp}=U_{\bm{x}}U_{\bm{x}}^{T}$ . Recall the projected Hessian matrix $H_{p}(\bm{x})=U_{\bm{x}}^{T}H(\bm{x})U_{\bm{x}}$ in (10). The following lemma is an adaptation of [30, Theorem 6.2.2] to our setting. Its proof is given in Appendix B.

Lemma 1

A vector $\bm{u}\in\mathbb{R}^{n}$ satisfies (24) if and only if $\bm{z}\in\mathbb{R}^{n-1}$ satisfies

$$H_{p}(\bm{x})\bm{z}=-U_{\bm{x}}^{T}\bm{g}(\bm{x})\quad\text{and}\quad\bm{u}=U_{\bm{x}}\bm{z}.\tag{25}$$

Assuming $H_{p}(\bm{x})$ is invertible, the solution to (25) is $\bm{u}=-U_{\bm{x}}H_{p}(\bm{x})^{-1}U_{\bm{x}}^{T}\bm{g}(\bm{x}).$ So given the $k^{\text{th}}$ approximation $\bm{x}_{(k)}$ to $\bm{x}^{\ast}$ , O–NCM computes

$$\bm{u}_{(k)}=-U_{\bm{x}_{(k)}}H_{p}(\bm{x}_{(k)})^{-1}U_{\bm{x}_{(k)}}^{T}\bm{g}(\bm{x}_{(k)}),$$

and the new approximation is $\bm{x}_{(k+1)}=(\bm{x}_{(k)}+\bm{u}_{(k)})/{\|\bm{x}_{(k)}+\bm{u}_{(k)}\|}$ . Given an initial $\bm{x}_{(0)}$ , O–NCM iterates these steps until a stopping condition is met; see Algorithm 2.
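The O–NCM update differs from NCM only in the linear system it solves: an $(n-1)\times(n-1)$ system in the subspace orthogonal to the current iterate. A minimal NumPy sketch is given below. It is an illustration under assumptions rather than a transcription of Algorithm 2: the name `oncm`, the QR-based construction of $U_{\bm{x}}$, the stopping rule, and the test tensor are all choices made here.

```python
import numpy as np

def contract(T, x, times):
    """Contract the last `times` modes of the tensor T with the vector x."""
    out = T
    for _ in range(times):
        out = out @ x
    return out

def oncm(T, x0, tol=1e-10, max_iter=200):
    """Orthogonal NCM sketch. Returns (x, mu(x), iterations, converged)."""
    m, n = T.ndim, T.shape[0]
    x = x0 / np.linalg.norm(x0)
    for k in range(1, max_iter + 1):
        t1 = contract(T, x, m - 1)                  # T(I, x, ..., x)
        mu_x = float(t1 @ x)
        g = t1 - mu_x * x                           # g(x)
        H = (m - 1) * contract(T, x, m - 2) - mu_x * np.eye(n)
        # U_x: orthonormal basis of the complement of x (QR is one valid choice).
        Q, _ = np.linalg.qr(np.column_stack([x, np.eye(n)]))
        U = Q[:, 1:]
        z = np.linalg.solve(U.T @ H @ U, -U.T @ g)  # H_p(x) z = -U_x^T g(x)
        u = U @ z                                   # correction orthogonal to x
        x_new = (x + u) / np.linalg.norm(x + u)
        step = np.linalg.norm(x_new - x)
        x = x_new
        if step < tol:
            return x, float(contract(T, x, m)), k, True
    return x, float(contract(T, x, m)), max_iter, False

# Illustrative tensor (assumption): sum_i e_i (x) e_i (x) e_i with n = 3.
T = np.zeros((3, 3, 3))
for i in range(3):
    T[i, i, i] = 1.0

x, lam, iters, ok = oncm(T, np.array([1.0, 1.0, 0.1]))
```

The result does not depend on which orthonormal basis $U_{\bm{x}}$ is chosen, since $\bm{u}=-U_{\bm{x}}H_{p}(\bm{x})^{-1}U_{\bm{x}}^{T}\bm{g}(\bm{x})$ is invariant under a change of basis of the orthogonal complement.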

The right panel of Figure 2 shows the convergence regions of O–NCM for the various eigenpairs of the same tensor in Figure 1. Similarly to NCM, in this case, all eigenpairs are attracting points of O–NCM, but with slightly different regions.

Convergence guarantees

We have the following convergence guarantee for O–NCM. It is similar to that of NCM in Theorem 1, but with the condition $\lambda^{*}\neq 0$ removed. The proof is given in Appendix C.

Theorem 2

Let $(\bm{x}^{\ast},\lambda^{\ast})$ be a $\gamma$ -Newton-stable eigenpair of a symmetric tensor $\mathcal{T}$ . There exists an $\varepsilon=\varepsilon(\gamma)>0$ such that for any $\bm{x}_{(0)}\in S_{n-1}$ that satisfies ${\|\bm{x}_{(0)}-\bm{x}^{\ast}\|}<\varepsilon$ , the sequence $\bm{x}_{(0)},\bm{x}_{(1)},\dots$ , computed by Algorithm 2 converges to $\bm{x}^{\ast}$ at a quadratic rate.

Remark 1

Recall that any eigenpair $(\bm{x}^{\ast},\lambda^{\ast})$ to which the shifted-HOPM method converges has a Hessian matrix which is either positive definite or negative definite. In either case, this Hessian matrix has full rank, and thus, by Theorem 2, the eigenpair is a stable fixed point of O–NCM. In other words, O–NCM typically converges to many more tensor eigenpairs than the shifted power method. However, the adaptive shifted-HOPM is guaranteed to converge from any initial point, whereas no such global convergence guarantee is currently available for the Newton-based methods.

Convergence to eigenpairs with a rank deficient Hessian

The sufficient condition in Theorem 2 for O–NCM to converge to an eigenpair $(\bm{x}^{\ast},\lambda^{\ast})$ rests on the smallest absolute eigenvalue of the projected Hessian matrix $H_{p}(\bm{x}^{\ast})$ . In particular, if the eigenpair is $\gamma$ -Newton-stable for some $\gamma>0$ , an attraction neighborhood around $\bm{x}^{\ast}$ exists. We now illustrate that this sufficient condition for convergence is by no means necessary. To this end, we analyze a simple example. Denote $\bm{1}=(1,\ldots,1)^{T}\in\mathbb{R}^{n}$ . For any $\omega\in\mathbb{R}$ , define the following cubic $n$ -dimensional symmetric tensor,

[TABLE]

When $\omega=0$ , $\mathcal{T}_{\omega}$ is orthogonal, having the maximal possible number of $2^{n}-1$ real eigenpairs. All these are Newton-stable and among them are the $n$ power-stable eigenpairs $\{(\bm{e}_{i},1)\}_{i=1}^{n}$ . Assume $n$ is odd and denote $l=\lfloor n/2\rfloor$ . Let $N(\omega)$ be the number of real eigenpairs of $\mathcal{T}_{\omega}$ . The following proposition is proved in Appendix D.

Proposition 1

Define $l$ thresholds, $\omega_{i}=\frac{1}{4(l-i)(n-l+i)}>0$ , $i\in\{0,\ldots,l-1\}$ .

(i)

The number $N(\omega)$ of real eigenpairs of $\mathcal{T}_{\omega}$ of Eq. (27) is

[TABLE]

(ii)

For $\omega_{i}\in\{\omega_{0},\dots,\omega_{l-1}\}$ , $\binom{n}{l-i}$ out of the $N(\omega_{i})$ real eigenpairs of $\mathcal{T}_{\omega_{i}}$ are not Newton-stable.

We illustrate the above properties for $\mathcal{T}_{\omega}$ with $n=5$ . In this case, $l=\lfloor n/2\rfloor=2$ and there are two thresholds, $\omega_{0}=\frac{1}{4\cdot 2\cdot 3}\approx 0.0417$ and $\omega_{1}=\frac{1}{4\cdot 1\cdot 4}=0.0625$ . Figure 4 shows the number $N(\omega)$ of real eigenpairs (left), and the different eigenvalues of $\mathcal{T}_{\omega}$ (right) as computed by O–NCM. As expected, at $\omega_{0}$ and $\omega_{1}$ , the number of real eigenvalues decreases.
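The threshold values quoted here follow directly from the formula in Proposition 1, and the short sketch below reproduces them together with the binomial counts from part (ii). The function names `thresholds` and `non_newton_stable_counts` are hypothetical helpers introduced only for this check.

```python
import math

def thresholds(n):
    """omega_i = 1 / (4 (l - i) (n - l + i)) for i = 0, ..., l - 1, with l = floor(n / 2)."""
    if n % 2 == 0:
        raise ValueError("Proposition 1 assumes n is odd")
    l = n // 2
    return [1.0 / (4 * (l - i) * (n - l + i)) for i in range(l)]

def non_newton_stable_counts(n):
    """Binomial(n, l - i): eigenpairs that are not Newton-stable at omega_i."""
    l = n // 2
    return [math.comb(n, l - i) for i in range(l)]

omegas_5 = thresholds(5)                # two thresholds for n = 5
omegas_3 = thresholds(3)                # one threshold for n = 3
counts_3 = non_newton_stable_counts(3)  # count at omega_0 for n = 3
```

For $n=5$ this gives $1/24$ and $1/16$, and for $n=3$ it gives the single threshold $1/8$ with $\binom{3}{1}=3$ eigenpairs that are not Newton-stable.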

Next, we examine the convergence of O–NCM on $\mathcal{T}_{\omega=\omega_{0}}$ with $n=3$ . According to Proposition 1, $\omega_{0}=0.125$ , and the number of real eigenpairs is $N(\omega_{0})=1+\binom{3}{1}=4$ , three of which are not Newton-stable. Figure 5 shows the attraction regions around two of the eigenvectors of $\mathcal{T}_{\omega_{0}}$ , as well as the full unit sphere. On the left, the eigenvector is Newton-stable. As expected, O–NCM converged to this eigenvector from any point in its neighborhood. In contrast, the eigenvector on the right is not Newton-stable. In this case, there is a positive probability of converging to a different eigenvector even when the initial guess is arbitrarily close. Nonetheless, O–NCM converged to this eigenvector from some directions, even though the sufficient condition in Theorem 2 does not hold.

In this example, all eigenpairs of $\mathcal{T}_{\omega}$ are isolated, namely each one of them is the unique eigenpair in a small neighborhood around it. In addition, the eigenvectors which are not Newton-stable have a projected Hessian matrix that is rank deficient but non-zero. In Appendix E, we illustrate the behavior of O–NCM near eigenvectors that are either non-isolated, or have a projected Hessian matrix equal to zero. In these cases, O–NCM may not converge.

Simulation results

In this section we study numerically the performance of NCM and O–NCM for computing the real eigenpairs of symmetric tensors, as compared to the homotopy method [7] and the adaptive shifted-HOPM [19]. (Footnote 1: Matlab code for the NCM methods can be found at https://github.com/arJaffe/NCM, for the homotopy method at http://users.math.msu.edu/users/chenlipi/TenEig.html, and for the shifted HOPM at http://www.sandia.gov/~tgkolda/TensorToolbox/index-2.6.html.) In the experiments we consider random Gaussian symmetric tensors, whose entries are all i.i.d. $\mathcal{N}(0,1)$ up to the symmetry constraints. All experiments were done on a PC with an Intel i-3820 $3.6\text{GHz}$ processor, $16\text{GB}$ RAM and MATLAB version R2016a.
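For readers who want to reproduce a similar setup, one simple way to obtain a random symmetric tensor is to draw a Gaussian tensor and average it over all permutations of its modes. This is only a sketch: the paper does not spell out its construction, and with this symmetrization the variance of an entry depends on how many of its indices coincide, so it may differ from the ensemble used in the experiments. The function name `random_symmetric_tensor` is hypothetical.

```python
import itertools
import math
import numpy as np

def random_symmetric_tensor(n, m, seed=0):
    """Symmetrize an i.i.d. N(0,1) tensor of order m and dimension n.

    Assumption of this sketch: symmetrization by averaging over all m! mode
    permutations, which may not match the paper's exact random ensemble.
    """
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n,) * m)
    S = np.zeros_like(G)
    for perm in itertools.permutations(range(m)):
        S += np.transpose(G, perm)
    return S / math.factorial(m)

T = random_symmetric_tensor(4, 3)
```

Averaging over $m!$ permutations is cheap for the small orders considered here ($m=3$ or $m=4$) but grows quickly with $m$.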

Finding all real eigenpairs

We examine the time needed to compute all eigenpairs for random tensors of order $m=4$ and various dimensions $n$ . Similar results are obtained for other values of $m$ . For each tensor, we first ran the homotopy method to obtain all its eigenpairs. Next, we ran NCM and O–NCM, initialized repeatedly with random points on the unit sphere, until all eigenpairs were found. Note that without running the homotopy method first, we would have no criterion to decide whether we had actually found all of the tensor’s eigenpairs. The process was sequential, where a new run was initialized only after the previous one ended. This process, however, can be easily parallelized. We stopped the NCM iterations when $\|\bm{x}_{(k)}-\bm{x}_{(k-1)}\|<\delta=10^{-10}$ or if a maximal number of $k=k_{\max}=200$ iterations was reached. In the latter case we declared that NCM failed to converge. For both NCM and O–NCM, and for all tensor dimensions we considered, only $\approx 0.2\%$ of all random initializations failed to converge within the maximal number of iterations.
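The multi-start procedure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' Matlab implementation. The function names (`contract`, `ncm_run`, `find_eigenpairs`) are hypothetical, the Jacobian $A(\bm{x})$ is obtained by differentiating $\bm{g}(\bm{x})$ directly under the assumed convention $H(\bm{x})=(m-1)\mathcal{T}(I,I,\bm{x},\dots,\bm{x})$, a fixed budget of random starts stands in for the homotopy-based stopping criterion, and the final residual check is an extra safeguard that the text does not mention.

```python
import numpy as np

def contract(T, x, times):
    """Contract the last `times` modes of the tensor T with the vector x."""
    for _ in range(times):
        T = np.tensordot(T, x, axes=1)
    return T

def ncm_run(T, x0, delta=1e-10, k_max=200, res_tol=1e-8):
    """One NCM-style run. Returns (x, mu) on convergence, otherwise None."""
    m, n = T.ndim, x0.size
    x = x0 / np.linalg.norm(x0)
    for _ in range(k_max):
        Tx = contract(T, x, m - 1)                # T(I, x, ..., x)
        mu = Tx @ x                               # mu(x) = T(x, ..., x)
        g = Tx - mu * x
        H = (m - 1) * contract(T, x, m - 2)       # assumed Hessian convention
        A = H - mu * np.eye(n) - m * np.outer(x, Tx)  # Jacobian of g (derived here)
        try:
            y = np.linalg.solve(A, -g)            # Newton correction A y = -g
        except np.linalg.LinAlgError:
            return None
        x_new = (x + y) / np.linalg.norm(x + y)   # project back to the unit sphere
        if not np.all(np.isfinite(x_new)):
            return None
        step = np.linalg.norm(x_new - x)
        x = x_new
        if step < delta:
            Tx = contract(T, x, m - 1)
            mu = Tx @ x
            # Extra safeguard (not in the source): accept only small residuals.
            if np.linalg.norm(Tx - mu * x) < res_tol:
                return x, mu
            return None
    return None                                   # declared "failed to converge"

def find_eigenpairs(T, n_starts=200, seed=0, tol=1e-6):
    """Repeated random starts; eigenvectors are deduplicated up to sign."""
    rng = np.random.default_rng(seed)
    found = []
    for _ in range(n_starts):
        out = ncm_run(T, rng.standard_normal(T.shape[0]))
        if out is None:
            continue
        x, mu = out
        is_new = all(min(np.linalg.norm(x - z), np.linalg.norm(x + z)) >= tol
                     for z, _ in found)
        if is_new:
            found.append((x, mu))
    return found
```

Normalizing a standard Gaussian vector gives a starting point that is uniform on the unit sphere. Since each run is independent of the others, the loop in `find_eigenpairs` is the part that could be parallelized.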

Figure 6 (left) shows the number of real eigenpairs as averaged over 10 independent tensors for each value of $n$ . Figure 6 (right) shows on a logarithmic scale the average time it took to compute all real eigenpairs via the homotopy, NCM and O–NCM, for the same tensors. These results show that both NCM and O–NCM recovered all eigenpairs faster than the homotopy method by approximately two orders of magnitude. Moreover, O–NCM did so much faster than NCM.

Small eigenvalues

To understand the gap in the runtime of NCM and O–NCM shown in Figure 6, we next examine the dependence of both methods on the eigenpair to which they converge. As suggested by Theorems 1 and 2, we expect O–NCM to have larger attraction regions for small eigenvalues as compared to NCM. Figure 7 (left) shows on a log-log scale the relative number of times the two methods converged to each eigenvalue as a function of its absolute value for a typical random tensor of order $m=4$ and dimension $n=8$ . These counts correspond to a total of $10^{6}$ random initializations uniformly distributed on the unit sphere. As one can see, the probability for NCM to converge to an eigenpair decreases sharply when its eigenvalue becomes small, while for O–NCM this probability seems to be independent of $|\lambda|$ . This difference is the source of the gap in the runtime of the two methods for finding all eigenpairs. For completeness, the eigenvalues found by the adaptive S-HOPM are also presented.

Convergence rates

Figure 7 (right) shows the median runtime until convergence of the NCM and the shifted HOPM. The stopping condition for all methods was set to $\|\bm{x}_{(k)}-\bm{x}_{(k+1)}\|<\delta=10^{-10}$ . The experiment was done on $100$ random tensors of fourth order with various dimensions. For each tensor, we initialized all methods with $100$ random starting points. To avoid the influence of any particular implementation, we normalized the runtime of each method by its runtime for $n=3$ . As illustrated in Fig. 7, the runtime of NCM and O–NCM increases significantly more slowly with $n$ than that of the adaptive shifted-HOPM. However, each NCM/O–NCM iteration may be slower, as it requires matrix inversion.

Discussion and summary

In this paper we developed and analyzed a Newton-based iterative approach to compute real eigenpairs of symmetric tensors. We now briefly discuss three important issues: its runtime, its ability to find all tensor eigenpairs, and its optimization point of view.

Runtime

The computational complexity of each NCM or O–NCM iteration is dominated by two operations: computing the Hessian matrix in $\mathcal{O}(n^{m})$ time and solving a system of $n$ linear equations in $\mathcal{O}(n^{3})$ time. The latter step may be significantly sped up by applying various preconditioning techniques, as done in other iterative methods that solve systems of linear equations [10]. For sparse tensors, the computation of the Hessian can be accelerated as well, see [29].
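The two dominant operations can be made concrete with a short sketch for order $m=4$. The function names are hypothetical, and the Hessian convention $H(\bm{x})=(m-1)\mathcal{T}(I,I,\bm{x},\bm{x})$ is an assumption, since its definition lies outside this section.

```python
import numpy as np

def hessian_order4(T, x):
    """Assumed convention H(x) = 3 T(I, I, x, x); reads all n^4 entries of T."""
    return 3.0 * np.einsum('ijkl,k,l->ij', T, x, x)

def newton_correction(A, g):
    """Dense solve of the n x n system A y = -g, cubic in n."""
    return np.linalg.solve(A, -g)
```

The contraction reads every entry of the tensor once, which is where the $\mathcal{O}(n^{m})$ term comes from, while the dense solve accounts for the $\mathcal{O}(n^{3})$ term.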

Optimization point of view

Following a constructive comment by one of the referees, we note that NCM can be seen as an adaptation of the Gauss–Newton method [3]. Recall that $\bm{g}(\bm{x}^{\ast})=\mathcal{T}(I,\bm{x}^{\ast},\dots,\bm{x}^{\ast})-\mu(\bm{x}^{\ast})\bm{x}^{\ast}=\bm{0}$ if and only if $\bm{x}^{\ast}\in S_{n-1}$ is an eigenvector of $\mathcal{T}$ , with a corresponding eigenvalue $\lambda^{*}=\mu(\bm{x}^{\ast})$ . Hence, our goal is to find the global minima of the realizable nonlinear least-squares problem

[TABLE]

Given the current estimate $\bm{x}_{(k)}\in S_{n-1}$ , Gauss–Newton first linearizes $\bm{g}(\bm{x})$ at $\bm{x}_{(k)}$ ,

[TABLE]

where $[A(\bm{x})]_{ij}=\partial g_{i}(\bm{x})/\partial x_{j}$ is the $n\times n$ Jacobian matrix of $\bm{g}(\bm{x})$ , given in (12). Then, instead of (28), the following approximate linear least-squares problem is solved,

[TABLE]

which is exactly the NCM correction in (15).
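As a hedged reading of the Gauss–Newton steps just described, the derivation can be written out as below. The displayed equations of the paper are not reproduced here, so the exact form and numbering are assumptions. The only facts used are that $A(\bm{x})$ is the Jacobian of $\bm{g}(\bm{x})$ and that the NCM correction solves $A(\bm{x}_{(k)})\bm{y}=-\bm{g}(\bm{x}_{(k)})$.

```latex
\begin{align*}
\bm{g}(\bm{x}_{(k)}+\bm{y}) &\approx \bm{g}(\bm{x}_{(k)}) + A(\bm{x}_{(k)})\,\bm{y}, \\
\bm{y}_{(k)} &= \arg\min_{\bm{y}\in\mathbb{R}^{n}} \tfrac{1}{2}
  \bigl\| \bm{g}(\bm{x}_{(k)}) + A(\bm{x}_{(k)})\,\bm{y} \bigr\|^{2}, \\
A(\bm{x}_{(k)})^{T}A(\bm{x}_{(k)})\,\bm{y}_{(k)} &= -A(\bm{x}_{(k)})^{T}\bm{g}(\bm{x}_{(k)})
  \quad\Longleftrightarrow\quad
  A(\bm{x}_{(k)})\,\bm{y}_{(k)} = -\bm{g}(\bm{x}_{(k)}).
\end{align*}
```

The last equivalence holds when $A(\bm{x}_{(k)})$ is invertible: the Jacobian is square, so the normal equations reduce to the Newton system.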

Besides NCM, other nonlinear optimization methods can be used to solve (28), such as the Levenberg-Marquardt algorithm [23] and other trust-region and line search algorithms. These methods, among other things, introduce an additional regularization term to better control the direction in which the method proceeds at each iteration, similarly to the role played by the (adaptive) shifted-HOPM as compared to HOPM. Specifically, instead of (15), one solves the following linear system with an appropriate regularization matrix $B_{k}\in\mathbb{R}^{n\times n}$ ,

[TABLE]

While NCM currently has no global convergence guarantees, an appropriate (adaptive) choice of $B_{k}$ can lead to global convergence guarantees, including to eigenvectors having a zero Hessian. Further studying the role of regularization for the tensor eigen-problem is an interesting direction for future research.
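For illustration, a regularized step in the textbook Levenberg–Marquardt form can be sketched as follows. The paper's own regularized system is not recoverable from this section, so the form $(A^{T}A+B_{k})\bm{y}=-A^{T}\bm{g}$ and the function name `regularized_step` are assumptions.

```python
import numpy as np

def regularized_step(A, g, B):
    """Solve (A^T A + B) y = -A^T g (textbook Levenberg-Marquardt form)."""
    return np.linalg.solve(A.T @ A + B, -A.T @ g)
```

With $B_{k}=0$ and an invertible $A$, this reduces to the unregularized correction $\bm{y}=-A^{-1}\bm{g}$. A choice such as $B_{k}=\nu I$ with $\nu>0$ shortens the step, which is the sense in which regularization controls how far each iteration moves.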

However, in addition to the global minima, $f(\bm{x})\equiv\frac{1}{2}{\|\bm{g}(\bm{x})\|}^{2}$ may have local minima which should be avoided. Interestingly, NCM elegantly avoids such local minima as the following example illustrates. In Figure 8 (left), we plot $f(\bm{x})$ as a function of $\bm{x}\in S_{2}$ for the $3\times 3\times 3\times 3$ symmetric tensor of Example 1 in [17]. Its eigenvectors are depicted by circles while a local minimum $\bm{x}_{\text{loc}}$ of $f$ is depicted by a square symbol. In Figure 8 (right) we show the attraction regions for NCM starting from various locations on $S_{n-1}$ . As one can see, NCM does not converge to $\bm{x}_{\text{loc}}$ and in fact is highly unstable around this point; close initial points in this neighborhood may converge to arbitrarily far eigenvectors.

To see why this is so, note that since $\bm{x}_{\text{loc}}$ is a local minimum, for an initial point $\bm{x}_{(0)}$ near $\bm{x}_{\text{loc}}$ , NCM may get closer and closer to $\bm{x}_{\text{loc}}$ at the first few iterations. However, the facts that $\bm{g}(\bm{x}_{\text{loc}})$ is bounded away from $\bm{0}$ and $\nabla f(\bm{x}_{\text{loc}})=A(\bm{x}_{\text{loc}})^{T}\bm{g}(\bm{x}_{\text{loc}})=\bm{0}$ imply that $\bm{g}(\bm{x}_{\text{loc}})$ is in the null space of $A(\bm{x}_{\text{loc}})^{T}$ . As $\bm{x}_{(k)}$ gets closer to $\bm{x}_{\text{loc}}$ , $A(\bm{x}_{(k)})$ becomes close to singular. The result is an overshoot, a sharp increase in ${\|\bm{y}_{(k)}\|}={\|A(\bm{x}_{(k)})^{-1}\bm{g}(\bm{x}_{(k)})\|}$ , taking $\bm{x}_{(k+1)}$ far away from $\bm{x}_{(k)}$ and $\bm{x}_{\text{loc}}$ .
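The overshoot mechanism can be seen in a scalar toy problem, which is only an analogy and not the tensor example of Figure 8. Take the hypothetical residual $g(x)=x^{2}+1$, so that $f(x)=\frac{1}{2}g(x)^{2}$ has a minimizer at $x=0$ where $g(0)=1\neq 0$ and $g'(0)=0$.

```python
def newton_step_scalar(x):
    """Newton correction -g(x)/g'(x) for the toy residual g(x) = x**2 + 1."""
    return -(x * x + 1.0) / (2.0 * x)

def step_sizes(points=(1.0, 1e-1, 1e-2, 1e-3)):
    """Magnitude of the correction as the iterate approaches the minimizer at 0."""
    return [abs(newton_step_scalar(x)) for x in points]
```

As the iterate approaches the minimizer of $f$, the derivative tends to zero while the residual does not, so the correction grows without bound and throws the next iterate far away.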

Finding all eigenpairs of generic tensors

According to our theoretical analysis, NCM and O–NCM converge to eigenpairs whose Hessian matrix is full rank. An interesting question is whether these methods can thus converge to all real eigenpairs of a generic symmetric tensor [5]. Interpreting generic in the sense of algebraic geometry, an adaptation of [5, Theorem 1.2] to the symmetric tensor case implies the following (proof omitted).

Proposition 2

All real eigenpairs of a generic symmetric tensor are Newton-stable.

Hence, Theorems 1 and 2 imply that NCM and O–NCM are guaranteed to find all eigenpairs of a generic symmetric tensor given a sufficiently large number of random initializations.

Acknowledgments

We thank Lek-Heng Lim, Meirav Galun and Haim Avron for interesting discussions.

Appendix A Convergence of NCM

To prove Theorem 1 we shall make use of the following auxiliary lemma.

Lemma 2

Consider one update step of Algorithm 1, as in Equation (17), starting from an initial $\bm{x}\in S_{n-1}$ and ending with $\bm{x}^{\prime}=(\bm{x}+\bm{y})/{\|\bm{x}+\bm{y}\|}\in S_{n-1}$ . Let $\bm{y}^{*}=\bm{x}^{\ast}-\bm{x}$ . If ${\|\bm{y}-\bm{y}^{*}\|}\leq 1/2$ , then

[TABLE]

Proof 1

By definition,

[TABLE]

Since ${\|\bm{x}^{\ast}\|}=1$ , by the triangle inequality,

[TABLE]

Applying the triangle inequality to (30), combined with the assumption ${\|\bm{y}^{*}-\bm{y}\|}\leq 1/2$ ,

[TABLE]

hence concluding the proof.
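Since the displayed steps of this proof are not reproduced above, the following is one hedged way to fill them in. The constant $4$ is a reconstruction: it is the value that makes the choices of $\varepsilon_{2}$ and $\sigma^{*}/48m^{2}M$ in the proof of Theorem 1 consistent, but the lemma's original statement may be phrased differently. The starting point is that $\bm{y}^{*}=\bm{x}^{\ast}-\bm{x}$ gives $\bm{x}+\bm{y}=\bm{x}^{\ast}+(\bm{y}-\bm{y}^{*})$.

```latex
\begin{align*}
\bigl|\,\|\bm{x}+\bm{y}\| - 1\,\bigr| &\leq \|\bm{y}-\bm{y}^{*}\|,
\qquad
\|\bm{x}+\bm{y}\| \geq 1 - \|\bm{y}-\bm{y}^{*}\| \geq \tfrac{1}{2}, \\
\bm{x}^{\prime}-\bm{x}^{\ast}
 &= \frac{(\bm{y}-\bm{y}^{*}) + \bigl(1-\|\bm{x}+\bm{y}\|\bigr)\bm{x}^{\ast}}{\|\bm{x}+\bm{y}\|}, \\
\|\bm{x}^{\prime}-\bm{x}^{\ast}\|
 &\leq \frac{2\,\|\bm{y}-\bm{y}^{*}\|}{\|\bm{x}+\bm{y}\|}
 \leq 4\,\|\bm{y}-\bm{y}^{*}\|.
\end{align*}
```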

Proof 2 (Proof of Theorem 1)

To prove quadratic convergence it suffices to show that there exists an $\varepsilon>0$ and a constant $C>0$ such that from any initial point $\bm{x}_{(0)}$ that satisfies ${\|\bm{x}_{(0)}-\bm{x}^{\ast}\|}<\varepsilon$ ,

[TABLE]

We start by analyzing $e_{k}$ at the first iteration $k=0$ . Let $\bm{y}_{(0)}$ be the approximate correction of $\bm{y}^{*}=\bm{x}^{\ast}-\bm{x}_{(0)}$ , given by the solution of (15). The new approximation of $\bm{x}^{\ast}$ , given by Eq. (17), is $\bm{x}_{(1)}=(\bm{x}_{(0)}+\bm{y}_{(0)})/{\|\bm{x}_{(0)}+\bm{y}_{(0)}\|}$ . Assume for the moment that the initial guess $\bm{x}_{(0)}$ is sufficiently close to $\bm{x}^{\ast}$ so that ${\|\bm{y}^{*}-\bm{y}_{(0)}\|}<1/2$ . Then, by Lemma 2,

[TABLE]

Hence, it suffices to bound ${\|\bm{y}^{*}-\bm{y}_{(0)}\|}$ . To this end, we view the exact system of non-linear equations (13), whose solution is $\bm{y}^{*}$ , as a perturbation of the approximate system of linear equations (15), whose solution is $\bm{y}_{(0)}$ . Consider the matrix $A$ of Eq. (12) evaluated at the eigenvector $\bm{x}^{\ast}$ ,

[TABLE]

Note that $A(\bm{x}^{\ast})$ is symmetric with eigenvalues $(\mu_{1}^{\ast},\ldots,\mu_{n-1}^{\ast},-2\lambda^{\ast})$ . Since $\bm{x}^{\ast}$ is $\gamma$ -Newton-stable, $|{\mu_{i}^{\ast}}|\geq\gamma$ for all $i\in[n-1]$ . In addition, since $\lambda^{\ast}\neq 0$ by assumption, $A(\bm{x}^{\ast})$ is full rank with smallest singular value

[TABLE]

By the continuity of $\sigma_{\min}(A(\bm{x}))$ in $\bm{x}$ , there exists an $\varepsilon_{1}>0$ such that $\sigma_{\min}(A(\bm{x}))\geq\sigma^{*}/2$ for all $\bm{x}$ with ${\|\bm{x}^{\ast}-\bm{x}\|}\leq\varepsilon_{1}$ . In particular, if ${\|\bm{x}^{\ast}-\bm{x}_{(0)}\|}<\varepsilon_{1}$ , then $A(\bm{x}_{(0)})$ is invertible and the solution to (13) satisfies the following implicit equation in $\bm{y}^{*}$ ,

[TABLE]

Similarly, the unique solution to the correction equation (15) is as in (16),

[TABLE]

Subtracting the last two equations gives

[TABLE]

To bound ${\|\Delta(\bm{x}_{(0)},\bm{y}^{*})\|}$ , first note that for any symmetric tensor $\mathcal{T}$ there exists an $M=M(\mathcal{T})<\infty$ such that for any $\bm{x}\in S_{n-1}$ , $\bm{y}\in\mathbb{R}^{n}$ and $j\leq m-1$ ,

[TABLE]

Similar bounds hold for $\mathcal{T}(\bm{x},\ldots,\bm{x},\bm{y},\dots,\bm{y})\bm{x}$ and $\mathcal{T}(\bm{x},\ldots,\bm{x},\bm{y},\dots,\bm{y})\bm{y}$ according to their powers in $\bm{y}$ . Bounding each term of $\Delta(\bm{x}_{(0)},\bm{y}^{*})$ in (14) separately by (34), there are fewer than $3m^{2}$ terms involving $M{\|\bm{y}^{*}\|}^{2}$ and at most $3\cdot 2^{m}$ terms involving $M{\|\bm{y}^{*}\|}^{j}$ with $j\in\{3,\dots,m\}$ . Assuming ${\|\bm{y}^{*}\|}={\|\bm{x}^{\ast}-\bm{x}_{(0)}\|}<m^{2}/2^{m}\leq 1$ , this implies

[TABLE]

Inserting this bound into (33),

[TABLE]

Note that if ${\|\bm{x}^{\ast}-\bm{x}_{(0)}\|}={\|\bm{y}^{*}\|}\leq\varepsilon_{2}=(\sigma^{*}/24m^{2}M)^{1/2}$ , then $\|\bm{y}^{*}-\bm{y}_{(0)}\|\leq 1/2$ as required by Lemma 2. Under this condition, by Eq. (31), it follows that

[TABLE]

As an interim summary, if ${\|\bm{x}^{\ast}-\bm{x}_{(0)}\|}\leq\min\{\varepsilon_{1},\varepsilon_{2},m^{2}/2^{m}\}=\varepsilon_{0}$ , then Eq. (36) holds. We conclude the proof for a general iteration $k\geq 1$ by induction. For the first induction step to work, it is required that if ${\|\bm{x}^{\ast}-\bm{x}_{(0)}\|}\leq\varepsilon\leq\varepsilon_{0}$ , then ${\|\bm{x}^{\ast}-\bm{x}_{(1)}\|}<\varepsilon$ as well. By (36), this is satisfied for $\varepsilon=\min\{\varepsilon_{0},\sigma^{*}/48m^{2}M\}$ and the proof for a general $k$ holds similarly. The quadratic convergence of Algorithm 1 follows.
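The constants quoted in this proof can be checked by chaining the bounds, as in the hedged sketch below. It assumes that the perturbation identity gives $\|\bm{y}^{*}-\bm{y}_{(0)}\|\leq\|A(\bm{x}_{(0)})^{-1}\|\,\|\Delta(\bm{x}_{(0)},\bm{y}^{*})\|$, that Lemma 2 yields the factor $4$ obtained from its triangle-inequality argument, and that $\|\bm{y}^{*}\|<m^{2}/2^{m}$ and $\|\bm{y}^{*}\|\leq 1$ as assumed in the text. The displayed equations themselves are not reproduced in this section.

```latex
\begin{align*}
\|\Delta(\bm{x}_{(0)},\bm{y}^{*})\|
 &\leq 3m^{2}M\|\bm{y}^{*}\|^{2} + 3\cdot 2^{m}M\|\bm{y}^{*}\|^{3}
 \leq 6m^{2}M\|\bm{y}^{*}\|^{2}, \\
\|\bm{y}^{*}-\bm{y}_{(0)}\|
 &\leq \frac{2}{\sigma^{*}}\,\|\Delta(\bm{x}_{(0)},\bm{y}^{*})\|
 \leq \frac{12m^{2}M}{\sigma^{*}}\,\|\bm{y}^{*}\|^{2}
 \leq \frac{1}{2}
 \quad\text{whenever } \|\bm{y}^{*}\|\leq\varepsilon_{2}, \\
\|\bm{x}_{(1)}-\bm{x}^{\ast}\|
 &\leq 4\,\|\bm{y}^{*}-\bm{y}_{(0)}\|
 \leq \frac{48m^{2}M}{\sigma^{*}}\,\|\bm{x}_{(0)}-\bm{x}^{\ast}\|^{2}.
\end{align*}
```

In particular, $\|\bm{x}_{(0)}-\bm{x}^{\ast}\|\leq\varepsilon\leq\sigma^{*}/48m^{2}M$ keeps the next iterate within distance $\varepsilon$ of $\bm{x}^{\ast}$, which is what the induction step needs.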

Appendix B Proof of Lemma 1

We show that a vector $\bm{u}$ satisfies (24) if and only if it satisfies

[TABLE]

Lemma 1 then follows by recalling that $P_{\bm{x}}^{\perp}=U_{\bm{x}}U_{\bm{x}}^{T}$ and multiplying the first equation in (37) by $U_{\bm{x}}^{T}$ from the left.

To prove the first direction, note that by the last row of (24), the solution $\bm{u}$ to (24) is perpendicular to $\bm{x}$ , so $\bm{x}^{T}\bm{u}=0$ and $P_{\bm{x}}^{\perp}\bm{u}=\bm{u}$ . Multiplying the first “row” of (24) by $P_{\bm{x}}^{\perp}$ from the left and noting that $P_{\bm{x}}^{\perp}\bm{x}=\bm{0}$ , we find that the left hand side is given by

[TABLE]

In addition, one can easily check that $\bm{g}(\bm{x})$ is perpendicular to $\bm{x}$ , so the right hand side of the equality in (37) is $-P_{\bm{x}}^{\perp}\bm{g}(\bm{x})=-\bm{g}(\bm{x})$ and (37) follows.

To prove the other direction, suppose $\bm{u}$ satisfies (37). So $\bm{u}^{T}\bm{x}=0$ and $P_{\bm{x}}^{\perp}\bm{u}=\bm{u}$ . Define $\beta=\bm{x}^{T}H(\bm{x})\bm{u}$ and write the left hand side of (37) as

[TABLE]

Since $-P_{\bm{x}}^{\perp}\bm{g}(\bm{x})=-\bm{g}(\bm{x})$ , it follows that $(\bm{u},\beta)$ satisfies (24) as required.
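The equivalence can also be checked numerically. The sketch below assumes that the bordered system (24) has the generic saddle-point form shown in the comments and that the projected system uses $U_{\bm{x}}^{T}(H(\bm{x})-\mu(\bm{x})I)U_{\bm{x}}$. The sign and scaling of $\beta$ in the paper may differ, which does not affect $\bm{u}$. All function names are hypothetical, and `Hs` stands for the shifted matrix $H(\bm{x})-\mu(\bm{x})I$.

```python
import numpy as np

def complement_basis(x):
    """Orthonormal basis U_x (n x (n-1)) of the subspace orthogonal to x."""
    _, _, Vt = np.linalg.svd(x.reshape(1, -1))
    return Vt[1:].T

def step_bordered(Hs, g, x):
    """Solve [[Hs, x], [x^T, 0]] [u; beta] = [-g; 0] and return u."""
    n = x.size
    K = np.zeros((n + 1, n + 1))
    K[:n, :n] = Hs
    K[:n, n] = x
    K[n, :n] = x
    rhs = np.concatenate([-g, [0.0]])
    return np.linalg.solve(K, rhs)[:n]

def step_projected(Hs, g, x):
    """Solve U^T Hs U z = -U^T g in the (n-1)-dimensional subspace; return u = U z."""
    U = complement_basis(x)
    z = np.linalg.solve(U.T @ Hs @ U, -U.T @ g)
    return U @ z
```

Both routines return the same correction $\bm{u}$, but the projected form solves a system that is smaller by two unknowns and never forms the bordered matrix.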

Appendix C Convergence of O–NCM

The proof of Theorem 2 is similar to that of Theorem 1, and makes use of the following auxiliary lemma.

Lemma 3

Consider one update step of Algorithm 2, as in Equation (26), starting from an initial $\bm{x}\in S_{n-1}$ and ending with $\bm{x}^{\prime}=(\bm{x}+\bm{u})/{\|\bm{x}+\bm{u}\|}\in S_{n-1}$ . Let $\alpha=\bm{x}^{T}\bm{x}^{\ast}$ and $\bm{u}^{*}=\alpha\bm{x}^{\ast}-\bm{x}$ . If $\alpha\geq 1/2$ and ${\|\bm{u}^{*}-\bm{u}\|}\leq 1/4$ , then

[TABLE]

Proof 3

By definition,

[TABLE]

Since ${\|\bm{x}^{\ast}\|}=1$ and $\alpha>0$ , by the triangle inequality,

[TABLE]

Applying the triangle inequality to (40), combined with the assumptions $\alpha\geq 1/2$ and ${\|\bm{u}^{*}-\bm{u}\|}\leq 1/4$ , which together give ${\|\bm{u}^{*}-\bm{u}\|}\leq\alpha/2$ ,

[TABLE]

hence concluding the proof.
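As with Lemma 2, the displayed steps are not reproduced above, so the following is a hedged reconstruction rather than the paper's exact wording. It uses $\bm{u}^{*}=\alpha\bm{x}^{\ast}-\bm{x}$, hence $\bm{x}+\bm{u}=\alpha\bm{x}^{\ast}+(\bm{u}-\bm{u}^{*})$, together with $\|\bm{u}-\bm{u}^{*}\|\leq 1/4\leq\alpha/2$.

```latex
\begin{align*}
\bigl|\,\|\bm{x}+\bm{u}\| - \alpha\,\bigr| &\leq \|\bm{u}-\bm{u}^{*}\|,
\qquad
\|\bm{x}+\bm{u}\| \geq \alpha - \|\bm{u}-\bm{u}^{*}\| \geq \tfrac{\alpha}{2}, \\
\bm{x}^{\prime}-\bm{x}^{\ast}
 &= \frac{(\bm{u}-\bm{u}^{*}) + \bigl(\alpha-\|\bm{x}+\bm{u}\|\bigr)\bm{x}^{\ast}}{\|\bm{x}+\bm{u}\|}, \\
\|\bm{x}^{\prime}-\bm{x}^{\ast}\|
 &\leq \frac{2\,\|\bm{u}-\bm{u}^{*}\|}{\|\bm{x}+\bm{u}\|}
 \leq \frac{4}{\alpha}\,\|\bm{u}-\bm{u}^{*}\|
 \leq 8\,\|\bm{u}-\bm{u}^{*}\|.
\end{align*}
```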

Proof 4 (Proof of Theorem 2)

We show that there exists an $\varepsilon>0$ and a constant $C>0$ , such that for any initial point $\bm{x}_{(0)}$ that satisfies $\|\bm{x}_{(0)}-\bm{x}^{\ast}\|<\varepsilon$ ,

[TABLE]

We start by analyzing $e_{k}$ at the first iteration $k=0$ . Let $\bm{u}_{(0)}=U_{\bm{x}_{(0)}}\bm{z}_{(0)}$ be the approximate correction of $\bm{u}^{*}=\alpha\bm{x}^{\ast}-\bm{x}_{(0)}$ , given by the solution of (25). The new approximation of $\bm{x}^{\ast}$ is $\bm{x}_{(1)}=(\bm{x}_{(0)}+\bm{u}_{(0)})/{\|\bm{x}_{(0)}+\bm{u}_{(0)}\|}$ . Since $\bm{u}^{*}$ is orthogonal to $\bm{x}^{\ast}$ , the denominator of $e_{0}$ satisfies

[TABLE]

To bound the numerator of $e_{0}$ , assume for the moment that $\bm{x}_{(0)}$ is sufficiently close to $\bm{x}^{\ast}$ so that $\alpha=\bm{x}_{(0)}^{T}\bm{x}^{\ast}\geq 1/2$ and ${\|\bm{u}^{*}-\bm{u}_{(0)}\|}\leq\alpha/2$ . Then, by Lemma 3,

[TABLE]

Hence, it suffices to bound ${\|\bm{u}^{*}-\bm{u}_{(0)}\|}$ . Define $\bm{z}^{*}=U_{\bm{x}_{(0)}}^{T}\bm{u}^{*}$ and note that since $\bm{x}^{\ast}$ and $\bm{u}^{*}$ are orthogonal, $\bm{x}_{(0)}^{T}\bm{u}^{*}=(\alpha\bm{x}^{\ast}-\bm{u}^{*})^{T}\bm{u}^{*}=-\|\bm{u}^{*}\|^{2}$ . Writing $I=U_{\bm{x}_{(0)}}U_{\bm{x}_{(0)}}^{T}+\bm{x}_{(0)}\bm{x}_{(0)}^{T}$ , we thus have

[TABLE]

Since ${\|\bm{x}_{(0)}\|}=1$ ,

[TABLE]

To bound $\|\bm{z}^{*}-\bm{z}_{(0)}\|$ , we view the exact system of non-linear equations (23), whose solution is $\bm{u}^{*}$ , as a perturbation of the approximate system of linear equations (25), whose solution is $\bm{u}_{(0)}=U_{\bm{x}_{(0)}}\bm{z}_{(0)}$ . By (23), $\bm{u}^{*}$ solves the non-linear equation

[TABLE]

We multiply (46) by $U_{\bm{x}_{(0)}}^{T}$ from the left and plug in (44) to obtain the set of non-linear equations in $\bm{z}^{*}$ (and $\beta^{*},\bm{u}^{*}$ ),

[TABLE]

Since $\bm{x}^{\ast}$ is $\gamma$ -Newton-stable, the projected Hessian $H_{p}(\bm{x}^{\ast})$ is full rank with smallest singular value

[TABLE]

By the continuity of $\sigma_{\min}(H_{p}(\bm{x}))$ in $\bm{x}$ , there exists an $\varepsilon_{1}>0$ such that $\sigma_{\min}(H_{p}(\bm{x}))\geq\gamma/2$ for all $\bm{x}$ with ${\|\bm{x}^{\ast}-\bm{x}\|}\leq\varepsilon_{1}$ . In particular, if ${\|\bm{x}^{\ast}-\bm{x}_{(0)}\|}<\varepsilon_{1}$ , then $H_{p}(\bm{x}_{(0)})$ is invertible and the solution to (47) satisfies the following implicit equation in $\bm{z}^{*}$ ,

[TABLE]

Similarly, the unique solution to (25) is

[TABLE]

Subtracting the last two equations gives

[TABLE]

We bound the norm of $\tilde{\Delta}(\bm{x}_{(0)},\bm{u}^{*},\beta^{*})$ in (22) by

[TABLE]

To bound the terms in the sum, note that there exists an $M=M(\mathcal{T})<\infty$ such that for any $\bm{x}\in S_{n-1}$ , $\bm{u}\in\mathbb{R}^{n}$ and $j\leq m-1$ ,

[TABLE]

Bounding each term in the sum in (49) by (50), there are at most $m^{2}$ terms involving $M{\|\bm{u}^{*}\|}^{2}$ , and at most $2^{m}$ terms involving $M{\|\bm{u}^{*}\|}^{i}$ with $i\in\{3,\dots,m-1\}$ . Assuming ${\|\bm{x}^{\ast}-\bm{x}_{(0)}\|}\leq m^{2}/2^{m}\leq 1$ and recalling that by (42), ${\|\bm{u}^{*}\|}\leq{\|\bm{x}^{\ast}-\bm{x}_{(0)}\|}$ ,

[TABLE]

For the first term in (49), recalling the definition of $\beta^{*}$ in (19),

[TABLE]

For the first term in (51), note that $1-\alpha^{2}={\|\bm{u}^{*}\|}^{2}$ , $|\alpha|\leq 1$ and $|\lambda^{\ast}|\leq M$ , hence

[TABLE]

Since $\bm{x}^{\ast}$ is an eigenvector and $(\bm{u}^{*})^{T}\bm{x}^{\ast}=0$ , all terms in the sum in (51) with $j=1$ vanish,

[TABLE]

Bounding each term in the sum in (51) with $j\geq 2$ by (50), there are at most $m^{2}$ terms involving $M{\|\bm{u}^{*}\|}^{2}$ , and at most $2^{m}$ terms involving $M{\|\bm{u}^{*}\|}^{j}$ with $j\in\{3,\dots,m\}$ . Since ${\|\bm{u}^{*}\|}\leq m^{2}/2^{m}$ , the first term in (49) is thus bounded by

[TABLE]

It follows that

[TABLE]

Inserting this bound into (48),

[TABLE]

Inserting this into (45),

[TABLE]

Note that if ${\|\bm{x}^{\ast}-\bm{x}_{(0)}\|}\leq\varepsilon_{2}=\min\{1,(4(\frac{8Mm^{2}}{\gamma}+1))^{-1/2}\}$ , then $\alpha\geq 1/2$ . By (42), $\|\bm{u}^{*}\|\leq\varepsilon_{2}$ as well. Thus, (52) implies $\|\bm{u}^{*}-\bm{u}_{(0)}\|\leq 1/4$ as required by Lemma 3. Under this condition, (43) implies

[TABLE]

Combining the last two bounds we obtain

[TABLE]

As an interim summary, if ${\|\bm{x}^{\ast}-\bm{x}_{(0)}\|}\leq\min\{\varepsilon_{1},\varepsilon_{2},m^{2}/2^{m}\}=\varepsilon_{0}$ , then (53) holds. The rest of the proof follows by induction as in the proof of Theorem 1.

Appendix D Proof of Proposition 1

First, we prove an auxiliary lemma concerning the structure of the eigenvectors of $\mathcal{T}_{\omega}$ . Recall $l=\lfloor n/2\rfloor$ . For any subset $\mathbb{A}\subseteq\{1,\ldots,l\}$ define the following two $n$ -dimensional vectors,

[TABLE]

Lemma 4

There is a function $\alpha(\omega,|\mathbb{A}|):\mathbb{R}\times\mathbb{N}\to\mathbb{R}$ such that all eigenvectors of $\mathcal{T}_{\omega}$ are of the form

[TABLE]

Proof 5

Let $\bm{x}^{\ast}=\sum_{i=1}^{n}\alpha_{i}\bm{e}_{i}$ be an eigenvector of $\mathcal{T}_{\omega}$ with eigenvalue $\lambda^{\ast}$ . To prove the lemma it suffices to show that the coefficients $\alpha_{1},\ldots,\alpha_{n}$ can attain at most two distinct values. Applying mode product to $\mathcal{T}_{\omega}$ with $\bm{x}^{\ast}$ ,

[TABLE]

where $\bar{\alpha}=\sum_{i=1}^{n}\alpha_{i}$ . Since $(\bm{x}^{\ast},\lambda^{\ast})$ is an eigenpair it satisfies,

[TABLE]

Multiplying both sides of (55) with $\bm{e}_{i}^{T}$ gives,

[TABLE]

Subtracting Equations (56) with $j\neq i$ ,

[TABLE]

We thus conclude that for any $j\neq i$ either $\alpha_{j}=\alpha_{i}$ or $\alpha_{j}=\lambda^{\ast}-\alpha_{i}$ . It follows that the set $\{\alpha_{1},\ldots,\alpha_{n}\}$ contains up to $2$ distinct values satisfying (56).

The first part of Proposition 1 determines the number of real eigenvectors for $\mathcal{T}_{\omega}$ . Following Lemma 4, let $\bm{x}^{\ast}=\bm{1}_{\mathbb{A}^{c}}+\alpha\bm{1}_{\mathbb{A}}$ be proportional to some eigenvector of $\mathcal{T}_{\omega}$ . By Eq. (54),

[TABLE]

Since $\bm{x}^{\ast}=\alpha\bm{1}_{\mathbb{A}}+\bm{1}_{\mathbb{A}^{c}}$ is proportional to an eigenvector of $\mathcal{T}_{\omega}$ ,

[TABLE]

or equivalently,

[TABLE]

One solution to Eq. (57) is $\alpha=1$ , which corresponds to the eigenvector $\bm{x}=\frac{1}{\sqrt{n}}\bm{1}$ . For $\alpha\neq 1$ we replace $\bar{\alpha}$ with,

[TABLE]

The result is the following quadratic equation,

[TABLE]

The solutions to Eq. (58) determine, up to a normalizing factor, the eigenvectors (both real and complex) of $\mathcal{T}_{\omega}$ . Due to the problem’s symmetry we may characterize all real eigenvectors by computing the solutions to (58) only for subsets $\mathbb{A}$ with $0\leq|\mathbb{A}|\leq l$ . Consider the discriminant $\mathcal{D}(\omega,|\mathbb{A}|)$ of the quadratic equation (58),

[TABLE]

For a given $\mathbb{A}$ , the number of real solutions to (58) is

[TABLE]

Hence, the number of real eigenpairs decreases at specific thresholds. The smallest threshold corresponds to $|\mathbb{A}|=l$ and is given by $\omega_{0}=\frac{1}{4l(n-l)}$ . When $\omega<\omega_{0}$ , there are $2$ real solutions to Eq. (58) for all subsets $1\leq|\mathbb{A}|\leq l$ . So the total number of solutions is equal to $2$ times the number of distinct subsets,

[TABLE]

where we add one to account for $\frac{1}{\sqrt{n}}\bm{1}$ , corresponding to $\mathbb{A}=\emptyset$ . Note that this is also the bound on the number of eigenvectors of a generic cubic tensor, see [5]. When $\omega=\omega_{0}$ , $\mathcal{D}(\omega_{0},|\mathbb{A}|=l)=0$ . In this case $N(\omega)$ is composed of two eigenvectors for all subsets $1\leq|\mathbb{A}|\leq l-1$ and one eigenvector for each subset of size $|\mathbb{A}|=l$ ,

[TABLE]

For $\omega_{0}<\omega<\omega_{1}$ , there are no real solutions of Eq. (58) for subsets of size $|\mathbb{A}|=l$ . The number of real solutions is therefore,

[TABLE]

Repeating the argument for increasing values of $\omega$ we obtain $N(\omega)$ as given in the proposition’s statement.

We now prove the second part of the proposition, stating that at the thresholds $\omega_{i}=\frac{1}{4(l-i)(n-l+i)}$ , $\binom{n}{l-i}$ of the eigenvectors are not Newton-stable. In this case $\mathcal{D}(\omega_{i},l-i)=0$ and only one (real) solution to (58) exists for each $\mathbb{A}$ with $|\mathbb{A}|=l-i$ . Solving (58) for $\omega=\omega_{i}$ , we find that the $\binom{n}{l-i}$ eigenpairs $(\bm{x}^{\ast}(\mathbb{A}),\lambda^{\ast}(\mathbb{A}))$ with $|\mathbb{A}|=l-i$ are

[TABLE]

We show that each such eigenpair is not Newton-stable. To do so, we prove that the projected Hessian $H_{p}(\bm{x}^{\ast})$ is rank deficient. First we compute the Hessian $H(\bm{x}^{\ast})$ . Abbreviate $b=\sqrt{\frac{1}{n|\mathbb{A}|(n-|\mathbb{A}|)}}$ and note that $\lambda^{\ast}=nb$ . Then,

[TABLE]

Consider the vector $\bm{v}=\bm{1}_{\mathbb{A}}-\bm{1}_{\mathbb{A}^{c}}$ . A simple calculation yields $\bm{v}^{T}H(\bm{x}^{\ast})\bm{v}=0$ . Since $\bm{v}$ is orthogonal to $\bm{x}^{\ast}$ , $H_{p}(\bm{x}^{\ast})$ is rank deficient and $(\bm{x}^{\ast},\lambda^{\ast})$ is not Newton-stable.
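The thresholds $\omega_{i}=\frac{1}{4(l-i)(n-l+i)}$ and the count $\binom{n}{l-i}$ of eigenvectors that lose Newton-stability at each of them can be tabulated with a few lines of Python. The function names are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from math import comb

def thresholds(n):
    """omega_i = 1 / (4 (l - i)(n - l + i)) for i = 0, ..., l - 1, with l = floor(n/2)."""
    l = n // 2
    return [Fraction(1, 4 * (l - i) * (n - l + i)) for i in range(l)]

def non_stable_count(n, i):
    """Number of eigenvectors that are not Newton-stable at omega = omega_i."""
    l = n // 2
    return comb(n, l - i)
```

For $n=3$ this gives the single threshold $\omega_{0}=1/8$ with three eigenvectors that are not Newton-stable, matching the values quoted for the $n=3$ example.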

Appendix E Convergence to eigenvectors which are not Newton-stable

In this section we present a detailed empirical study of the convergence properties of O–NCM. As discussed in Section 4, the main property that governs the convergence of O–NCM to an eigenpair $(\bm{x}^{\ast},\lambda^{*})$ is the spectral structure of the projected Hessian at the eigenvector, $H_{p}(\bm{x}^{\ast})$ . As shown in Theorem 2, when $H_{p}(\bm{x}^{\ast})$ is full rank, O–NCM converges at a quadratic rate to $\bm{x}^{\ast}$ given a sufficiently close initial point. When $\bm{x}^{\ast}$ is isolated but $1\leq\text{rank}(H_{p}(\bm{x}^{\ast}))<n-1$ , the convergence rate may be less than quadratic. When $H_{p}(\bm{x}^{\ast})=0$ and/or $\bm{x}^{\ast}$ is non-isolated, full convergence to $\bm{x}^{\ast}$ is not always observed. These properties are summarized in Table 1.

We illustrate these convergence properties via two examples.

(a)

Consider the tensor $\mathcal{T}$ with order $m=3$ and dimensionality $n=6$ of Example $5.8$ in [5], corresponding to the homogeneous polynomial

[TABLE]

This tensor has a total of $17$ real eigenpairs. Six of them correspond to a $\lambda=0$ eigenvalue, two of which are not Newton-stable with $\text{rank}(H_{p}(\bm{x}^{\ast}))=2$ . The rest are Newton-stable. Figure 9 shows the value of $\|\bm{x}_{(k)}-\bm{x}^{\ast}\|$ as a function of the iteration $k$ for one eigenvector that is Newton-stable and one that is not. While the convergence to the stable eigenvector is quadratic, the convergence to the eigenvector which is not Newton-stable is much slower.

(b)

Consider the tensor $\mathcal{T}\in\mathbb{R}^{6\times 6\times 6\times 6}$ of Example $6.4$ in [20], corresponding to the homogeneous polynomial

[TABLE]

There are a total of $42$ isolated eigenvectors, including one that corresponds to an eigenvalue $\lambda=0$ . The projected Hessian $H_{p}(\bm{x}^{\ast})$ for this vector is equal to a zero matrix. As can be seen in Figure 9, in this case O–NCM does not fully converge.

In addition, there are also infinitely many non-isolated eigenvectors corresponding to an eigenvalue $\lambda=4.5$ . The projected Hessian of these eigenpairs is a rank deficient (though non zero) matrix. For example, any vector of the form

[TABLE]

is proportional to a non-isolated eigenvector. Note that the vectors corresponding to (61) form a two-dimensional subspace. Since these vectors are non-isolated, in this case we measure $\|(I_{n}-P_{\bm{x}^{\ast}})\bm{x}_{(k)}\|$ instead of $\|\bm{x}^{\ast}-\bm{x}_{(k)}\|$ , where $P_{\bm{x}^{\ast}}\in\mathbb{R}^{n\times n}$ is the projection matrix onto that subspace. As can be seen in Fig. 9, in this case O–NCM does not converge.
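The distance $\|(I_{n}-P_{\bm{x}^{\ast}})\bm{x}_{(k)}\|$ used here can be computed without forming the projection matrix explicitly. In the sketch below, `basis` is a hypothetical $n\times d$ array whose columns span the subspace and are assumed to be linearly independent. The actual spanning vectors of (61) are not reproduced in this section.

```python
import numpy as np

def distance_to_subspace(x, basis):
    """Return ||(I - P) x||, where P projects onto the column span of `basis`."""
    Q, _ = np.linalg.qr(basis)          # orthonormalize the spanning vectors
    return np.linalg.norm(x - Q @ (Q.T @ x))
```

A value that decays to zero along the iterations indicates convergence to the subspace as a whole, even when the iterates do not settle on one particular eigenvector in it.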

Trivial eigenvectors

In some cases, the tensor fibers are spanned by a low-dimensional subspace. Any vector orthogonal to this subspace is an eigenvector corresponding to an eigenvalue $\lambda=0$ , and a projected Hessian $H_{p}(\bm{x}^{\ast})$ equal to a zero matrix. This is the case, for instance, in example (b), where all fibers are orthogonal to $\bm{x}^{\ast}=[1,\ldots,1]^{T}$ . As we have seen, this may cause O–NCM to slow down, since the iterations do not converge to these points.

A simple pre-processing step is to find these eigenvectors by calculating the subspace of the tensor fibers, namely $\mathcal{T}_{:,i_{2},\ldots,i_{m}},i_{2},\ldots,i_{m}\in[n]$ . As a second step, O–NCM can easily be constrained to that subspace.
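One possible realization of this pre-processing step is sketched below, using an SVD of the mode-1 unfolding of the tensor. The function name and the relative tolerance are assumptions, and the paper does not specify how the fiber subspace is computed.

```python
import numpy as np

def trivial_eigenvectors(T, tol=1e-10):
    """Orthonormal basis of the vectors orthogonal to every fiber T[:, i2, ..., im]."""
    n = T.shape[0]
    fibers = T.reshape(n, -1)                      # columns are the mode-1 fibers
    U, s, _ = np.linalg.svd(fibers, full_matrices=True)
    rank = int(np.sum(s > tol * max(s.max(), 1.0)))
    return U[:, rank:]                             # left null space of the unfolding
```

The first `rank` columns of `U` span the fiber subspace itself, so they could serve as the basis to which the iterations are then restricted.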

Bibliography

[1] Animashree Anandkumar, Rong Ge, Daniel Hsu, Sham M. Kakade, and Matus Telgarsky. Tensor decompositions for learning latent variable models. J. Mach. Learn. Res., 15:2773–2832, 2014.
[2] Animashree Anandkumar, Daniel Hsu, and Sham M. Kakade. A method of moments for mixture models and hidden Markov models. In Conference on Learning Theory, pages 33–1, 2012.
[3] Åke Björck. Numerical methods for least squares problems. SIAM, 1996.
[4] Paul Breiding. The average number of critical rank-one-approximations to a symmetric tensor. arXiv preprint arXiv:1701.07312, 2017.
[5] Dustin Cartwright and Bernd Sturmfels. The number of eigenvalues of a tensor. Linear Algebra Appl., 438(2):942–952, 2013.
[6] K. C. Chang, Kelly Pearson, and Tan Zhang. On eigenvalue problems of real symmetric tensors. J. Math. Anal. Appl., 350(1):416–422, 2009.
[7] Liping Chen, Lixing Han, and Liangmin Zhou. Computing tensor eigenvalues via homotopy methods. SIAM J. Matrix Anal. Appl., 37(1):290–319, 2016.
[8] Chun-Feng Cui, Yu-Hong Dai, and Jiawang Nie. All real eigenvalues of symmetric tensors. SIAM J. Matrix Anal. Appl., 35(4):1582–1601, 2014.
