The average condition number of most tensor rank decomposition problems is infinite

Carlos Beltrán; Paul Breiding; Nick Vannieuwenhoven

arXiv:1903.05527·math.NA·July 2, 2024·Found. Comput. Math.

TL;DR

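This paper proves that the expected condition number of tensor rank decomposition is infinite for random rank-2 tensors and, under a mild additional assumption, for most higher ranks. The result underlines the computational difficulty of the problem and has consequences for designing and testing decomposition algorithms.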

Contribution

It establishes that the average condition number is infinite for random rank-2 tensors and, under a mild additional assumption, for most ranks $r \geq 3$, whereas the average angular condition number of rank-2 tensors is finite.

Findings

01. The expected condition number is infinite for random rank-2 tensors for a wide range of choices of the density.

02. The expected angular condition number is finite for rank-2 tensors.

03. Numerical experiments suggest that the expected angular condition number may also be finite for higher ranks.

Tables

Table 1 (Table 7.1 in the paper). Results of sampling GITs in $\sigma_{r;n_{1},n_{2},n_{3}}\subset\mathbb{R}^{n_{1}\times n_{2}\times n_{3}}$ via an acceptance–rejection method. Columns three to five list the number of samples where the final tracked solution of the homotopy was real, complex, or failed, respectively. The next column shows the fraction of successful samples that were real; in the case of $n\times n\times 2$ the analytical solution from [13] is also stated and the correct digits from the empirical estimate are underlined. The final column indicates the total wall-clock time required to perform the Monte Carlo experiments.

$n_{1} \times n_{2} \times n_{3}$	$r$	samples in $\mathbb{R}$	samples in $\mathbb{C}$	failed samples	fraction in $\mathbb{R}$	time (min)
$2 \times 2 \times 2$	$2$	$100{,}000$	$27{,}335$	$41$	$\underline{0.785}3\ldots \approx \frac{\pi}{4}$	$1.3$
$3 \times 3 \times 2$	$3$	$100{,}000$	$101{,}345$	$185$	$\underline{0.49}66\ldots \approx \frac{1}{2}$	$2.8$
$4 \times 4 \times 2$	$4$	$100{,}000$	$288{,}770$	$325$	$\underline{0.25}72\ldots \approx \frac{27\pi^{2}}{1024}$	$14.9$
$5 \times 4 \times 3$	$6$	$100{,}000$	$1{,}237{,}912$	$643$	$0.0747\ldots$	$420.6$
$5 \times 5 \times 2$	$5$	$100{,}000$	$810{,}254$	$509$	$\underline{0.10}98\ldots \approx \frac{1}{9}$	$99.3$
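As a quick arithmetic check of the "fraction in $\mathbb{R}$" column, the fraction of successful samples that were real can be recomputed from the real and complex counts (failed samples are excluded). The sketch below hard-codes the counts from the table; the names `counts` and `real_fraction` are illustrative and not taken from the paper.

```python
# Counts copied from the table: (tensor shape, real samples, complex samples).
counts = [
    ("2x2x2", 100_000, 27_335),
    ("3x3x2", 100_000, 101_345),
    ("4x4x2", 100_000, 288_770),
    ("5x4x3", 100_000, 1_237_912),
    ("5x5x2", 100_000, 810_254),
]


def real_fraction(n_real: int, n_complex: int) -> float:
    """Fraction of successful (non-failed) samples whose solution was real."""
    return n_real / (n_real + n_complex)


fractions = {shape: real_fraction(n_real, n_complex)
             for shape, n_real, n_complex in counts}

for shape, frac in fractions.items():
    # Print six digits; the table lists the first four digits of each value.
    print(f"{shape}: {frac:.6f}")
```

Truncated to four decimal places, the recomputed values agree with the tabulated column.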

Table 2 (Table 7.2 in the paper). Estimated parameters of the exponential model (7.1) fitted to the complementary cumulative distribution functions from Figure 7.1. The coefficient of determination $R^{2}$ between the log-transformed data and log-transformed model predictions is also indicated.

$n_{1} \times n_{2} \times n_{3}$	regular $a$	regular $b$	regular $R^{2}$	angular $a$	angular $b$	angular $R^{2}$
$2 \times 2 \times 2$	$0.6624$	$0.6904$	$0.9999$	$1.7288$	$1.8624$	$0.9995$
$3 \times 3 \times 2$	$2.2348$	$0.6636$	$0.9999$	$5.5496$	$1.8856$	$0.9998$
$4 \times 4 \times 2$	$4.7318$	$0.6388$	$0.9997$	$11.4165$	$1.8455$	$0.9994$
$5 \times 4 \times 3$	$22.3141$	$0.6461$	$0.9998$	$102.4887$	$1.6337$	$0.9992$
$5 \times 5 \times 2$	$9.8634$	$0.6436$	$0.9997$	$23.6951$	$1.8662$	$0.9996$

Equations

$$\mathpzc{A}:=(a_{i_{1},\ldots,i_{d}})_{1\leq i_{1}\leq n_{1},\ldots,1\leq i_{d}\leq n_{d}}\in\mathbb{R}^{n_{1}\times\cdots\times n_{d}}.$$

$$(\mathbf{u}^{1}\otimes\cdots\otimes\mathbf{u}^{d})_{i_{1},\ldots,i_{d}}:=u_{i_{1}}^{(1)}\cdots u_{i_{d}}^{(d)},\quad\text{where }\mathbf{u}^{j}=[u_{i}^{(j)}]_{1\leq i\leq n_{j}}.$$

$$\mathpzc{A}=\sum_{i=1}^{r}\mathpzc{A}_{i},\quad\text{where }\mathpzc{A}_{i}=\mathbf{u}_{i}^{1}\otimes\cdots\otimes\mathbf{u}_{i}^{d}\text{ has rank one for each }1\leq i\leq r.\tag{1.1}$$

$$\min_{\pi\in\mathfrak{S}_{r}}\sqrt{\sum_{i=1}^{r}\|\mathpzc{A}_{i}-\mathpzc{A}_{\pi_{i}}^{\prime}\|^{2}}\leq\kappa(\mathpzc{A})\,\|\mathpzc{A}-\mathpzc{A}^{\prime}\|+o(\|\mathpzc{A}-\mathpzc{A}^{\prime}\|)\tag{1.2}$$

$$\min_{\pi\in\mathfrak{S}_{r}}\sqrt{\sum_{i=1}^{r}\left\|\frac{\mathpzc{A}_{i}}{\|\mathpzc{A}_{i}\|}-\frac{\mathpzc{A}_{\pi_{i}}^{\prime}}{\|\mathpzc{A}_{\pi_{i}}^{\prime}\|}\right\|^{2}}\leq\kappa_{\mathrm{ang}}(\mathpzc{A})\,\|\mathpzc{A}-\mathpzc{A}^{\prime}\|+o(\|\mathpzc{A}-\mathpzc{A}^{\prime}\|)$$

$$\mathpzc{U}_{i}=\frac{\mathpzc{A}_{i}}{\|\mathpzc{A}_{i}\|},\quad\text{for }i=1,\ldots,r,$$

$$\mathpzc{A}=\sum_{i=1}^{r}\lambda_{i}\,\mathpzc{U}_{i}$$

$$\kappa[g\circ f](x):=\|(\mathrm{d}_{f(x)}g)(\mathrm{d}_{x}f)\|\leq\|\mathrm{d}_{f(x)}g\|\,\|\mathrm{d}_{x}f\|=\kappa[g](f(x))\,\kappa[f](x),$$

$$\sigma_{r;n_{1},\ldots,n_{d}}^{\mathbb{C}}:=\{\mathpzc{A}\in\mathbb{C}^{n_{1}\times\cdots\times n_{d}}\mid\operatorname{rank}_{\mathbb{C}}(\mathpzc{A})\leq r\}.$$

$$r\leq r_{n_{1},\ldots,n_{d}}^{\mathrm{crit}},\quad\text{where }r_{n_{1},\ldots,n_{d}}^{\mathrm{crit}}:=\frac{n_{1}\cdots n_{d}}{1+\sum_{k=1}^{d}(n_{k}-1)}.$$

$$\mathcal{S}_{n_{1},\ldots,n_{d}}=\{\mathbf{a}^{1}\otimes\cdots\otimes\mathbf{a}^{d}\mid\mathbf{a}^{k}\in\mathbb{R}^{n_{k}}\setminus\{0\}\}.$$

$$\Phi:\mathcal{S}_{n_{1},\ldots,n_{d}}\times\cdots\times\mathcal{S}_{n_{1},\ldots,n_{d}}\to\mathbb{R}^{n_{1}\times\cdots\times n_{d}},\quad(\mathpzc{A}_{1},\ldots,\mathpzc{A}_{r})\mapsto\mathpzc{A}_{1}+\cdots+\mathpzc{A}_{r}.$$

$$\kappa(\mathpzc{A}):=\lim_{\epsilon\to 0}\ \sup_{\substack{\|\Delta\mathpzc{A}\|<\epsilon\ \text{s.t.}\\ \mathpzc{A}+\Delta\mathpzc{A}\in\sigma_{r;n_{1},\ldots,n_{d}}}}\frac{\|\Phi_{\mathpzc{a}}^{-1}(\mathpzc{A})-\Phi_{\mathpzc{a}}^{-1}(\mathpzc{A}+\Delta\mathpzc{A})\|}{\|\Delta\mathpzc{A}\|}=\|\mathrm{d}_{\mathpzc{A}}\Phi_{\mathpzc{a}}^{-1}\|_{2},$$

$$\rho(\mathpzc{A}):=(C_{r;n_{1},\ldots,n_{d}})^{-1}\,e^{-\frac{\|\mathpzc{A}\|^{2}}{2}},\quad\text{where }C_{r;n_{1},\ldots,n_{d}}=\int_{\sigma_{r;n_{1},\ldots,n_{d}}}e^{-\frac{\|\mathpzc{A}\|^{2}}{2}}\,\mathrm{d}\mathpzc{A}$$

s_{1} =

s_{2} =

$$2\leq r<(1-\epsilon_{n_{1},\ldots,n_{d}})\,r_{n_{1},\ldots,n_{d}}^{\mathrm{crit}},$$

$$\kappa_{\mathrm{ang}}(\mathpzc{A}):=\lim_{\epsilon\to 0}\ \sup_{\substack{\|\Delta\mathpzc{A}\|<\epsilon,\\ \mathpzc{A}+\Delta\mathpzc{A}\in\sigma_{r;n_{1},\ldots,n_{d}}}}\frac{\|(p^{\times r}\circ\Phi_{\mathpzc{a}}^{-1})(\mathpzc{A})-(p^{\times r}\circ\Phi_{\mathpzc{a}}^{-1})(\mathpzc{A}+\Delta\mathpzc{A})\|}{\|\Delta\mathpzc{A}\|},$$

$$\Sigma:=1+\sum_{k=1}^{d}(n_{k}-1)\quad\text{and}\quad\Pi:=\prod_{k=1}^{d}n_{k};$$

$$(U_{1}\otimes\cdots\otimes U_{d})(\mathbf{u}^{1}\otimes\cdots\otimes\mathbf{u}^{d})=(U_{1}\mathbf{u}^{1})\otimes\cdots\otimes(U_{d}\mathbf{u}^{d}).$$

$$\mathbb{S}(U):=\left\{\frac{\mathbf{u}}{\|\mathbf{u}\|}\ \middle|\ \mathbf{u}\in U\setminus\{0\}\right\}\subset V.$$

$$\|R\|_{2}:=\max_{\mathbf{v}\in\mathbb{R}^{n}}\frac{\|R\mathbf{v}\|}{\|\mathbf{v}\|}\quad\text{and}\quad\varsigma_{\min}(R):=\min_{\mathbf{v}\in\mathbb{R}^{n}}\frac{\|R\mathbf{v}\|}{\|\mathbf{v}\|}.$$

$$q(R):=\varsigma_{1}(R)\cdots\varsigma_{n-1}(R)=\frac{\sqrt{\det(R^{T}R)}}{\varsigma_{\min}(R)},$$

$$\left\{\mathbf{v}\in\mathbb{R}^{N}\ \middle|\ \exists\text{ a smooth curve }\gamma(t)\subset\mathcal{M}\text{ with }\gamma(0)=x:\ \mathbf{v}=\frac{\mathrm{d}}{\mathrm{d}t}\Big|_{t=0}\gamma(t)\right\}.$$

$$\mathcal{S}_{n_{1},\ldots,n_{d}}=\{\mathbf{u}^{1}\otimes\cdots\otimes\mathbf{u}^{d}\mid\mathbf{u}^{k}\in\mathbb{R}^{n_{k}}\setminus\{0\}\}.$$

$$\mathrm{T}_{\mathbf{u}^{1}\otimes\cdots\otimes\mathbf{u}^{d}}\mathcal{S}_{n_{1},\ldots,n_{d}}=\mathbb{R}^{n_{1}}\otimes\mathbf{u}^{2}\otimes\cdots\otimes\mathbf{u}^{d}+\cdots+\mathbf{u}^{1}\otimes\cdots\otimes\mathbf{u}^{d-1}\otimes\mathbb{R}^{n_{d}};$$

$$\langle\mathbf{u}^{1}\otimes\cdots\otimes\mathbf{u}^{d},\mathbf{v}^{1}\otimes\cdots\otimes\mathbf{v}^{d}\rangle=\prod_{i=1}^{d}\langle\mathbf{u}^{i},\mathbf{v}^{i}\rangle.$$

$$\sigma_{r;n_{1},\ldots,n_{d}}=\{\mathpzc{A}\in\mathbb{R}^{n_{1}\times\cdots\times n_{d}}\mid\operatorname{rank}(\mathpzc{A})\leq r\};$$

$$\mathrm{T}_{\mathpzc{A}}\mathcal{N}_{r;n_{1},\ldots,n_{d}}=\mathrm{T}_{\mathpzc{A}_{1}}\mathcal{S}_{n_{1},\ldots,n_{d}}+\cdots+\mathrm{T}_{\mathpzc{A}_{r}}\mathcal{S}_{n_{1},\ldots,n_{d}},\quad\text{for }\mathpzc{A}=\mathpzc{A}_{1}+\cdots+\mathpzc{A}_{r}.$$

$$\kappa(\mathpzc{A})=\frac{1}{\varsigma_{\min}([U_{1},\ldots,U_{r}])}.$$

$$\kappa(\mathpzc{a}):=\frac{1}{\varsigma_{\min}([U_{1},\ldots,U_{r}])},$$

Full text

The average condition number of most tensor rank decomposition problems is infinite

Carlos Beltrán, Paul Breiding, and Nick Vannieuwenhoven

Abstract.

The tensor rank decomposition, or canonical polyadic decomposition, is the decomposition of a tensor into a sum of rank-1 tensors. The condition number of the tensor rank decomposition measures the sensitivity of the rank-1 summands with respect to structured perturbations. Those are perturbations preserving the rank of the tensor that is decomposed. On the other hand, the angular condition number measures the perturbations of the rank-1 summands up to scaling.

We show for random rank-2 tensors that the expected value of the condition number is infinite for a wide range of choices of the density. Under a mild additional assumption, we show that the same is true for most higher ranks $r\geq 3$ as well. In fact, as the dimensions of the tensor tend to infinity, asymptotically all ranks are covered by our analysis. On the contrary, we show that rank-2 tensors have finite expected angular condition number. Based on numerical experiments, we conjecture that this could also be true for higher ranks.

Our results underline the high computational complexity of computing tensor rank decompositions. We discuss consequences of our results for algorithm design and for testing algorithms computing tensor rank decompositions.

CB: Universidad de Cantabria, [email protected]. Supported by Spanish “Ministerio de Economía y Competitividad” under projects MTM2017-83816-P and MTM2017-90682-REDT (Red ALAMA), as well as by the Banco Santander and Universidad de Cantabria under project 21.SI01.64658.

PB: Universität Osnabrück, [email protected].

NV: KU Leuven, Department of Computer Science, [email protected]. Supported by the Postdoctoral Fellowship of the Research Foundation–Flanders (FWO) with project numbers 12E8116N and 12E8119N.

1. Introduction

1.1. The condition number of tensor rank decomposition

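In this article, a tensor is a multidimensional array filled with numbers:

$$\mathpzc{A}:=(a_{i_{1},\ldots,i_{d}})_{1\leq i_{1}\leq n_{1},\ldots,1\leq i_{d}\leq n_{d}}\in\mathbb{R}^{n_{1}\times\cdots\times n_{d}}.$$

The integer $d$ is called the order of $\mathpzc{A}$. The tensor product of $d$ vectors $\mathbf{u}^{1}\in\mathbb{R}^{n_{1}},\ldots,\mathbf{u}^{d}\in\mathbb{R}^{n_{d}}$ is defined to be the tensor $\mathbf{u}^{1}\otimes\cdots\otimes\mathbf{u}^{d}\in\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ with entries

$$(\mathbf{u}^{1}\otimes\cdots\otimes\mathbf{u}^{d})_{i_{1},\ldots,i_{d}}:=u_{i_{1}}^{(1)}\cdots u_{i_{d}}^{(d)},\quad\text{where }\mathbf{u}^{j}=[u_{i}^{(j)}]_{1\leq i\leq n_{j}}.$$

Any nonzero multidimensional array obeying this relation is called a rank-$1$ tensor. Not every multidimensional array represents a rank-$1$ tensor, but every tensor $\mathpzc{A}$ is a finite linear combination of rank-$1$ tensors:

$$\mathpzc{A}=\sum_{i=1}^{r}\mathpzc{A}_{i},\quad\text{where }\mathpzc{A}_{i}=\mathbf{u}_{i}^{1}\otimes\cdots\otimes\mathbf{u}_{i}^{d}\text{ has rank one for each }1\leq i\leq r.\tag{1.1}$$

Hitchcock [50] coined the name polyadic decomposition for the decomposition (1.1). The smallest number $r$ for which $\mathpzc{A}$ admits an expression as in (1.1) is called the (real) rank of $\mathpzc{A}$. A corresponding minimal decomposition is called a canonical polyadic decomposition (CPD).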
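The definitions above can be illustrated with a few lines of NumPy. This is a minimal sketch, not code from the paper: the helper names `rank_one` and `cpd_sum`, the tensor shape, and the random seed are all chosen here for illustration.

```python
import numpy as np


def rank_one(vectors):
    """Tensor product u^1 ⊗ ... ⊗ u^d of a list of 1-D arrays."""
    tensor = np.asarray(vectors[0], dtype=float)
    for u in vectors[1:]:
        # Appends one axis: new[..., j] = tensor[...] * u[j].
        tensor = np.multiply.outer(tensor, np.asarray(u, dtype=float))
    return tensor


def cpd_sum(terms):
    """Sum of rank-1 terms; `terms` is a list of r lists of d vectors."""
    return sum(rank_one(vectors) for vectors in terms)


rng = np.random.default_rng(0)
shape, r = (3, 4, 2), 2
terms = [[rng.standard_normal(n) for n in shape] for _ in range(r)]
A = cpd_sum(terms)

# Entry (i1, i2, i3) of a rank-1 tensor is the product of the vector entries.
u1, u2, u3 = terms[0]
T = rank_one(terms[0])
print(np.isclose(T[1, 2, 0], u1[1] * u2[2] * u3[0]))

# The first flattening of A is a 3 x 8 matrix whose matrix rank is at most r.
print(np.linalg.matrix_rank(A.reshape(shape[0], -1)))
```

Note that the flattening rank only gives a lower bound on the tensor rank; it is used here merely as a sanity check.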

In many applications, for instance in algebraic statistics [1, 59], chemical sciences [67], machine learning [4], psychometrics [54], signal processing [35, 36, 64], or theoretical computer science [26], the input data has the structure of a tensor and the CPD of this tensor reveals the information of interest. Usually, this data is subject to measurement errors, which will cause the CPD computed from the measured data to differ from the CPD of the true data. In numerical analysis, the sensitivity of the model parameters, such as the rank-$1$ summands in the CPD, to perturbations of the data is often quantified by the condition number [61].

When there are multiple CPDs of a tensor $\mathpzc{A}$, the condition number must be defined at a decomposition $\{\mathpzc{A}_{1},\ldots,\mathpzc{A}_{r}\}$. However, in this article, we will restrict our analysis to tensors $\mathpzc{A}$ having a unique decomposition. Such tensors are called identifiable. In this case, the condition number of the tensor rank decomposition of a tensor $\mathpzc{A}$ is well-defined, and we denote it by $\kappa(\mathpzc{A})$. We will explain the notion of identifiability of tensors in greater detail in Section 1.3 below. At this point, the reader should mainly bear in mind that the assumption of being identifiable is comparatively weak, as most tensors of low rank satisfy it. However, note that matrices ($d=2$) are never identifiable, so we assume that the order of the tensor is $d\geq 3$.

The condition number of tensor rank decomposition was characterized in [20], and it is the condition number of the following computational problem: On input $\mathpzc{A}\in\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ of rank $r$, compute the set of rank-$1$ terms $\{\mathpzc{A}_{1},\ldots,\mathpzc{A}_{r}\}$ in the decomposition (1.1). This condition number measures the sensitivity of the rank-$1$ terms with respect to perturbations of the tensor $\mathpzc{A}$. In other words, when the condition number $\kappa(\mathpzc{A})$ of the rank-$r$ identifiable tensor $\mathpzc{A}=\sum_{i=1}^{r}\mathpzc{A}_{i}$ in (1.1) is finite, it is the smallest value $\kappa(\mathpzc{A})$ such that

$$\min_{\pi\in\mathfrak{S}_{r}}\sqrt{\sum_{i=1}^{r}\|\mathpzc{A}_{i}-\mathpzc{A}_{\pi_{i}}^{\prime}\|^{2}}\leq\kappa(\mathpzc{A})\,\|\mathpzc{A}-\mathpzc{A}^{\prime}\|+o(\|\mathpzc{A}-\mathpzc{A}^{\prime}\|)\tag{1.2}$$

holds for all rank-$r$ tensors $\mathpzc{A}^{\prime}=\sum_{i=1}^{r}\mathpzc{A}_{i}^{\prime}$ (with $\mathpzc{A}_{i}^{\prime}$ of rank $1$) sufficiently close to $\mathpzc{A}$. Herein, the norm on $\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ is the usual Euclidean norm, and $\mathfrak{S}_{r}$ is the permutation group on $\{1,\ldots,r\}$. It was shown in [24, Corollary 5.5] that the same expression holds if $\mathpzc{A}^{\prime}$ is any tensor close to $\mathpzc{A}$ and $\sum_{i=1}^{r}\mathpzc{A}_{i}^{\prime}$ is the best rank-$r$ approximation of $\mathpzc{A}^{\prime}$ in the Euclidean norm.

As a general principle in numerical analysis, the condition number is an intrinsic property of the computational problem that governs the forward error and attainable precision of any method for solving the problem. Its study is also useful for other purposes. For example, in [21, 22] the local rate of convergence of Riemannian Gauss–Newton optimization methods for computing the CPD was related to the condition number $\kappa(\mathpzc{A})$.

A conventional wisdom in numerical analysis is that it is harder to compute the condition number of a given problem instance than to solve the problem itself [39, 38]. This viewpoint led Smale to initiate the study of the probability distribution of condition numbers: If the condition number is small with high probability, then for many practical purposes one can assume that any given input is well-conditioned; at least the probability of failure will necessarily be small. Smale started by studying the probability that a polynomial is ill-conditioned [66]. This strategy was extended to linear algebra condition numbers [41, 31, 27], to systems of polynomial equations in diverse settings [62, 42], to linear systems of inequalities [49], to linear and convex programming [68, 2], to eigenvalues and eigenvectors in the classic and other settings [7], to polynomial eigenvalue problems [9, 6], and to other computational models [30], among others. As there is a substantial bibliography on this topic, we refer the reader to [29] for further details.

Tensor rank decomposition seems to be no exception to this wisdom: The characterization of $\kappa(\mathpzc{A})$ for a given $\mathpzc{A}$ in [20] requires the CPD of $\mathpzc{A}$ itself. This forces us to rely on probabilistic studies to establish reasonable a priori values of the condition number. Settling this is the main purpose of this paper.

1.2. Informal version of our main results and discussion.

The first probabilistic analyses of the condition number of CPD were given in [8, 23]. In those references the expected value was computed for random rank-$1$ tensors; that is, for random output of the computational problem of computing CPDs. This amounts to choosing random $\mathbf{u}_{i}^{k}$ in the notation above, constructing the corresponding tensor $\mathpzc{A}$ and studying $\kappa(\mathpzc{A})$. The probabilistic study is feasible, in principle, because one can obtain a closed expression for $\kappa(\mathpzc{A})$ which is polynomial in terms of the $\mathbf{u}_{i}^{k}$, so that the question boils down to an explicit but nontrivial integration problem.

This article is the first to investigate the condition number for random input. That is, we assume that $\mathpzc{A}$ is chosen at random within the set of rank-$r$ tensors (see the definition of random tensors in Definition 1.3 and the extension in Theorem 1.11) and we ask for the expected value of $\kappa(\mathpzc{A})$. The difficulty now is that, even if we assume that a decomposition (1.1) exists, we do not have it and hence we lack a closed expression for $\kappa(\mathpzc{A})$.

One may wonder if these two different random procedures should give similar distributions in this or other numerical problems. The answer is no. For example, say that our problem is to compute the kernel of a given matrix $A\in\mathbb{R}^{n\times(n+1)}$ and we want to study the expected value of the associated condition number $\|A\|\,\|A^{\dagger}\|$. Choosing $A$ at random produces $\mathbb{E}(\|A\|\,\|A^{\dagger}\|)<\infty$, but choosing the kernel at random and then $A$ at random within the matrices with that kernel is the same as computing the expected value of the usual Turing condition number of a square real Gaussian matrix, which is infinite; see [31] for precise estimates of these quantities. The situation is similar in the study of systems of homogeneous polynomial equations: random inputs have better condition numbers than inputs produced from random outputs; see for example [11]. In both these examples, the condition number of input constructed from random output is, on average, larger than the condition number of random input. This is a stroke of luck, since in general one expects instances from practical, real-life problems to be somehow random within the input space, rather than to have a random output!
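The two sampling procedures for the kernel example can be sketched as follows. This is an illustration under stated assumptions, not an experiment from the paper: "at random" is taken to mean i.i.d. standard normal entries, the function names are made up here, and a finite sample can of course only hint at a heavier tail; it cannot show that an expectation is infinite.

```python
import numpy as np


def kernel_cond(A):
    """||A|| * ||A^dagger||: largest over smallest singular value (full row rank assumed)."""
    s = np.linalg.svd(A, compute_uv=False)
    return s[0] / s[-1]


def random_input(n, rng):
    """Random input: an n x (n+1) matrix with i.i.d. standard normal entries."""
    return rng.standard_normal((n, n + 1))


def input_from_random_kernel(n, rng):
    """Random output first: draw a kernel direction v, then a Gaussian A with A v = 0."""
    v = rng.standard_normal(n + 1)
    v /= np.linalg.norm(v)
    # Rows 1..n of Vh form an orthonormal basis of the orthogonal complement of v.
    Q = np.linalg.svd(v[None, :])[2][1:].T      # shape (n+1, n)
    G = rng.standard_normal((n, n))             # square Gaussian matrix
    # A = G Q^T has kernel spanned by v and the same singular values as G.
    return G @ Q.T, v


rng = np.random.default_rng(1)
n, trials = 5, 2000
direct = [kernel_cond(random_input(n, rng)) for _ in range(trials)]
via_kernel = [kernel_cond(input_from_random_kernel(n, rng)[0]) for _ in range(trials)]

for name, sample in [("random input", direct), ("random kernel", via_kernel)]:
    print(name, "median:", np.median(sample),
          "99.9% quantile:", np.quantile(sample, 0.999))
```

Since $A=GQ^{T}$ with $Q$ having orthonormal columns, $AA^{T}=GG^{T}$, which is why the second procedure reproduces the condition number of the square Gaussian matrix $G$.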

In this paper we show that computing the CPD is a rara avis: We prove in Theorem 1.5 and Theorem 1.6 that (under suitable hypotheses) the expected condition number of random input tensors is infinite. On the contrary, by [8, 23] it is presumed that the average condition number is finite when choosing random output. This result reinforces the evidence that computing CPDs is a very challenging computational problem.

The literature often cites the result of Håstad [53] to underline the high computational complexity of computing CPDs. Håstad showed that the NP-complete 3-satisfiability problem (also called 3-SAT) can be reduced to computing the rank of a tensor; hence, solving the tensor rank decomposition problem is NP-hard in the Turing machine computational model. Our main result is different in two aspects: first, Håstad showed the difficulty of only one particular instance of a CPD, whereas we show that computing the CPD is difficult on average. Second, our evidence supporting the hardness of the problem is not based on Turing machine complexity, but given by analyzing the condition number, which is more appropriate for numerical computations [16]. Linking complexity analyses to condition numbers is common in the literature; for instance, in the case of solving polynomial systems [56, 11, 28, 63]. In general, the book [29] provides a good overview. In this interpretation, we show that computing CPDs numerically is hard on average.

On the other hand, in the literature, the main result of de Silva and Lim [37] is often cited as a key reason why approximating a tensor by a low-rank CPD is such a challenging problem: for some input tensors, a best low-rank approximation may not exist! This is because the set of tensors of bounded rank is not closed: There are tensors of rank strictly greater than $r$ that can be approximated arbitrarily well by rank-$r$ tensors. It is shown in [37] that this ill-posedness of the approximation problem is not rare, in the sense that for every tensor space $\mathbb{R}^{n_{1}\times n_{2}\times n_{3}}$ there exists an open set of input tensors which do not admit a best rank-$2$ approximation. This result is stronger than Håstad’s in the sense that it proves that instances with no solution to the tensor rank approximation problem may occur on an open set, rather than in one particular set of measure zero. Notwithstanding this key result, it does not tell us about the complexity of solving the tensor rank decomposition problem, in which we are given a rank-$r$ tensor whose CPD we seek. In this setting, there are no ill-posed inputs in the sense of [37]. It was already shown in [20] that the condition number diverges as one moves towards the open part of the boundary of tensors of bounded rank, entailing that there exist regions with arbitrarily high condition number. One of the main results of this paper, Theorem 1.6, shows that such regions cannot be ignored: They are sufficiently large to cause the integral of the condition number over the set of rank-$r$ tensors to diverge. In other words, one cannot neglect the regions where the condition number is so high that a CPD computed from a floating-point representation of a rank-$r$ tensor in $\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$, subject only to roundoff errors, is meaningless; this is a result similar in spirit to de Silva and Lim [37].

One may conclude from the above that, at least from the point of view of average stability of the problem, tensor rank decomposition is doomed to fail. However, if one only cares about the directions of the rank-$1$ terms in the decomposition, then the situation changes dramatically. The condition number associated with the computational problem “Given a rank-$r$ identifiable tensor $\mathpzc{A}=\sum_{i=1}^{r}\mathpzc{A}_{i}$ as in (1.1), output the set of normalized rank-$1$ tensors $\{\frac{\mathpzc{A}_{1}}{\|\mathpzc{A}_{1}\|},\ldots,\frac{\mathpzc{A}_{r}}{\|\mathpzc{A}_{r}\|}\}$” will be called the angular condition number $\kappa_{\mathrm{ang}}(\mathpzc{A})$. Analogously to the bound (1.2), one can show that when $\kappa_{\mathrm{ang}}$ is finite, it is the smallest number such that

$$\min_{\pi\in\mathfrak{S}_{r}}\sqrt{\sum_{i=1}^{r}\left\|\frac{\mathpzc{A}_{i}}{\|\mathpzc{A}_{i}\|}-\frac{\mathpzc{A}_{\pi_{i}}^{\prime}}{\|\mathpzc{A}_{\pi_{i}}^{\prime}\|}\right\|^{2}}\leq\kappa_{\mathrm{ang}}(\mathpzc{A})\,\|\mathpzc{A}-\mathpzc{A}^{\prime}\|+o(\|\mathpzc{A}-\mathpzc{A}^{\prime}\|)$$

for all rank-$r$ tensors $\mathpzc{A}^{\prime}=\sum_{i=1}^{r}\mathpzc{A}_{i}^{\prime}$ (with $\mathpzc{A}_{i}^{\prime}$ rank-$1$ tensors) in a sufficiently small open neighborhood of $\mathpzc{A}$. By [24, Corollary 5.5] the same expression holds for all tensors $\mathpzc{A}^{\prime}$ in a small open neighborhood of $\mathpzc{A}$ if $\sum_{i=1}^{r}\mathpzc{A}_{i}^{\prime}$ is the best rank-$r$ approximation of $\mathpzc{A}^{\prime}$.

We will prove in Theorem 1.9 that, at least in the case of rank-$2$ tensors, the expected angular condition number $\kappa_{\mathrm{ang}}$ for random inputs is finite, contrary to the classic condition number $\kappa$; in fact, the numerical experiments in Section 7 suggest that this finiteness of the average extends to much higher ranks as well. In other words, on average we may expect to be able to recover the angular part of the CPD:

$$\mathpzc{U}_{i}=\frac{\mathpzc{A}_{i}}{\|\mathpzc{A}_{i}\|},\quad\text{for }i=1,\ldots,r,$$

where $\mathpzc{A}_{i}$ is as in (1.1). One could conclude from this that a tensor decomposition algorithm should aim to produce the normalized rank-$1$ terms $\mathpzc{U}_{i}$ from the tensor rank decomposition

$$\mathpzc{A}=\sum_{i=1}^{r}\lambda_{i}\,\mathpzc{U}_{i}$$

accurately. Once these terms are obtained, one can recover the $\lambda_{i}$’s by solving a linear system of equations. Since, as a general principle, the condition number of a composite smooth map $g\circ f$ between manifolds satisfies [16, 29]

$$\kappa[g\circ f](x):=\|(\mathrm{d}_{f(x)}g)(\mathrm{d}_{x}f)\|\leq\|\mathrm{d}_{f(x)}g\|\,\|\mathrm{d}_{x}f\|=\kappa[g](f(x))\,\kappa[f](x),$$

it follows that the condition number of tensor decomposition is bounded by the product of the condition number of finding the angular part of the CPD and the condition number of solving a linear least-squares problem. Our main results suggest that precisely the last problem will on average be ill-conditioned.

The foregoing observation can have major implications for algorithm design. Indeed, solving the tensor rank decomposition problem by first solving for the angular part and then the linear least-squares problem decomposes the problem into a nonlinear and a linear part. Crucially, the latter least-squares problem can be solved by direct methods, such as a QR-factorization combined with a linear system solver. Such methods have a uniform computational cost regardless of the condition number of the problem. By contrast, since no (provably) numerically stable direct algorithms for tensor rank decomposition are currently known [8], iterative methods are indispensable for this problem. We may expect their computational performance to depend on the condition number of the problem instance. Indeed, our main results combined with the main result of [21] imply, for example, that Riemannian Gauss–Newton optimization methods for solving the angular part of the CPD should, on average, require fewer iterations to reach convergence than Riemannian Gauss–Newton methods for solving the tensor decomposition problem directly (such as the methods in [21, 22]), because the angular condition number $\kappa_{\mathrm{ang}}$ appears to be finite on average, while the regular condition number $\kappa$ is proved to be $\infty$ on average in most cases, as we show in this article.
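The linear step of this two-stage approach can be sketched as follows, assuming the normalized rank-1 terms $\mathpzc{U}_{i}$ are already available (here they are simply constructed from known terms). The function names, shapes, and seed are hypothetical. The scalings $\lambda_{i}$ are recovered from a thin QR factorization of the matrix whose columns are the vectorized $\mathpzc{U}_{i}$; the 2-norm condition number of that matrix is reported as one common indicator of the sensitivity of this step.

```python
import numpy as np


def rank_one(vectors):
    """Tensor product u^1 ⊗ ... ⊗ u^d of a list of 1-D arrays."""
    tensor = np.asarray(vectors[0], dtype=float)
    for u in vectors[1:]:
        tensor = np.multiply.outer(tensor, np.asarray(u, dtype=float))
    return tensor


def solve_scalings(A, unit_terms):
    """Least-squares fit of A ≈ sum_i lambda_i * U_i via a reduced QR factorization."""
    M = np.column_stack([U.ravel() for U in unit_terms])   # (n_1 ... n_d) x r
    Q, R = np.linalg.qr(M)                                  # Q: N x r, R: r x r upper triangular
    # General dense solve; it does not exploit the triangular structure of R.
    lam = np.linalg.solve(R, Q.T @ A.ravel())
    return lam, np.linalg.cond(M)


rng = np.random.default_rng(2)
shape, r = (4, 3, 3), 3
rank_one_terms = [rank_one([rng.standard_normal(n) for n in shape]) for _ in range(r)]
A = sum(rank_one_terms)

# Angular part: the normalized rank-1 terms U_i = A_i / ||A_i||.
unit_terms = [T / np.linalg.norm(T) for T in rank_one_terms]

lam, cond_M = solve_scalings(A, unit_terms)
print("recovered scalings:", lam)
print("norms of the terms:", [np.linalg.norm(T) for T in rank_one_terms])
print("cond of the linear step:", cond_M)
```

In this construction the recovered $\lambda_{i}$ coincide with the norms $\|\mathpzc{A}_{i}\|$, because the $\mathpzc{U}_{i}$ are exact.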

Our main results also have consequences for researchers testing numerical algorithms for computing the CPD. In the literature, a common way of generating input data for testing algorithms is to sample the rank-$1$ terms $\mathpzc{A}_{i}=\lambda_{i}\mathbf{u}_{i}^{1}\otimes\mathbf{u}_{i}^{2}\otimes\cdots\otimes\mathbf{u}_{i}^{d}$ randomly, and then apply the algorithm to the associated tensor $\mathpzc{A}=\sum_{i=1}^{r}\mathpzc{A}_{i}$. However, our analysis in this paper and the analyses in [8, 23] show that this procedure generates tensors that are heavily biased towards being numerically well-conditioned. Hence, this way of testing algorithms probably does not correspond to a realistic distribution on the inputs. We acknowledge that it is currently not easy to sample rank-$r$ tensors uniformly even though some methods exist [18]. In part, this is because equations for the algebraic variety containing the tensors of rank bounded by $r$ are hard to obtain [57]. Nevertheless, in Section 7, using the observation from Remark 1.4, we present an acceptance-rejection method that can be applied to a few cases and yields uniformly distributed rank-$r$ tensors, relative to the Gaussian density in Definition 1.3. In any case, we strongly advocate that the (range of) condition numbers be reported when testing the performance of iterative methods for solving the tensor rank decomposition problem, so that one can assess the difficulty of the problem instances. We believe it is always advisable to include models that are known to lead to instances with high condition numbers, such as those used in [20, 22].
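To report condition numbers alongside test tensors generated from random rank-1 terms, one possibility is the following sketch for third-order tensors. It assumes the characterization $\kappa(\mathpzc{A})=1/\varsigma_{\min}([U_{1},\ldots,U_{r}])$ attributed to [20], and it reads each $U_{i}$ as a matrix with orthonormal columns spanning the tangent space to the rank-1 tensors at $\mathpzc{A}_{i}$; that reading, the function names, and the requirement that the stacked matrix be tall are assumptions of this sketch rather than statements from this section.

```python
import numpy as np


def tangent_basis(u1, u2, u3):
    """Orthonormal basis of R^{n1}⊗u2⊗u3 + u1⊗R^{n2}⊗u3 + u1⊗u2⊗R^{n3}.

    For nonzero u1, u2, u3 this space has dimension n1 + n2 + n3 - 2.
    """
    n1, n2, n3 = len(u1), len(u2), len(u3)
    blocks = [
        np.einsum("ia,j,k->ijka", np.eye(n1), u2, u3).reshape(-1, n1),
        np.einsum("i,ja,k->ijka", u1, np.eye(n2), u3).reshape(-1, n2),
        np.einsum("i,j,ka->ijka", u1, u2, np.eye(n3)).reshape(-1, n3),
    ]
    spanning = np.hstack(blocks)                    # (n1*n2*n3) x (n1+n2+n3)
    left, _, _ = np.linalg.svd(spanning, full_matrices=False)
    # Singular values are sorted in decreasing order; the last two are (numerically) zero.
    return left[:, : n1 + n2 + n3 - 2]


def condition_number(terms):
    """Reciprocal of the smallest singular value of [U_1 ... U_r] (assumed formula)."""
    stacked = np.hstack([tangent_basis(*vectors) for vectors in terms])
    # Sketch restriction: the stacked matrix must have at least as many rows as columns.
    assert stacked.shape[1] <= stacked.shape[0]
    s = np.linalg.svd(stacked, compute_uv=False)
    return np.inf if s[-1] == 0 else 1.0 / s[-1]


rng = np.random.default_rng(3)
shape, r = (4, 4, 3), 3
terms = [[rng.standard_normal(n) for n in shape] for _ in range(r)]
print("kappa for randomly sampled rank-1 terms:", condition_number(terms))

# Two rank-1 terms built from orthogonal vectors have orthogonal tangent spaces.
e = np.eye(3)
orthogonal_terms = [[e[0], e[0], e[0]], [e[1], e[1], e[1]]]
print("kappa for orthogonal terms:", condition_number(orthogonal_terms))
```

Under this formula the value is always at least 1, since every column of the stacked matrix has unit norm; the orthogonal example attains that lower bound up to rounding.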

The formal presentation of our main results requires some extra notation that we introduce in subsequent sections.

1.3. Identifiable tensors and a formula for the condition number

A particular feature of higher-order tensors that distinguishes them from matrices is identifiability. This means that in many cases the CPD of tensors of order $d\geq 3$ of small rank is unique. A tensor $\mathpzc{A}\in\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ is called $r$-identifiable if there is a unique set $\{\mathpzc{A}_{1},\ldots,\mathpzc{A}_{r}\}$ of cardinality $r$ such that $\mathpzc{A}=\mathpzc{A}_{1}+\cdots+\mathpzc{A}_{r}$ and all $\mathpzc{A}_{i}$’s are rank-$1$ tensors. A celebrated criterion by Kruskal [55] gives a tool to decide if a given tensor of order 3 satisfies this property.

Lemma 1.1 (Kruskal’s criterion [55, 65]).

Let $\mathbb{F}$ be $\mathbb{R}$ or $\mathbb{C}$ , $\mathpzc{A}\in\mathbb{F}^{n_{1}\times n_{2}\times n_{3}}$ a tensor of order $3$ and assume that $\mathpzc{A}=\sum_{i=1}^{r}\mathpzc{A}_{i},$ where $\mathpzc{A}_{i}=\lambda_{i}\mathbf{u}_{i}^{1}\otimes\mathbf{u}_{i}^{2}\otimes\mathbf{u}_{i}^{3}\in\mathbb{F}^{n_{1}\times n_{2}\times n_{3}}.$ Define the factor matrices $U_{\ell}=[\mathbf{u}_{i}^{\ell}]_{1\leq i\leq r}\in\mathbb{F}^{n_{\ell}\times r}$ for $\ell=1,2,3$ , and let $k_{\ell}$ be the largest integer $k$ such that every subset of $k$ columns of $U_{\ell}$ has rank equal to $k$ . If $r\leq\frac{1}{2}(k_{1}+k_{2}+k_{3}-2)$ and $k_{1},k_{2},k_{3}>1$ , then the tensor $\mathpzc{A}$ is $r$ -identifiable over $\mathbb{F}$ .
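For small examples, the hypotheses of Lemma 1.1 can be checked by brute force. The sketch below is only illustrative: the function names and the numerical tolerance are assumptions, and the enumeration of column subsets is exponential in $r$, so it is not meant for large ranks.

```python
import numpy as np
from itertools import combinations

def kruskal_rank(U, tol=1e-10):
    """Largest k such that every subset of k columns of U is linearly independent."""
    n_cols = U.shape[1]
    k = 0
    for size in range(1, n_cols + 1):
        subsets = combinations(range(n_cols), size)
        if all(np.linalg.matrix_rank(U[:, list(c)], tol=tol) == size for c in subsets):
            k = size
        else:
            break
    return k

def satisfies_kruskal(U1, U2, U3):
    """Check r <= (k1 + k2 + k3 - 2) / 2 and k1, k2, k3 > 1 for factor matrices."""
    r = U1.shape[1]
    k1, k2, k3 = (kruskal_rank(U) for U in (U1, U2, U3))
    return min(k1, k2, k3) > 1 and 2 * r <= k1 + k2 + k3 - 2

rng = np.random.default_rng(1)
U1, U2, U3 = (rng.standard_normal((3, 3)) for _ in range(3))
generic_ok = satisfies_kruskal(U1, U2, U3)
```

For generic $3\times 3$ factor matrices each $k_{\ell}$ equals $3$, so the criterion holds with $r=3$; making two columns of one factor matrix parallel drops the corresponding $k_{\ell}$ to $1$ and the criterion no longer applies.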

Since matrix rank does not change with a field extension from $\mathbb{R}$ to $\mathbb{C}$ , a real rank- $r$ tensor $\mathpzc{A}\in\mathbb{R}^{n_{1}\times n_{2}\times n_{3}}$ that satisfies the assumptions of Lemma 1.1 is $r$ -identifiable over $\mathbb{R}$ and also automatically $r$ -identifiable over $\mathbb{C}$ . In other words, Kruskal’s criterion is certifying complex $r$ -identifiability of tensors, which is a strictly stronger notion than $r$ -identifiability over $\mathbb{R}$ [5].

Most order-3 tensors of low rank satisfy Kruskal’s criterion [34]: There is an open dense subset of the set of rank- $r$ tensors in $\mathbb{R}^{n_{1}\times n_{2}\times n_{3}}$ , $n_{1}\geq n_{2}\geq n_{3}\geq 2$ , where complex $r$ -identifiability holds, provided $r\leq n_{1}+\min\{\tfrac{1}{2}\delta,\delta\}$ with $\delta:=n_{2}+n_{3}-n_{1}-2$ . In fact, this phenomenon occurs much more generally than for third-order tensors of very small rank. Let us denote the set of complex tensors of complex rank bounded by $r$ by

[TABLE]

This constructible set (its elements can be parameterized as in 1.1, changing $\mathbb{R}$ to $\mathbb{C}$ ) turns out to be an open dense subset (in the Euclidean topology) of its Zariski closure $\overline{\sigma_{r;n_{1},\ldots,n_{d}}^{\mathbb{C}}}$ ; see [57]. One says that $\sigma_{r;n_{1},\ldots,n_{d}}^{\mathbb{C}}$ is generically complex $r$ -identifiable if the subset of points of $\sigma_{r;n_{1},\ldots,n_{d}}^{\mathbb{C}}$ that are not complex $r$ -identifiable is contained in a proper closed subset in the Zariski topology on the algebraic variety $\overline{\sigma_{r;n_{1},\ldots,n_{d}}^{\mathbb{C}}}$ ; see [32]. It is known from dimensionality arguments [32] that there is a maximum value of $r$ for which generic $r$ -identifiability of $\sigma_{r;n_{1},\ldots,n_{d}}$ can hold, namely

[TABLE]

In fact, it is conjectured that the inequality is strict in general; see [47] for details. For all other values of $r$ , generic $r$ -identifiability does not hold. In [17, 32, 33, 40] it is proved that in the majority of choices for $n_{1},\ldots,n_{d}$ , generic complex $r$ -identifiability holds for most ranks with $r<r_{\text{crit}}$ ; see [17, Theorem 7.2] for a result that is asymptotically optimal. For a summary of the conjecturally complete picture of complex $r$ -identifiability results, see [34, Section 3].

Assumption 1.

In the rest of this article, we will assume that $\sigma_{r;n_{1},\ldots,n_{d}}^{\mathbb{C}}$ is generically complex $r$ -identifiable.

We make this assumption because it greatly simplifies some of the arguments. At the same time, Assumption 1 is (conjectured to be) extremely weak and only limits the generality in the exceptional cases listed in [33, Theorem 1.1], and even then generic $r$ -identifiability only fails very close to the upper bound $r_{\text{crit}}$ of the permitted ranks.

An immediate benefit of Assumption 1 is that it allows for a nice expression of the condition number of the tensor rank decomposition problem. Let us denote the set of rank-1 tensors in $\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ by

[TABLE]

It is a smooth manifold, called the Segre manifold [46, 57]. The set of tensors of rank bounded by $r$ is the image of the addition map: $\sigma_{r;n_{1},\ldots,n_{d}}=\Phi(\mathcal{S}_{n_{1},\ldots,n_{d}}^{\times r})$ , where

[TABLE]

Then, under Assumption 1, there exists an open dense subset $\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ of $\sigma_{r;n_{1},\ldots,n_{d}}$ such that for all $\mathpzc{A}\in\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ we have $|\Phi^{-1}(\mathpzc{A})|=r!$ by [8, Propositions 4.5–4.7]: the preimage of an $r$ -identifiable tensor under the map $\Phi$ consists of the $r!$ permutations of the summands. In particular, the points in the fiber are isolated, so there is a local inverse map $\Phi^{-1}_{\mathpzc{a}}$ of $\Phi$ for each $\mathpzc{a}\in\Phi^{-1}(\mathpzc{A})$ . Recall from [20] that the condition number of the CPD at $\mathpzc{A}\in\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ is then the condition number (in the classic sense of Rice [61]; see also [69, 29]) of any of these local inverses:

[TABLE]

where $\mathpzc{a}\in\Phi^{-1}(\mathpzc{A})$ is arbitrary; it is a corollary of [20, Theorem 1.1] that the above definition does not depend on the choice of $\mathpzc{a}$ . Herein, $\|\cdot\|$ in the denominator is the Euclidean norm induced by the ambient $\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ , and the norm in the numerator is the product norm of the Euclidean norms inherited from the ambient $\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ ’s. The right-hand side $\|\mathrm{d}{}_{\mathpzc{A}}\Phi_{\mathpzc{a}}^{-1}\|_{2}$ is the spectral norm of the derivative of $\Phi_{\mathpzc{a}}^{-1}$ at $\mathpzc{A}$ . See Section 2 for more details. By [20, Proposition 4.4], the condition number $\kappa(\mathpzc{A})$ does not depend on the norm of $\mathpzc{A}$ : $\kappa(t\mathpzc{A})=\kappa(\mathpzc{A})$ for $t\in\mathbb{R}\setminus\{0\}$ .

Remark 1.2.

We did not specify the value of the condition number for $\mathpzc{A}\in\sigma_{r;n_{1},\ldots,n_{d}}\setminus\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ . The main reason is that our analysis is independent of the values that the condition number takes on this set of measure zero, so that for simplicity we decided against including the more complicated general case where there can be several distinct elements in the preimage.

1.4. Main results

The goal of this paper is to study the average condition number relative to “reasonable” density functions. By this we mean probability distributions $\hat{\rho}$ that are comparable to the standard Gaussian density $\rho$ : There exist positive constants $c_{1},c_{2}$ such that $c_{1}\leq\frac{\hat{\rho}}{\rho}\leq c_{2}$ . The main result, Theorem 1.11, applies, among others, to all distributions $\hat{\rho}$ comparable to the following Gaussian density defined on the set of bounded rank tensors $\sigma_{r;n_{1},\ldots,n_{d}}$ .

Definition 1.3 (Gaussian Identifiable Tensors).

We define a random variable $\mathpzc{A}$ on $\sigma_{r;n_{1},\ldots,n_{d}}$ by specifying its density as

[TABLE]

is the normalization constant. Under Assumption 1, if $\mathpzc{A}\in\sigma_{r;n_{1},\ldots,n_{d}}$ and $\mathpzc{A}\sim\rho$ , we say that $\mathpzc{A}$ is a Gaussian Identifiable Tensor (GIT) of rank $r$ .

Remark 1.4.

Suppose that $r$ is a typical rank of tensors in $\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ . This means that $\sigma_{r;n_{1},\ldots,n_{d}}$ contains a Euclidean open subset of $\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ and is of maximum dimension $n_{1}\cdots n_{d}$ . Then, the distribution defined in Definition 1.3 is a conditional probability distribution: A GIT $\mathpzc{A}$ of rank $r$ has the distribution $\mathpzc{A}\sim(\mathpzc{B}\mid\mathrm{rank}(\mathpzc{B})=r)$ , where $\mathpzc{B}$ is a tensor with independent and identically distributed (i.i.d.) standard Gaussian entries. We exploit this fact in our numerical experiments to sample GITs using an acceptance-rejection method.
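The acceptance-rejection idea can be illustrated in the smallest case $2\times 2\times 2$, where $r=2$ is a typical rank. The sketch below does not reproduce the method of Section 7; it swaps in a classical rank test that is specific to this format, namely that a generic real $2\times 2\times 2$ tensor has rank $2$ exactly when Cayley's hyperdeterminant is positive. The function names are hypothetical.

```python
import numpy as np

def hyperdeterminant(T):
    """Cayley hyperdeterminant of a 2x2x2 array, computed as the discriminant
    of the binary quadratic form det(x * T[:, :, 0] + y * T[:, :, 1])."""
    A, B = T[:, :, 0], T[:, :, 1]
    a = np.linalg.det(A)  # coefficient of x^2
    c = np.linalg.det(B)  # coefficient of y^2
    b = A[0, 0] * B[1, 1] + A[1, 1] * B[0, 0] - A[0, 1] * B[1, 0] - A[1, 0] * B[0, 1]
    return b * b - 4.0 * a * c

def sample_rank2_tensor(rng, max_tries=1000):
    """Draw i.i.d. Gaussian 2x2x2 tensors until one passes the rank-2 test."""
    for _ in range(max_tries):
        T = rng.standard_normal((2, 2, 2))
        if hyperdeterminant(T) > 0:  # accept: generic real rank 2
            return T
    raise RuntimeError("no sample accepted")

T = sample_rank2_tensor(np.random.default_rng(0))
```

Each accepted sample is a Gaussian tensor conditioned on having rank $2$, which is the conditional distribution described in the remark; for larger formats a different rank test is needed.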

We first state our results for the foregoing Gaussian density. At the end of this subsection, in Theorem 1.11, we generalize these results to other densities, including all densities comparable to the Gaussian density. Our first contribution is the following result. We prove it in Section 3.

Theorem 1.5.

Let $\mathpzc{A}\in\sigma_{2;n_{1},\ldots,n_{d}}$ be a GIT of rank $r=2$ . Then, $\operatorname*{\mathbb{E}}\kappa(\mathpzc{A})=\infty.$

It should be mentioned that in our analysis we consider a small subset of $\sigma_{2;n_{1},\ldots,n_{d}}$ and show that on this subset the condition number integrates to infinity. In particular, a weak average-case analysis as proposed in [3] would be of interest in this problem.

Under one additional assumption we can extend the result from Theorem 1.5 to higher ranks. We prove the following theorem in Section 4.

Theorem 1.6.

Let $n_{1},\ldots,n_{d}\geq 3$ . On top of Assumption 1 we assume that $\sigma_{r-2;n_{1}-2,\ldots,n_{d}-2}$ is generically complex identifiable. Then, for a GIT $\mathpzc{A}\in\sigma_{r;n_{1},\ldots,n_{d}}$ , $r\geq 3$ , we have $\operatorname*{\mathbb{E}}\kappa(\mathpzc{A})=\infty.$

By [17, Theorem 7.2], the assumptions of Theorem 1.6 are satisfied in a large number of cases. In fact, as the size of the tensor increases, the assumptions become weaker: When $n_{1}\geq n_{2}\geq\cdots\geq n_{d}\geq 2$ the conditions in Theorem 1.6 are satisfied for $r\leq\min(s_{1},s_{2})$ with

[TABLE]

Note that for large $n_{i}$ , the second piece $s_{2}$ is the most restrictive. It follows from 1.3 that $r_{n_{1}-2,\ldots,n_{d}-2}^{\textrm{crit}}=(1-\delta_{n_{1},\ldots,n_{d}}){r_{n_{1},\ldots,n_{d}}^{\textrm{crit}}}$ with $\delta_{n_{1},\ldots,n_{d}}=\mathcal{O}(\sum_{k=1}^{d}\frac{1}{n_{k}})$ . Therefore, we obtain the following asymptotically optimal result.

Corollary 1.7.

Let $d\geq 3$ be fixed, and $n_{1}\geq n_{2}\geq\cdots\geq n_{d}\geq 2$ . If $n_{1},\ldots,n_{d}\to\infty$ , then for a GIT $\mathpzc{A}\in\sigma_{r;n_{1},\ldots,n_{d}}$ we have $\mathbb{E}\,\kappa(\mathpzc{A})=\infty$ for all

[TABLE]

where $\lim_{n_{1},\ldots,n_{d}\to\infty}\epsilon_{n_{1},\ldots,n_{d}}=0.$

It follows from dimensionality arguments that if $r>r_{\text{crit}}$ , then the addition map $\Phi$ does not have a local inverse. In fact, in this case all of the connected components in the fiber of $\Phi$ at $\mathpzc{A}\in\sigma_{r;n_{1},\ldots,n_{d}}$ have positive dimension [46]. It follows from [20] that the condition number of the tensor rank decomposition problem at each expression 1.1 of length $r$ of such a tensor $\mathpzc{A}$ is $\infty$ . In this case, $\kappa(\mathpzc{A})=\infty$ , regardless of how the tensor decomposition problem is defined when $\mathpzc{A}$ has multiple distinct decompositions; see also the discussion in [29, Remark 14.14]. (This is exactly the concern of Remark 1.2: What computational problem are we interested in solving when a tensor has several distinct CPDs? Are we interested in the CPD with the best sensitivity? Or the worst? Or the expected condition number of one randomly chosen CPD in the fiber? This depends on the context. The results of this paper are valid regardless of the particular variation of the problem one is interested in.) In this case the average condition number is infinite, as well.

Our results lead us to the conjecture that the expected condition number is infinite, also without making the assumption from Theorem 1.6 and without any upper bound on the rank.

Conjecture 1.8.

Let $\mathpzc{A}\in\sigma_{r;n_{1},\ldots,n_{d}}$ be a GIT of rank $r\geq 2$ . Then, $\operatorname*{\mathbb{E}}\kappa(\mathpzc{A})=\infty.$

Corollary 1.7 above proves this conjecture asymptotically, in practice leaving only a small range of ranks for which it might fail.

As mentioned above, it turns out that for GITs the expected angular condition number is not always infinite. Formally, the angular condition number is defined as follows: Let the canonical projection onto the sphere be $p:\mathbb{R}^{n_{1}\times\cdots\times n_{d}}\to\mathbb{S}(\mathbb{R}^{n_{1}\times\cdots\times n_{d}})$ . Then the angular condition number of $\mathpzc{A}\in\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ is

[TABLE]

where $\Phi^{-1}_{\mathpzc{a}}$ is an arbitrary local inverse of $\Phi$ with $\mathpzc{A}=\Phi(\mathpzc{a})$ . As before we do not specify what happens on the measure-zero set $\sigma_{r;n_{1},\ldots,n_{d}}\setminus\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ , because it is not relevant for this paper. The angular condition number only accounts for the angular part of the CPD, i.e., the directions of the tensors, not for their magnitude, hence the name.

To distinguish the condition numbers 1.5 and 1.6, we will refer to the condition number from 1.5 as the regular condition number. Oftentimes we even drop the clarification “regular”.

Here is the result for $\kappa_{\mathrm{ang}}(\mathpzc{A})$ for tensors of rank two that we prove in Section 5.

Theorem 1.9.

Let $\mathpzc{A}\in\sigma_{2;n_{1},\ldots,n_{d}}$ be a GIT of rank 2. Then, $\operatorname*{\mathbb{E}}\kappa_{\mathrm{ang}}(\mathpzc{A})<\infty$ .

Unfortunately, we do not know if this theorem can be extended to higher rank tensors. However, based on our experiments in Section 7, we pose the following:

Conjecture 1.10.

Let $\mathpzc{A}\in\sigma_{r;n_{1},\ldots,n_{d}}$ be a GIT of rank $r$ . Then, $\operatorname*{\mathbb{E}}\kappa_{\mathrm{ang}}(\mathpzc{A})<\infty$ .

We finally observe that the foregoing main results are not limited to GITs. They are valid for a wide range of distributions of random tensors.

Theorem 1.11.

Theorems 1.5, 1.6, Corollary 1.7 and Theorem 1.9 are still true if instead of GITs we take random tensors defined by a wide range of other probability distributions, including some of interest such as:

(1) All probability distributions that are comparable to the standard Gaussian density $\rho$ . This means that the random tensor $\mathpzc{A}$ has a density $\hat{\rho}$ for which there exist positive constants $c_{1},c_{2}$ such that $c_{1}\leq\frac{\hat{\rho}}{\rho}\leq c_{2}$ .

(2) Uniformly randomly chosen $\mathpzc{A}$ in the unit sphere $\mathbb{S}(\sigma_{r})$ .

(3) Uniformly randomly chosen $\mathpzc{A}$ in the unit ball $\{\mathpzc{A}\in\sigma_{r}:\|\mathpzc{A}\|\leq 1\}$ .

1.5. Acknowledgements

Part of this work was done while the second and third author were visiting the Universidad de Cantabria, supported by the funds of Grant 21.SI01.64658 (Banco Santander and Universidad de Cantabria), Grant MTM2017-83816-P from the Spanish Ministry of Science, and the FWO Grant for a long stay abroad V401518N. We thank these institutions for their support. We also thank two anonymous referees for helpful comments.

1.6. Organization of the article

The rest of the article is organized as follows. In the next section we give some preliminary material. Thereafter, in Sections 3, 4, 5 and 6, we successively prove Theorem 1.5, Theorem 1.6, Theorem 1.9 and Theorem 1.11. In Section 7 we present numerical experiments supporting our main results. Finally, in Appendices A, B and C we give proofs for several lemmata that we need in the other sections.

2. Notation and Preliminaries

2.1. Notation

We will use the following typographic conventions for convenience: Vectors are typeset in a bold face ( $\mathbf{a},\mathbf{b}$ ), matrices in upper case ( $A$ , $B$ ), tensors in a calligraphic font ( $\mathpzc{A}$ , $\mathpzc{B}$ ), and manifolds and linear spaces in a different calligraphic font ( $\mathcal{A},\mathcal{B}$ ).

The positive integer $d\geq 2$ is reserved for the order of a tensor, $n_{1},\ldots,n_{d}\geq 2$ are its dimensions, and $r\geq 1$ is its rank. The following integers are used throughout the paper:

[TABLE]

they correspond to the dimension of the Segre manifold $\mathcal{S}_{n_{1},\ldots,n_{d}}$ and the dimension of the ambient space $\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ respectively. The symmetric group on $r$ elements is denoted by $\mathfrak{S}_{r}$ .

We work exclusively with real vector spaces, for which $\langle\cdot,\cdot\rangle$ denotes the Euclidean inner product and $\|\cdot\|$ always denotes the associated norm. We will switch freely between the finite-dimensional vector spaces $\mathbb{R}^{n_{1}\cdots n_{d}}$ and $\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ for representing tensors in the abstract vector space $\mathbb{R}^{n_{1}}\otimes\cdots\otimes\mathbb{R}^{n_{d}}$ . By the above choice of norms all of these finite-dimensional Hilbert spaces are isometric; specifically, if $\mathpzc{A}\in\mathbb{R}^{n_{1}}\otimes\cdots\otimes\mathbb{R}^{n_{d}}$ and $\mathbf{a}\in\mathbb{R}^{n_{1}\cdots n_{d}}$ is its coordinate array with respect to an orthonormal basis, then $\|\mathpzc{A}\|=\|\mathbf{a}\|$ . Similarly, if the coordinates $\mathbf{a}$ are reshaped into a multidimensional array $A\in\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ , then $\|A\|=\|\mathpzc{A}\|=\|\mathbf{a}\|$ . It is important to note that this notation can conflict with the usual meaning of $\|A\|$ when $d=2$ ; to distinguish the spectral norm from the standard norm in this paper, we write $\|A\|_{2}$ for the former; see 2.1.

For matrices $U_{1}\in\mathbb{R}^{m_{1}\times n_{1}},\ldots,U_{d}\in\mathbb{R}^{m_{d}\times n_{d}}$ , the tensor product $U_{1}\otimes\cdots\otimes U_{d}$ acts on rank- $1$ tensors as follows:

[TABLE]

By the universal property [44], this extends to a linear map $\mathbb{R}^{n_{1}}\otimes\cdots\otimes\mathbb{R}^{n_{d}}\to\mathbb{R}^{m_{1}}\otimes\cdots\otimes\mathbb{R}^{m_{d}}$ . Note that we can view $U_{1}\otimes\cdots\otimes U_{d}$ as a matrix in $\mathbb{R}^{(m_{1}\cdots m_{d})\times(n_{1}\cdots n_{d})}$ .
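In coordinates, the matrix representing $U_{1}\otimes\cdots\otimes U_{d}$ is the Kronecker product of the $U_{k}$ when tensors are vectorized in row-major order. The following sketch checks numerically that this matrix maps $\mathbf{u}^{1}\otimes\cdots\otimes\mathbf{u}^{d}$ to $(U_{1}\mathbf{u}^{1})\otimes\cdots\otimes(U_{d}\mathbf{u}^{d})$ ; the sizes and random data are arbitrary choices made for the illustration.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
dims_in, dims_out = (2, 3, 4), (3, 2, 2)
mats = [rng.standard_normal((m, n)) for m, n in zip(dims_out, dims_in)]
vecs = [rng.standard_normal(n) for n in dims_in]

# Matrix of U_1 x U_2 x U_3 in R^{(m1 m2 m3) x (n1 n2 n3)}.
big = reduce(np.kron, mats)

# Apply it to the vectorized rank-1 tensor u^1 x u^2 x u^3 ...
lhs = big @ reduce(np.kron, vecs)
# ... and compare with the rank-1 tensor built from the images U_k u^k.
rhs = reduce(np.kron, [U @ u for U, u in zip(mats, vecs)])
```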

For any subset $U\subset V$ of a normed vector space $V$ , we define the sphere over $U$ as

[TABLE]

In particular, the unit sphere in $\mathbb{R}^{n}$ is denoted by $\mathbb{S}(\mathbb{R}^{n})$ .

Given an $m\times n$ matrix $R$ or a linear operator $R:\mathbb{R}^{n}\to\mathbb{R}^{m}$ , we denote the pseudo-inverse by $R^{\dagger}$ . The spectral norm and smallest singular value of $R$ are denoted respectively by

[TABLE]

A special role will be played in this paper by the product of all but the smallest singular values of $R$ , which we denote by $q(R)$ . In other words, if $R$ is injective, then

[TABLE]

where $R^{T}$ is the transposed matrix (operator) and $\varsigma_{i}(R)$ is the $i$ th largest singular value of $R$ .
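A small numerical illustration of $q(R)$ follows. It assumes, as the verbal description suggests, that for injective $R$ the quantity equals both the product of the $n-1$ largest singular values and $\sqrt{\det(R^{T}R)}$ divided by the smallest singular value; the function name `q` is chosen for the example.

```python
import numpy as np

def q(R):
    """Product of all but the smallest singular value of R."""
    s = np.linalg.svd(R, compute_uv=False)  # sorted in decreasing order
    return float(np.prod(s[:-1]))

# A 4x3 injective matrix with singular values 3, 2 and 0.5.
R = np.vstack([np.diag([3.0, 2.0, 0.5]), np.zeros((1, 3))])
q_direct = q(R)
s_min = np.linalg.svd(R, compute_uv=False)[-1]
q_via_gram = np.sqrt(np.linalg.det(R.T @ R)) / s_min
```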

2.2. Differential geometry

In this article we only consider submanifolds of Euclidean spaces; see, e.g., [58] for the general definitions. A smooth ( $C^{\infty}$ ) manifold is a topological manifold with a smooth structure, in the sense of [58]. The tangent space $\mathrm{T}_{x}{\,\mathcal{M}}$ at $x$ to an embedded $n$ -dimensional smooth submanifold $\mathcal{M}\subset\mathbb{R}^{N}$ is the set

[TABLE]

At every point $x\in\mathcal{M}$ , there exist open neighborhoods $\mathcal{V}\subset\mathcal{M}$ and $\mathcal{U}\subset\mathrm{T}_{x}{\,\mathcal{M}}$ of $x$ , and a bijective smooth map $\phi:\mathcal{V}\to\mathcal{U}$ with smooth inverse. The tuple $(\mathcal{V},\phi)$ is a coordinate chart of $\mathcal{M}$ . A smooth map between manifolds $F:\mathcal{M}\to\mathcal{N}$ is a map such that for every $x\in\mathcal{M}$ and coordinate chart $(\mathcal{V},\phi)$ containing $x$ , and every coordinate chart $(\mathcal{W},\psi)$ containing $F(x)$ , we have that $\psi\circ F\circ\phi^{-1}:\phi(\mathcal{V})\to\psi(F(\mathcal{V}))$ is a smooth map. The derivative of $F$ can be defined as the linear map $\mathrm{d}{}_{x}F:\mathrm{T}_{x}{\,\mathcal{M}}\to\mathrm{T}_{F(x)}{\,\mathcal{N}}$ taking the tangent vector $\mathbf{v}\in\mathrm{T}_{x}{\,\mathcal{M}}$ to $\frac{\mathrm{d}{}}{\mathrm{d}{}t}|_{t=0}F(\gamma(t))\in\mathrm{T}_{F(x)}{\,\mathcal{N}}$ where $\gamma(t)\subset\mathcal{M}$ is a curve with $\gamma(0)=x$ and $\gamma^{\prime}(0)=\mathbf{v}$ . If $\dim\mathcal{M}=\dim\mathcal{N}$ and if $\mathrm{d}{}_{x}F$ has full rank, there is a neighborhood $\mathcal{W}\subset\mathcal{M}$ of $x$ on which $F$ is invertible and its inverse is also smooth; that is, $F$ is a diffeomorphism between $\mathcal{W}$ and $F(\mathcal{W})$ . If this property holds for all $x\in\mathcal{M}$ , then $F$ is called a local diffeomorphism.

A differentiable submanifold $\mathcal{M}\subset\mathbb{R}^{N}$ can be equipped with a Riemannian metric $g$ , turning it into a Riemannian manifold, allowing for the computation of integrals. The manifolds in this paper are all embedded submanifolds of Euclidean space, so the Riemannian metric for us will always be the metric inherited from the ambient space.

2.3. The manifold of $r$ -nice tensors

As in the introduction, the Segre manifold is

[TABLE]

It is a smooth manifold of dimension $\Sigma$ . Its tangent space is given by

[TABLE]

note that this is not a direct sum.

The Euclidean inner product between rank-1 tensors is conveniently computed by the following formula (see, e.g., [45]):

[TABLE]
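The inner product formula for rank-1 tensors, namely that $\langle\mathbf{u}^{1}\otimes\cdots\otimes\mathbf{u}^{d},\mathbf{v}^{1}\otimes\cdots\otimes\mathbf{v}^{d}\rangle$ is the product of the factor-wise inner products $\langle\mathbf{u}^{k},\mathbf{v}^{k}\rangle$ , is easy to confirm numerically. The sketch below uses arbitrary random vectors and is only illustrative.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)
dims = (3, 4, 2)
us = [rng.standard_normal(n) for n in dims]
vs = [rng.standard_normal(n) for n in dims]

# Form both rank-1 tensors as full arrays and take the Euclidean inner product.
A = reduce(np.multiply.outer, us)
B = reduce(np.multiply.outer, vs)
inner_full = float(np.sum(A * B))

# The same number from d small inner products, without forming the tensors.
inner_factored = float(np.prod([u @ v for u, v in zip(us, vs)]))
```

The factored form costs $\mathcal{O}(n_{1}+\cdots+n_{d})$ operations instead of $\mathcal{O}(n_{1}\cdots n_{d})$.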

The set of tensors of rank at most $r$ is denoted by

[TABLE]

it is a semialgebraic set of dimension at most $\min\{r\Sigma,\Pi\}$ ; see, e.g., [60]. Under Assumption 1 the dimension of $\sigma_{r;n_{1},\ldots,n_{d}}$ is exactly $r\Sigma$ .

In [8, Section 4] we introduced an open dense subset of $\sigma_{r;n_{1},\ldots,n_{d}}$ with favorable differential-geometric properties. We called it the manifold of $r$ -nice tensors in [8, Definition 4.2]. Below, we present a slightly modified definition that is suitable for our present purpose; it eliminates conditions $(4)$ and $(5)$ from [8, Definition 4.2].

In what follows, we denote the real closure in the Zariski topology of a subset $A\subset\mathbb{R}^{\Pi}$ by $\overline{A}$ . This is the real algebraic variety $\overline{A}:=\overline{A}^{\mathbb{C}}\cap\mathbb{R}^{\Pi}$ , where $\overline{A}^{\mathbb{C}}$ is the closure of $A$ in the Zariski topology in $\mathbb{C}^{\Pi}$ . By [70, Lemma 8], the real dimension of $\overline{A}$ equals the complex dimension of $\overline{A}^{\mathbb{C}}$ .

Definition 2.1.

Recall the addition map $\Phi$ defined in 1.4. Let $\mathcal{M}_{r;n_{1},\ldots,n_{d}}\subset(\mathcal{S}_{n_{1},\ldots,n_{d}})^{\times r}$ be the subset of $r$ -tuples $\mathpzc{a}:=(\mathpzc{A}_{1},\ldots,\mathpzc{A}_{r})$ of rank-1 tensors satisfying all of the following properties:

(1) $\Phi(\mathpzc{a})$ is a smooth point of the algebraic variety $\overline{\sigma_{r;n_{1},\ldots,n_{d}}}$ ;

(2) $\Phi(\mathpzc{a})$ is complex $r$ -identifiable; and

(3) $\kappa(\Phi(\mathpzc{a}))<\infty$ .

The set of $r$ -nice tensors is $\mathcal{N}_{r;n_{1},\ldots,n_{d}}:=\Phi(\mathcal{M}_{r;n_{1},\ldots,n_{d}})$ .

Note that the third item in the definition is well defined because of the second item.

Proposition 2.2.

If Assumption 1 holds, then the following statements are true:

(1) $\mathcal{M}_{r;n_{1},\ldots,n_{d}}$ and $\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ are smooth manifolds of dimension $r\Sigma$ ;

(2) $\mathcal{M}_{r;n_{1},\ldots,n_{d}}$ is Zariski-open in $(\mathcal{S}_{n_{1},\ldots,n_{d}})^{\times r}$ ;

(3) $\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ is Zariski-open in $\sigma_{r;n_{1},\ldots,n_{d}}$ ;

(4) the addition map $\Phi{\mid_{\mathcal{M}_{r;n_{1},\ldots,n_{d}}}}$ is a global diffeomorphism onto its image;

(5) $\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ is closed under multiplication by nonzero scalars; and

(6) $\mathcal{M}_{r;n_{1},\ldots,n_{d}}\subset\mathbb{R}^{n_{1}\times\cdots\times n_{d}}\times\cdots\times\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ and $\mathcal{N}_{r;n_{1},\ldots,n_{d}}\subset\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ are embedded submanifolds.

Proof.

Items 1, 2, 3, and 6 are proved as follows. Let $X_{1}$ and $X_{2}$ be respectively the set of tensors in $\sigma_{r;n_{1},\ldots,n_{d}}$ which are not complex $r$ -identifiable and which are not smooth points of $\overline{\sigma_{r;n_{1},\ldots,n_{d}}}$ . Both are Zariski-closed in $\overline{\sigma_{r;n_{1},\ldots,n_{d}}}$ under Assumption 1, and hence so are the preimages $\Phi^{-1}(X_{1})$ and $\Phi^{-1}(X_{2})$ . Moreover, the set of tuples violating the third defining condition of $\mathcal{M}_{r;n_{1},\ldots,n_{d}}$ is also Zariski-closed in $(\mathcal{S}_{n_{1},\ldots,n_{d}})^{\times r}$ , as follows from the explicit formula for the condition number 2.6 below. Hence, $\mathcal{M}_{r;n_{1},\ldots,n_{d}}$ is Zariski-open. An open subset of an embedded submanifold is itself an embedded submanifold so the claim for $\mathcal{M}_{r;n_{1},\ldots,n_{d}}$ is proved. Moreover, the dimension of the complement of $\mathcal{M}_{r;n_{1},\ldots,n_{d}}$ is at most $r\Sigma-1$ and so its image by the rational map $\Phi$ is contained in an algebraic set of dimension at most $r\Sigma-1$ , thus proving that $\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ is also Zariski-open and indeed an embedded submanifold of the set of smooth points of $\overline{\sigma_{r;n_{1},\ldots,n_{d}}}$ , which is itself an embedded submanifold of its affine ambient space, see [12, Proposition 3.2.9].

The fourth item is due to the definition of the condition number, the fact that it is finite on $\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ by Definition 2.1, and the injectivity of $\Phi|_{\mathcal{M}_{r;n_{1},\ldots,n_{d}}}$ by Definition 2.1 (2).

The fifth item follows by noting that the three defining properties of $\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ are all true independent of a nonzero scaling. ∎

Remark 2.3.

The definition of $r$ -nice tensors in [8, Definition 4.2] involves two more requirements, but those are not needed here.

Since the tangent space of $\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ at a point is the image of the derivative of the local diffeomorphism $\Phi$ , we have the following characterization:

[TABLE]

2.4. Sensitivity of CPDs

The condition number of the problem of computing the rank- $1$ terms of a CPD of a tensor was studied in a general setting in [20]; the following characterization of the condition number is Theorem 1.1 of [20]. Let $\mathpzc{A}=\mathpzc{A}_{1}+\cdots+\mathpzc{A}_{r}\in\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ , where the $\mathpzc{A}_{i}\in\mathcal{S}_{n_{1},\ldots,n_{d}}$ are rank- $1$ tensors. For each $i$ let $U_{i}$ be a matrix whose columns form an orthonormal basis of $\mathrm{T}_{\mathpzc{A}_{i}}\mathcal{S}_{n_{1},\ldots,n_{d}}$ . Then,

[TABLE]

The matrix $U=[U_{1},\ldots,U_{r}]\in\mathbb{R}^{\Pi\times r\Sigma}$ is also called a Terracini matrix. An explicit expression for the $U_{i}$ ’s is given in [20, equation (5.1)].
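The Terracini matrix can be assembled numerically for small examples. The sketch below assumes that the characterization from [20, Theorem 1.1] reads $\kappa(\mathpzc{A})=1/\varsigma_{\min}(U)$ , with $\varsigma_{\min}(U)$ the smallest (that is, the $r\Sigma$ -th) singular value of $U$ ; it builds each $U_{i}$ by orthonormalizing a spanning set of the tangent space via an SVD rather than using the explicit expression from [20, equation (5.1)]. All function names are hypothetical.

```python
import numpy as np
from functools import reduce

def tangent_basis(vectors):
    """Orthonormal basis (as columns) of the tangent space to the Segre
    manifold at u^1 x ... x u^d, for nonzero vectors u^k."""
    cols = []
    for k, u in enumerate(vectors):
        for e in np.eye(len(u)):
            factors = list(vectors)
            factors[k] = e  # replace the k-th factor by a standard basis vector
            cols.append(reduce(np.multiply.outer, factors).ravel())
    spanning = np.column_stack(cols)
    dim = 1 + sum(len(u) - 1 for u in vectors)  # Sigma
    Q, _, _ = np.linalg.svd(spanning, full_matrices=False)
    return Q[:, :dim]

def condition_number(terms):
    """Inverse of the smallest singular value of U = [U_1, ..., U_r] (assumed formula)."""
    U = np.hstack([tangent_basis(t) for t in terms])
    if U.shape[0] < U.shape[1]:
        return np.inf  # more columns than rows: U cannot be injective
    smallest = np.linalg.svd(U, compute_uv=False)[-1]
    return np.inf if smallest == 0 else 1.0 / smallest

e = np.eye(3)
orthogonal_terms = [(e[0], e[0], e[0]), (e[1], e[1], e[1])]
w = e[0] + 1e-3 * e[1]
nearly_parallel_terms = [(e[0], e[0], e[0]), (w, w, w)]
kappa_orth = condition_number(orthogonal_terms)
kappa_close = condition_number(nearly_parallel_terms)
```

For the two orthogonal terms the tangent spaces are mutually orthogonal, so the computed value is $1$ up to rounding; for the two nearly parallel terms it is large, in line with the intuition that nearly collinear summands are hard to separate.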

Since $\mathpzc{A}$ uniquely depends on $\mathpzc{a}:=(\mathpzc{A}_{1},\ldots,\mathpzc{A}_{r})\in\mathcal{S}_{n_{1},\ldots,n_{d}}^{\times r}$ , we can view the condition number of $\mathpzc{A}\in\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ as a function of $\mathpzc{a}$ :

[TABLE]

where the matrices $U_{i}$ are as before. The benefit of 2.7 is that it is well-defined for any tuple $\mathpzc{a}\in\mathcal{S}_{n_{1},\ldots,n_{d}}^{\times r}$ (and not just those mapping into $\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ ).

2.5. Integrals

For fixed $t\in(0,1]$ and a point $\mathbf{y}\in\mathbb{S}(\mathbb{R}^{n})$ , the spherical cap of radius $t$ around $\mathbf{y}$ is defined as $\mathrm{cap}(\mathbf{y},t):=\{\mathbf{x}\in\mathbb{S}(\mathbb{R}^{n}):{\langle\mathbf{x},\mathbf{y}\rangle\;>\sqrt{1-t^{2}}}\}$ . Its volume satisfies

[TABLE]

for some positive constants $0<c_{1}{(n)}<c_{2}{(n)}$ .

The following general lemma will be useful later.

Lemma 2.4.

Let $u,v>0$ be fixed. Then, $0<\int_{0}^{\infty}t^{u}\,e^{-\frac{(t+v)^{2}}{2}}\,\mathrm{d}{}t<\infty.$

Proof.

It is clear that the integral is not zero. Furthermore, since $(t+v)^{2}>t^{2}+v^{2}$ for $t,v>0$ , we see that $\int_{0}^{\infty}t^{u}\,e^{-\frac{(t+v)^{2}}{2}}\,\mathrm{d}{}t\leq\int_{0}^{\infty}t^{u}\,e^{-\frac{t^{2}+v^{2}}{2}}\,\mathrm{d}{}t=e^{-\frac{v^{2}}{2}}\sqrt{2}^{u-1}\Gamma(\frac{u+1}{2})$ , which is finite. ∎
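The bound in the proof can be checked numerically for sample values of $u$ and $v$ . The sketch below uses a plain trapezoidal rule on a truncated interval; the values $u=2.5$ , $v=0.7$ and the truncation at $t=40$ are arbitrary choices for the illustration.

```python
import math
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule for samples y on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

u, v = 2.5, 0.7
t = np.linspace(0.0, 40.0, 400_001)  # the integrand is negligible beyond t = 40

# Left-hand side: the integral from Lemma 2.4.
integral = trapezoid(t**u * np.exp(-((t + v) ** 2) / 2), t)

# Upper bound from the proof: e^{-v^2/2} * 2^{(u-1)/2} * Gamma((u+1)/2).
gaussian_moment = 2 ** ((u - 1) / 2) * math.gamma((u + 1) / 2)
bound = math.exp(-(v**2) / 2) * gaussian_moment

# The closed form of the Gaussian moment itself can be checked the same way.
moment_numeric = trapezoid(t**u * np.exp(-(t**2) / 2), t)
```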

2.6. The coarea formula

Let $\mathcal{M}$ and $\mathcal{N}$ be submanifolds of $\mathbb{R}^{n}$ of equal dimension, and let $F:\mathcal{M}\to\mathcal{N}$ be a smooth surjective map. A point $y\in\mathcal{N}$ is called a regular value of $F$ if for all points $x\in F^{-1}(y)$ the differential $\mathrm{d}{}_{x}F$ is of full rank. The preimage $F^{-1}(y)$ of a regular value $y$ is a discrete set of points. Let $|F^{-1}(y)|$ be the number of elements in this preimage. Then, the coarea formula [52] states that for every integrable function $g$ we have

[TABLE]

where $\mathrm{Jac}(F)(x):=|\det\mathrm{d}{}_{x}F|$ is the Jacobian determinant of $F$ at $x$ . Note that almost all $y\in\mathcal{N}$ are regular values of $F$ by Sard’s theorem [58, Theorem 6.10]. Hence, integrating over $\mathcal{N}$ is the same as integrating over all regular values of $F$ .
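A one-dimensional toy example may help fix ideas. It assumes that 2.9 has the standard equal-dimension form $\int_{\mathcal{N}}|F^{-1}(y)|\,g(y)\,\mathrm{d}y=\int_{\mathcal{M}}\mathrm{Jac}(F)(x)\,g(F(x))\,\mathrm{d}x$ , and takes $F(x)=x^{2}$ from $\mathcal{M}=(-1,0)\cup(0,1)$ onto $\mathcal{N}=(0,1)$ , so that every $y$ has exactly two preimages and $\mathrm{Jac}(F)(x)=2|x|$ . The choice $g=\cos$ is arbitrary.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule for samples y on the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

g = np.cos

# Integral over N = (0, 1), weighted by the number of preimages |F^{-1}(y)| = 2.
y = np.linspace(0.0, 1.0, 200_001)
lhs = trapezoid(2.0 * g(y), y)

# Integral over M = (-1, 1) minus the origin, weighted by Jac(F)(x) = 2|x|.
x = np.linspace(-1.0, 1.0, 200_001)
rhs = trapezoid(2.0 * np.abs(x) * g(x**2), x)

exact = 2.0 * np.sin(1.0)
```

Both sides agree with the exact value $2\sin(1)$ up to discretization error; the excluded point $x=0$ has measure zero and does not affect the integral.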

Remark 2.5.

In [52], the coarea formula is given in the more general case when $\dim\mathcal{M}\geq\dim\mathcal{N}$ . In this article we only need the case when the dimensions of $\mathcal{M}$ and $\mathcal{N}$ coincide. Moreover, if $F$ is injective, then 2.9 reduces to the well-known change-of-variables formula.

3. The average condition number of Gaussian tensors of rank two

The goal of this section is to prove Theorem 1.5. We will proceed in three steps. First, the $2$ -nice tensors are conveniently parameterized via elementary manifolds such as one-dimensional intervals and spheres in Section 3.1. Second, the Jacobian determinant of this map is computed in Section 3.2. Third, the integral can be bounded from below with the help of a few technical auxiliary lemmas in Section 3.3. In the next section, we will exploit Theorem 1.5 for generalizing the argument to most higher ranks. To simplify notation, in this section we let

[TABLE]

3.1. Parameterizing $2$ -nice tensors

Let

[TABLE]

and consider the next parametrization of the Segre manifold:

[TABLE]

The preimage of $\mathpzc{A}\in\mathcal{S}$ has cardinality $|\psi^{-1}(\mathpzc{A})|=2^{d-1}$ . By composing $\Psi:=\psi\times\psi$ with the addition map from 1.4 we get the following alternative representation of tensors of rank bounded by $2$ :

[TABLE]
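The cardinality $|\psi^{-1}(\mathpzc{A})|=2^{d-1}$ stated above comes from sign flips: changing the sign of an even number of the unit vectors leaves the rank-1 tensor unchanged. The sketch below counts these sign patterns for one example, assuming that $\mathcal{P}$ is the product of the unit spheres $\mathbb{S}(\mathbb{R}^{n_{k}})$ and that $\psi(\lambda,\mathbf{u}^{1},\ldots,\mathbf{u}^{d})=\lambda\,\mathbf{u}^{1}\otimes\cdots\otimes\mathbf{u}^{d}$ . It only enumerates sign changes and does not prove that no other preimages exist.

```python
import itertools
import numpy as np
from functools import reduce

def psi(lam, vectors):
    """Assumed parametrization: lam * u^1 x ... x u^d with unit vectors u^k."""
    return lam * reduce(np.multiply.outer, vectors)

rng = np.random.default_rng(0)
dims = (2, 3, 2, 4)  # d = 4
us = [x / np.linalg.norm(x) for x in (rng.standard_normal(n) for n in dims)]
A = psi(1.5, us)

# Sign patterns (s_1, ..., s_d) for which (lam, s_1 u^1, ..., s_d u^d) maps to A.
same = [
    signs
    for signs in itertools.product((1, -1), repeat=len(dims))
    if np.allclose(psi(1.5, [s * u for s, u in zip(signs, us)]), A)
]
```

Exactly the patterns with an even number of $-1$ entries survive, which gives $2^{d-1}=8$ preimages for $d=4$ .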

We would like to apply the coarea formula 2.9 to pull back the integral of $\kappa(\mathpzc{A})e^{-\frac{\|\mathpzc{A}\|^{2}}{2}}$ over $\sigma_{2}$ via the parametrization $\Phi\circ\Psi$ . However, $\sigma_{2}$ in general is not a manifold, so the formula does not apply. Nevertheless, we can use the manifold $\mathcal{N}_{2}$ of $2$ -nice tensors instead. By Proposition 2.2 (3), $\mathcal{N}_{2}$ is Zariski open in $\sigma_{2}$ , so that

[TABLE]

where $C_{2}:=C_{2;n_{1},\ldots,n_{d}}$ is as in Definition 1.3. By applying the coarea formula 2.9 to the smooth map $\Phi\mid_{\mathcal{M}_{2}}$ we get

[TABLE]

where $\mathrm{Jac}(\Phi)(\mathpzc{A}_{1},\mathpzc{A}_{2})$ is the Jacobian determinant of $\Phi$ at $(\mathpzc{A}_{1},\mathpzc{A}_{2})$ . In the first equality we used $|\Phi^{-1}(\mathpzc{A})|=2$ for $2$ -identifiable tensors; indeed, we have that $\Phi(\mathpzc{A}_{1},\mathpzc{A}_{2})=\Phi(\mathpzc{A}_{2},\mathpzc{A}_{1})=\mathpzc{A}$ and $\mathpzc{A}_{1}\neq\mathpzc{A}_{2}$ because $\mathpzc{A}\in\mathcal{N}_{2}$ has rank equal to $2$ .

In the following, we switch to the notation from 2.7: $\kappa(\mathpzc{A}_{1}+\mathpzc{A}_{2})=\kappa(\mathpzc{A}_{1},\mathpzc{A}_{2})$ . Since $\mathcal{M}_{2}$ is also Zariski open in $\mathcal{S}\times\mathcal{S}$ by Proposition 2.2 (2), we may replace the integral over $\mathcal{M}_{2}$ by an integral over $\mathcal{S}\times\mathcal{S}$ , thus obtaining

[TABLE]

We use the coarea formula again, but this time for $\Psi=\psi\times\psi$ , where $\psi$ is the parametrization from 3.2. Note that for $(\mathpzc{A}_{1},\mathpzc{A}_{2})\in\mathcal{M}_{2}$ we have $|\Psi^{-1}(\mathpzc{A}_{1},\mathpzc{A}_{2})|=2^{2d-2}$ . We get

[TABLE]

where $\mathpzc{a}=(\lambda,\mathbf{u}^{1},\ldots,\mathbf{u}^{d})$ and $\mathpzc{b}=(\mu,\mathbf{v}^{1},\ldots,\mathbf{v}^{d})$ are both tuples in $(0,\infty)\times\mathcal{P}$ . Next, we compute the Jacobian determinant $\mathrm{Jac}(\Phi\circ\Psi)(\mathpzc{a},\mathpzc{b})$ .

3.2. Computing the Jacobian determinant

Note that the dimension of the domain of $\Phi\circ\Psi$ is equal to $2\Sigma$ . As above, let $\mathpzc{a}=(\lambda,\mathbf{u}^{1},\ldots,\mathbf{u}^{d})$ and $\mathpzc{b}=(\mu,\mathbf{v}^{1},\ldots,\mathbf{v}^{d})$ be tuples in $(0,\infty)\times\mathcal{P}$ with $\mathcal{P}$ as in 3.1. In the following, we write

[TABLE]

The Jacobian determinant of $\Phi\circ\Psi$ at $(\mathpzc{a},\mathpzc{b})$ is, by definition, the absolute value of the determinant of the linear map

[TABLE]

Consider the matrix of partial derivatives of $\Phi\circ\Psi$ with respect to the standard orthonormal basis of $\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ :

[TABLE]

where

[TABLE]

Then, the Jacobian determinant of $\Phi\circ\Psi$ at $(\mathpzc{a},\mathpzc{b})$ is

$\mathrm{Jac}(\Phi\circ\Psi)(\mathpzc{a},\mathpzc{b})=\sqrt{\det(Q^{T}Q)}.\qquad(3.6)$

The latter is the volume of the parallelepiped spanned by the columns of $Q$ . We fix notation in the next definition.

Definition 3.1.

Let $N{\,\geq\,}n$ be positive integers and $U\in\mathbb{R}^{N\times n}$ be a matrix with columns $\mathbf{u}_{1},\ldots,\mathbf{u}_{n}\in\mathbb{R}^{N}$ . We denote by $\mathrm{vol}(U)$ the volume of the parallelepiped spanned by the $\mathbf{u}_{i}$ :

$\mathrm{vol}(U):=\sqrt{\det(U^{T}U)}.$
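As a small illustration of Definition 3.1, the following sketch computes the volume of the parallelepiped spanned by the columns of a matrix in two ways. It assumes the standard Gram determinant formula $\mathrm{vol}(U)=\sqrt{\det(U^{T}U)}$ , which equals the product of the singular values of $U$ ; the function names are hypothetical.

```python
import numpy as np

def vol(U):
    """Volume of the parallelepiped spanned by the columns of U (N x n, N >= n)."""
    U = np.asarray(U, dtype=float)
    gram = U.T @ U
    # Clip tiny negative determinants caused by rounding.
    return float(np.sqrt(max(np.linalg.det(gram), 0.0)))

def vol_via_singular_values(U):
    """Same volume, computed as the product of the singular values of U."""
    s = np.linalg.svd(np.asarray(U, dtype=float), compute_uv=False)
    return float(np.prod(s))

# Two orthogonal edges of lengths 3 and 4 embedded in R^3.
box = np.array([[3.0, 0.0], [0.0, 4.0], [0.0, 0.0]])
print(vol(box), vol_via_singular_values(box))
```

The singular value version tends to be the more robust choice numerically, since forming $U^{T}U$ squares the condition number of $U$ .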

We can now rewrite 3.6 as

$\mathrm{Jac}(\Phi\circ\Psi)(\mathpzc{a},\mathpzc{b})=\mathrm{vol}(Q).$

The reason why we write the partial derivatives of $\Phi\circ\Psi$ with respect to the standard basis of $\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ is that we get the following convenient description:

[TABLE]

To describe $L$ , for each $1\leq k\leq d$ let

[TABLE]

be matrices containing as columns an ordered orthonormal basis of $(\mathbf{u}^{k})^{\perp}=\mathrm{T}_{\mathbf{u}^{k}}{\,\mathbb{S}(\mathbb{R}^{n_{k}})}$ and $(\mathbf{v}^{k})^{\perp}=\mathrm{T}_{\mathbf{v}^{k}}{\,\mathbb{S}(\mathbb{R}^{n_{k}})}$ , respectively. Then, by linearity and the product rule of differentiation, we have that $L=\begin{bmatrix}\lambda L_{1}&\mu L_{2}\end{bmatrix}$ is the block matrix consisting of $2$ blocks of the form

[TABLE]

Both $L_{1}$ and $L_{2}$ have $\sum_{k=1}^{d}(n_{k}-1)=\Sigma-1$ columns. Note that $M$ depends only on the $\mathbf{u}^{k}$ ’s and $\mathbf{v}^{k}$ ’s, whereas $L$ also depends on the parameters $\lambda$ and $\mu$ ; we do not emphasize these dependencies in the notation.

Comparing with [20, equation (5.1)], we see that the matrix $L_{1}$ has as columns an orthonormal basis for the orthogonal complement of $\mathpzc{U}$ in $\mathrm{T}_{\mathpzc{U}}{\,\mathcal{S}}$ . Analogously, the columns of $L_{2}$ form an orthonormal basis for the orthogonal complement of $\mathpzc{V}$ in $\mathrm{T}_{\mathpzc{V}}{\,\mathcal{S}}$ . Consequently, for $\Psi(\mathpzc{a},\mathpzc{b})$ , Terracini’s matrix from 2.6 can be chosen as

[TABLE]

This entails that

[TABLE]

and so

[TABLE]

having used the notation from Definition 3.1 and the fact that singular values are invariant under orthogonal transformations such as permutations of columns.
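The description of Terracini's matrix above can be explored numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it builds an orthonormal basis of the tangent space to the Segre manifold at each rank-$1$ term from the standard spanning set obtained by replacing one factor at a time with a coordinate vector, concatenates the bases, and returns the inverse of the smallest singular value. The names `kron_all`, `segre_tangent_basis`, and `condition_number` are hypothetical.

```python
import numpy as np

def kron_all(vectors):
    """Vectorized tensor product v^1 ⊗ ... ⊗ v^d."""
    out = np.ones(1)
    for v in vectors:
        out = np.kron(out, v)
    return out

def segre_tangent_basis(factors):
    """Orthonormal basis of the tangent space at u^1 ⊗ ... ⊗ u^d (unit factors)."""
    spanning = []
    for k, u in enumerate(factors):
        for e in np.eye(len(u)):
            parts = list(factors)
            parts[k] = e  # replace the k-th factor by a coordinate vector
            spanning.append(kron_all(parts))
    G = np.column_stack(spanning)
    # The spanning set is redundant; keep the directions with nonzero singular value.
    W, s, _ = np.linalg.svd(G, full_matrices=False)
    dim = int(np.sum(s > 1e-10 * s[0]))
    return W[:, :dim]

def condition_number(terms):
    """Inverse of the smallest singular value of [U_1 ... U_r]."""
    T = np.hstack([segre_tangent_basis(f) for f in terms])
    return 1.0 / np.linalg.svd(T, compute_uv=False)[-1]

e1, e2 = np.eye(3)[0], np.eye(3)[1]
for angle in (1.0, 0.1, 0.01):
    v = np.cos(angle) * e1 + np.sin(angle) * e2
    print(angle, condition_number([[e1, e1, e1], [v, v, v]]))
```

As the two rank-$1$ terms approach each other, the printed values grow, which is the mechanism exploited in the proof of Theorem 1.5.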

3.3. Bounding the integral

We are now ready to conclude the proof of Theorem 1.5 by showing that the expected value of the condition number of tensor rank decomposition is infinite.

By 2.6, the condition number at $\mathpzc{A}=\mathpzc{A}_{1}+\mathpzc{A}_{2}=\Phi(\mathpzc{A}_{1},\mathpzc{A}_{2})\in\mathcal{N}_{2}$ is the inverse of the smallest singular value of Terracini’s matrix $U$ from 3.9. Therefore, if we plug 3.9 and 3.10 into 3.3, then we get

[TABLE]

where ${q(U)=\frac{\mathrm{vol}(U)}{\varsigma_{\min}(U)}}$ is as in 2.2, and

[TABLE]

From 3.9 it is clear that $U$ is a function of $\mathpzc{u}$ and $\mathpzc{v}$ but is independent of $\lambda$ and $\mu$ . Therefore, if we integrate first over $\lambda$ and $\mu$ , then the factor $q(U)$ can be treated as a constant. In Section A.1 we compute this integral; the result is stated here as the next lemma.

Lemma 3.2.

Let $(\mathbf{u}^{1},\ldots,\mathbf{u}^{d}),(\mathbf{v}^{1},\ldots,\mathbf{v}^{d})\in\mathcal{P}$ be fixed. Then,

[TABLE]

where $\mathpzc{U}=\mathbf{u}^{1}\otimes\cdots\otimes\mathbf{u}^{d}$ and $\mathpzc{V}=\mathbf{v}^{1}\otimes\cdots\otimes\mathbf{v}^{d}$ .

The foregoing integral can be bounded from below by exploiting the next lemma, which is proved in Section A.2.

Lemma 3.3.

Let $\mathbf{x},\mathbf{y}\in\mathbb{S}(\mathbb{R}^{p})$ be two unit-norm vectors and $s\geq 1$ . Then, there exists a constant $k=k(p,s)$ independent of $\mathbf{x},\mathbf{y}$ such that

[TABLE]

Combining the foregoing lemmata and plugging the result into 3.11, we obtain

[TABLE]

Next, we exploit the symmetry of the domain $\mathbb{S}(\mathbb{R}^{n_{1}})$ by flipping the sign of $\mathbf{v}^{1}$ and, hence, of $\mathpzc{V}=\mathbf{v}^{1}\otimes\cdots\otimes\mathbf{v}^{d}$ . This substitution transforms $U$ into $UD$ , where $D$ is a diagonal matrix with some pattern of $\pm 1$ on the diagonal. Since $D$ is orthogonal, $q(U)=q(UD)$ , so that

[TABLE]

Denote this last integral by $J$ ; it remains to show that $J=\infty$ . Consider the open set

[TABLE]

Since $D(\epsilon)$ is open, we have

[TABLE]

We now need two lemmata. The first one is straightforward.

Lemma 3.4.

Let $\epsilon>0$ be sufficiently small. For all $(\mathpzc{u},\mathpzc{v})\in D(\epsilon)$ with $\mathpzc{u}=(\mathbf{u}^{1},\ldots,\mathbf{u}^{d})$ and $\mathpzc{v}=(\mathbf{v}^{1},\ldots,\mathbf{v}^{d})$ , we have

[TABLE]

where $\mathpzc{U}=\mathbf{u}^{1}\otimes\cdots\otimes\mathbf{u}^{d}$ and $\mathpzc{V}=\mathbf{v}^{1}\otimes\cdots\otimes\mathbf{v}^{d}$ .

Proof.

For proving the upper bound, apply the triangle inequality to the telescoping sum

[TABLE]

and exploit $\|\mathbf{u}^{k}-\mathbf{v}^{k}\|\leq\|\mathbf{u}^{1}-\mathbf{v}^{1}\|$ for all $k=1,\ldots,d$ . The lower bound follows from

[TABLE]

having used $0<\langle\mathbf{u}^{k},\mathbf{v}^{k}\rangle\leq 1$ for sufficiently small $\epsilon$ . ∎
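The telescoping argument in the proof yields, for unit-norm factors, the elementary bound $\|\mathpzc{U}-\mathpzc{V}\|\leq\sum_{k=1}^{d}\|\mathbf{u}^{k}-\mathbf{v}^{k}\|$ . The following sketch checks this bound on random inputs; it illustrates only this intermediate inequality, not the exact constants of Lemma 3.4, and the helper names are hypothetical.

```python
import numpy as np

def rank_one(factors):
    """Dense array for u^1 ⊗ ... ⊗ u^d."""
    t = np.ones(())
    for u in factors:
        t = np.multiply.outer(t, u)
    return t

def telescoping_gap(us, vs):
    """Return (||U - V||, sum_k ||u^k - v^k||) for unit-norm factors."""
    lhs = np.linalg.norm(rank_one(us) - rank_one(vs))
    rhs = sum(np.linalg.norm(u - v) for u, v in zip(us, vs))
    return lhs, rhs

def random_unit(rng, n):
    x = rng.standard_normal(n)
    return x / np.linalg.norm(x)

rng = np.random.default_rng(6)
dims = (3, 4, 5)
us = [random_unit(rng, n) for n in dims]
vs = [random_unit(rng, n) for n in dims]
print(telescoping_gap(us, vs))
```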

The second one is the final piece of the puzzle. We prove it in Section A.3.

Lemma 3.5.

For sufficiently small $\epsilon>0$ , we have for all $(\mathpzc{u},\mathpzc{v})\in D(\epsilon)$ with $\mathpzc{u}=(\mathbf{u}^{1},\ldots,\mathbf{u}^{d})$ and $\mathpzc{v}=(\mathbf{v}^{1},\ldots,\mathbf{v}^{d})$ that

[TABLE]

where $U$ is the matrix that depends on $\mathpzc{u}$ and $\mathpzc{v}$ as in 3.9 and $q$ is as in 2.2.

Combining Lemmata 3.4 and 3.5 with 3.13 we find

[TABLE]

where $c>0$ is some constant. Note that the integrand in this equation only depends on $\mathbf{u}^{1}$ and $\mathbf{v}^{1}$ . By definition of $D(\epsilon)$ , for each $2\leq k\leq d$ , and if we fix $\mathbf{u}^{k}$ , the domain of integration of $\mathbf{v}^{k}$ contains the difference of two spherical caps of respective affine radii $\frac{9}{10}\|\mathbf{u}^{1}-\mathbf{v}^{1}\|$ and $\|\mathbf{u}^{1}-\mathbf{v}^{1}\|$ . From 2.8, the volume of this difference of caps is greater than a constant times $\|\mathbf{u}^{1}-\mathbf{v}^{1}\|^{n_{k}-1}$ . Therefore, if we keep $\mathbf{u}^{1},\mathbf{v}^{1}\in\mathbb{S}(\mathbb{R}^{n_{1}})$ constant and integrate over $\mathbf{u}^{k},\mathbf{v}^{k}\in\mathbb{S}(\mathbb{R}^{n_{k}})$ , $k=2,\ldots,d$ , then we get

[TABLE]

where $c^{\prime}>0$ is a constant. Recall that $\Sigma=1+\sum_{k=1}^{d}(n_{k}-1)$ , so that

[TABLE]

By rotational invariance, the inner integral does not depend on $\mathbf{u}^{1}$ . Moreover, for small $\epsilon$ , projecting through the stereographic projection (whose Jacobian is bounded above and below by positive constants close to its center), we conclude that, for some other constant $c^{\prime\prime}$ ,

[TABLE]

This proves $J=\infty$ , so that $\operatorname*{\mathbb{E}}\kappa(\mathpzc{A})=\infty$ for tensors of rank bounded by $2$ , constituting a proof of Theorem 1.5.

4. The average condition number: from rank 2 to higher ranks

Having established that the average condition number of tensor rank decomposition of rank $2$ tensors is infinite, we extend this result to higher ranks. That is, we will prove Theorem 1.6. As before, we abbreviate $\mathcal{S}:=\mathcal{S}_{n_{1},\ldots,n_{d}}$ , $\sigma_{r}:=\sigma_{r;n_{1},\ldots,n_{d}}$ , $\mathcal{N}_{r}:=\mathcal{N}_{r;n_{1},\ldots,n_{d}}$ , and $\mathcal{M}_{r}:=\mathcal{M}_{r;n_{1},\ldots,n_{d}}.$

We proceed with an observation that is of independent interest.

Lemma 4.1.

Let $\mathpzc{A}=\sum_{i=1}^{r}\mathpzc{A}_{i}$ and $\mathpzc{B}=\sum_{i=1}^{s}\mathpzc{B}_{i}$ be $n_{1}\times\cdots\times n_{d}$ tensors, where the $\mathpzc{A}_{i}$ and $\mathpzc{B}_{i}$ are rank- $1$ tensors. If $\mathpzc{A}+\mathpzc{B}\in\sigma_{r+s;n_{1},\ldots,n_{d}}$ is $(r+s)$ -identifiable, then we have

[TABLE]

Proof.

First we observe that $\mathpzc{A}$ is $r$ -identifiable, and $\mathpzc{B}$ is $s$ -identifiable. Indeed, if the tensor $\mathpzc{C}=\mathpzc{A}+\mathpzc{B}$ is $(r+s)$ -identifiable, then the unique set $C$ of cardinality $|C|\leq r+s$ consisting of rank- $1$ tensors summing to $\mathpzc{C}$ is $C=\{\mathpzc{A}_{1},\ldots,\mathpzc{A}_{r},\mathpzc{B}_{1},\ldots,\mathpzc{B}_{s}\}$ . If $\mathpzc{A}$ had an alternative decomposition $\{\mathpzc{A}_{1}^{\prime},\ldots,\mathpzc{A}_{r^{\prime}}^{\prime}\}$ , potentially of a shorter length $r^{\prime}\leq r$ , then $\{\mathpzc{A}_{1}^{\prime},\ldots,\mathpzc{A}_{r^{\prime}}^{\prime},\mathpzc{B}_{1},\ldots,\mathpzc{B}_{s}\}$ would be an alternative decomposition of $\mathpzc{C}$ . Hence, $\{\mathpzc{A}_{1}^{\prime},\ldots,\mathpzc{A}_{r^{\prime}}^{\prime}\}$ needs to equal $\{\mathpzc{A}_{1},\ldots,\mathpzc{A}_{r}\}$ , so that $\mathpzc{A}$ is $r$ -identifiable. By symmetry, the result for $\mathpzc{B}$ follows. For all $i$ , let $U_{i}$ be a matrix with orthonormal columns that span $\mathrm{T}_{\mathpzc{A}_{i}}\mathcal{S}_{n_{1},\ldots,n_{d}}$ , and $V_{i}$ be a matrix with orthonormal columns that span $\mathrm{T}_{\mathpzc{B}_{i}}\mathcal{S}_{n_{1},\ldots,n_{d}}$ . Consider the matrices $U=[U_{1},\ldots,U_{r}]$ and $V=[V_{1},\ldots,V_{s}]$ . By 2.6 we have

[TABLE]

The claim follows from standard interlacing properties of singular values; see [51, Chapter 3]. ∎
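The singular value fact used at the end of this proof is that appending columns to a tall matrix cannot increase its smallest singular value. The sketch below checks this on random matrices; the matrices are arbitrary stand-ins for $U$ and $V$ , not actual Terracini matrices.

```python
import numpy as np

def smallest_singular_value(A):
    """Smallest singular value of a tall matrix A."""
    return np.linalg.svd(A, compute_uv=False)[-1]

rng = np.random.default_rng(1)
U = rng.standard_normal((30, 7))  # stand-in for [U_1, ..., U_r]
V = rng.standard_normal((30, 5))  # stand-in for [V_1, ..., V_s]

s_joint = smallest_singular_value(np.hstack([U, V]))
s_U = smallest_singular_value(U)
s_V = smallest_singular_value(V)
# Restricting to a subset of columns can only increase the smallest singular value.
print(s_joint <= min(s_U, s_V))
```

Taking inverses turns this into a lower bound on the condition number of the longer decomposition in terms of the condition numbers of its parts.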

The next simple lemma is immediate.

Lemma 4.2.

Consider the map $\phi:\sigma_{2}\times\mathcal{S}^{\times(r-2)}\to\sigma_{r},\,(\mathpzc{B},\mathpzc{A}_{1},\ldots,\mathpzc{A}_{r-2})\mapsto\mathpzc{B}+\sum_{i=1}^{r-2}\mathpzc{A}_{i}.$ The following holds.

(1) For $r>2$ , we have $\phi(\sigma_{2}\times\mathcal{S}^{\times(r-2)})=\sigma_{r}$ .

(2) Let $\mathpzc{A}\in\sigma_{r}$ be $r$ -identifiable. Then, $|\phi^{-1}(\mathpzc{A})|=(r-2)!\cdot\binom{r}{2}$ .
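The count in item (2) can be reproduced by brute force. The sketch below labels the $r$ rank-$1$ summands of an identifiable tensor by $0,\ldots,r-1$ and enumerates the ways to choose an unordered pair forming $\mathpzc{B}$ together with an ordered tuple of the remaining summands; it assumes, as in the proof of Lemma 4.1, that $\mathpzc{B}$ determines its pair of summands. The function name is hypothetical.

```python
import itertools
import math

def count_preimages(r):
    """Count tuples (B, A_1, ..., A_{r-2}) built from r distinct rank-1 labels."""
    labels = range(r)
    seen = set()
    for pair in itertools.combinations(labels, 2):    # unordered pair summing to B
        rest = [i for i in labels if i not in pair]
        for order in itertools.permutations(rest):     # ordered tuple A_1, ..., A_{r-2}
            seen.add((pair, order))
    return len(seen)

for r in range(3, 7):
    print(r, count_preimages(r), math.factorial(r - 2) * math.comb(r, 2))
```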

Finally, the next lemma is the key to Theorem 1.6, providing a lower bound for the Jacobian determinant of $\phi$ in a special open subset of $\sigma_{2}\times\mathcal{S}^{\times(r-2)}$ . We postpone its proof to Appendix B.

Lemma 4.3.

On top of Assumption 1 we assume that $\sigma_{r-2;n_{1}-2,\ldots,n_{d}-2}$ is generically complex identifiable. Then, there are constants $\mu,\epsilon,\nu_{1},\ldots,\nu_{r-2}>0$ depending only on $r,n_{1},\ldots,n_{d}$ with the following property: For all $\mathpzc{B}\in\mathcal{N}_{2}$ there exists a tuple $(\mathpzc{A}_{1},\ldots,\mathpzc{A}_{r-2})\in\mathcal{S}^{\times(r-2)}$ with $\|\mathpzc{A}_{i}\|=\nu_{i}$ and

[TABLE]

where $\phi$ is as in Lemma 4.2.

Remark 4.4.

Given any $\mathpzc{B}\in\sigma_{2}$ , by taking a sequence $(\mathpzc{B}^{(i)})_{i}\subseteq\mathcal{N}_{2}$ converging to $\mathpzc{B}$ one can generate the corresponding sequences $\mathpzc{A}_{1}^{(i)},\dots,\mathpzc{A}_{r-2}^{(i)}\in\mathcal{S}$ from Lemma 4.3. Now, by compactness we can find an accumulation point $\mathpzc{A}_{1},\dots,\mathpzc{A}_{r-2}\in\mathcal{S}$ . Since $\mathrm{Jac}(\phi)$ is continuous and hence uniformly continuous when restricted to a compact set, by choosing small enough $\epsilon$ we can ensure that for all $\mathpzc{B}^{\prime}$ with $\|\mathpzc{B}-\mathpzc{B}^{\prime}\|\leq\epsilon$ and for all $\mathpzc{A}_{i}^{\prime}$ with $\|\mathpzc{A}_{i}-\mathpzc{A}_{i}^{\prime}\|\leq\epsilon$ , we have $\mathrm{Jac}\,(\phi)(\mathpzc{B}^{\prime},\mathpzc{A}_{1}^{\prime},\ldots,\mathpzc{A}_{r-2}^{\prime})>\frac{\mu}{2}$ , where $\epsilon$ and $\mu$ do not depend on $\mathpzc{B}$ .

Now we prove Theorem 1.6.

Proof of Theorem 1.6.

Recall the surjective map $\phi:\sigma_{2}\times{\mathcal{S}^{\times(r-2)}}\to\sigma_{r}$ from Lemma 4.2. From Theorem 1.5 and the fact that $\kappa(\mathpzc{B})=\kappa(t\mathpzc{B})$ for $t>0$ , there exists a tensor $\mathpzc{B}\in\sigma_{2}$ such that for every $\delta>0$ we have

[TABLE]

From Lemma 4.3 and Remark 4.4, there exist tensors $\mathpzc{A}_{1},\ldots,\mathpzc{A}_{r-2}\in\mathcal{S}$ such that

[TABLE]

for all $\mathpzc{B}^{\prime},\mathpzc{A}_{1}^{\prime},\ldots,\mathpzc{A}_{r-2}^{\prime}$ such that $\|\mathpzc{B}^{\prime}-\mathpzc{B}\|<\epsilon,\|\mathpzc{A}^{\prime}_{i}-\mathpzc{A}_{i}\|<\epsilon$ , and $\mathpzc{B}^{\prime}\in\mathcal{N}_{2}$ . Let $\mathcal{U}\subseteq\mathcal{N}_{2}\times\mathcal{S}^{\times(r-2)}$ be the set of all $(\mathpzc{B}^{\prime},\mathpzc{A}_{1}^{\prime},\ldots,\mathpzc{A}_{r-2}^{\prime})$ satisfying the foregoing conditions. From Lemma 4.1, we have

[TABLE]

Moreover, by Lemma 4.3 and the inverse function theorem, by taking small enough $\epsilon$ and $\delta$ we can assume that $\phi|_{\mathcal{U}}$ is a diffeomorphism onto its image, and hence $\phi(\mathcal{U})$ is open. (Footnote 4: This is different from $\phi|_{\phi^{-1}(\phi(\mathcal{U}))}$ being a diffeomorphism. Indeed, that mapping is in general finite-to-one.) The coarea formula 2.9 thus applies, yielding

[TABLE]

The theorem follows since $\phi(\mathcal{U})\subseteq\sigma_{r}$ . ∎

5. The angular condition number of tensor rank decomposition

In this section we prove Theorem 1.9. As in the previous section, to ease notation, we abbreviate $\mathcal{M}_{2}:=\mathcal{M}_{2;n_{1},\ldots,n_{d}}$ , $\mathcal{N}_{2}:=\mathcal{N}_{2;n_{1},\ldots,n_{d}}$ , $\mathcal{S}:=\mathcal{S}_{n_{1},\ldots,n_{d}}$ , and $\sigma_{2}:=\sigma_{2;n_{1},\ldots,n_{d}}$ .

5.1. A characterization of the angular condition number as a singular value

We first derive a formula for the angular condition number in terms of singular values, similar to the one from 2.6. Recall from 1.6 that the angular condition number for rank $r=2$ is

[TABLE]

where $p:\mathbb{R}^{n_{1}\times\cdots\times n_{d}}\to\mathbb{S}(\mathbb{R}^{n_{1}\times\cdots\times n_{d}})$ is the canonical projection onto the sphere and where $\Phi^{-1}_{\mathpzc{a}}$ is a local inverse of $\Phi:\mathcal{S}\times\mathcal{S}\to\sigma_{2}$ at $\mathpzc{a}\in\mathcal{S}^{\times 2}$ with $\mathpzc{A}=\Phi(\mathpzc{a})$ . As before, the value of $\kappa_{\mathrm{ang}}$ on $\sigma_{2}\setminus\mathcal{N}_{2}$ is not relevant for our analysis, so we do not specify it.

Proposition 5.1.

Under Assumption 1, let $\mathpzc{A}=\lambda\,\mathbf{u}^{1}\otimes\cdots\otimes\mathbf{u}^{d}+\mu\,\mathbf{v}^{1}\otimes\cdots\otimes\mathbf{v}^{d}\in\mathcal{N}_{2}$ , where for $1\leq k\leq d$ we have $\mathbf{u}^{k},\mathbf{v}^{k}\in\mathbb{S}(\mathbb{R}^{n_{k}})$ . Recall from 3.5 the definitions of the matrices $M$ and $L$ , associated to $\mathpzc{A}$ . The following equality holds:

[TABLE]

provided that the right-hand side is finite.

Proof.

By Proposition 2.2, any local inverse $\Phi^{-1}_{\mathpzc{a}}$ is differentiable at $\mathpzc{A}=\Phi(\mathpzc{a})\in\mathcal{N}_{2}$ . The projection $p$ is also differentiable, so that

[TABLE]

where $\|\cdot\|_{2}$ is the spectral norm from 2.1. We compute this norm.

Let $\dot{\mathpzc{A}}\in\mathrm{T}_{\mathpzc{A}}{\,\mathcal{N}_{2}}$ and $(\dot{\mathpzc{A}}_{1},\dot{\mathpzc{A}}_{2})=\mathrm{d}{}_{\mathpzc{A}}\Phi^{-1}_{\mathpzc{a}}(\dot{\mathpzc{A}})$ . Then, by linearity of the derivative, we have $\dot{\mathpzc{A}}=\dot{\mathpzc{A}}_{1}+\dot{\mathpzc{A}}_{2}$ . Furthermore, for $i=1,2$ , the derivative $\mathrm{d}{}_{\mathpzc{A}_{i}}p$ is the orthogonal projection onto the orthogonal complement of $\mathpzc{A}_{i}$ in $\mathbb{R}^{\Pi}$ . According to this we decompose $\dot{\mathpzc{A}}_{1}$ and $\dot{\mathpzc{A}}_{2}$ as

[TABLE]
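The role of the derivative of the projection $p$ in this step can be checked by finite differences. The sketch below assumes $p(\mathbf{x})=\mathbf{x}/\|\mathbf{x}\|$ and compares its numerical Jacobian with the orthogonal projection onto $\mathbf{x}^{\perp}$ divided by $\|\mathbf{x}\|$ ; the factor $1/\|\mathbf{x}\|$ corresponds to the division by $\|\mathpzc{A}_{i}\|$ in the expression that follows. Function names are hypothetical.

```python
import numpy as np

def sphere_projection(x):
    """Canonical projection onto the unit sphere."""
    return x / np.linalg.norm(x)

def sphere_projection_derivative(x):
    """Projection onto the orthogonal complement of x, scaled by 1/||x||."""
    n = np.linalg.norm(x)
    return (np.eye(len(x)) - np.outer(x, x) / n**2) / n

def finite_difference_jacobian(f, x, h=1e-6):
    """Central finite-difference approximation of the Jacobian of f at x."""
    cols = [(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(len(x))]
    return np.column_stack(cols)

rng = np.random.default_rng(8)
x = rng.standard_normal(5)
numeric = finite_difference_jacobian(sphere_projection, x)
analytic = sphere_projection_derivative(x)
print(np.max(np.abs(numeric - analytic)))
```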

Then, we have $\mathrm{d}{}_{\mathpzc{A}}(p^{\times 2}\circ\Phi^{-1}_{\mathpzc{a}})(\dot{\mathpzc{A}})=(\dot{\mathpzc{A}}_{1}^{\perp}/\|\mathpzc{A}_{1}\|,\dot{\mathpzc{A}}_{2}^{\perp}/\|\mathpzc{A}_{2}\|)$ and, consequently,

[TABLE]

Recall from 3.5 the matrices $L=\begin{bmatrix}\lambda L_{1}&\mu L_{2}\end{bmatrix}$ and $M=\begin{bmatrix}\mathpzc{U}&\mathpzc{V}\end{bmatrix}$ . We can find vectors $\mathbf{x}_{1},\mathbf{x}_{2}\in\mathbb{R}^{\Sigma-1}$ with $\dot{\mathpzc{A}}_{1}^{\perp}=\lambda L_{1}\mathbf{x}_{1}$ and $\dot{\mathpzc{A}}_{2}^{\perp}=\mu L_{2}\mathbf{x}_{2}$ , and such that $\|\dot{\mathpzc{A}}_{1}^{\perp}\|=\lambda\|\mathbf{x}_{1}\|$ and $\|\dot{\mathpzc{A}}_{2}^{\perp}\|=\mu\|\mathbf{x}_{2}\|$ . Observe that $\lambda=\|\mathpzc{A}_{1}\|$ and $\mu=\|\mathpzc{A}_{2}\|$ . This yields

[TABLE]

Writing

[TABLE]

Since we are assuming that $(\mathrm{I}-MM^{\dagger})L$ is injective (for $\varsigma_{\min}((\mathrm{I}-MM^{\dagger})L)\neq 0$ ), it has a left inverse and we can write

[TABLE]

Combining 5.2 and 5.3 we see that

[TABLE]

where the second equality follows from $\left(PL\right)^{\dagger}P=\left(PL\right)^{\dagger}$ , which is a basic property of the Moore–Penrose pseudoinverse holding for any orthogonal projector $P$ . This finishes the proof. ∎
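The pseudoinverse identity $(PL)^{\dagger}P=(PL)^{\dagger}$ used in the last step is easy to confirm numerically. The sketch below uses random stand-ins for $M$ and $L$ (an assumption for illustration only) and the projector $P=\mathrm{I}-MM^{\dagger}$ .

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((12, 2))   # stand-in for [U V]
L = rng.standard_normal((12, 6))   # stand-in for [lambda*L_1, mu*L_2]

# Orthogonal projector onto the orthogonal complement of span(M).
P = np.eye(12) - M @ np.linalg.pinv(M)
PL_pinv = np.linalg.pinv(P @ L)

print(np.allclose(P @ P, P), np.allclose(PL_pinv @ P, PL_pinv))
```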

5.2. Proof of Theorem 1.9

Now comes the actual proof of Theorem 1.9. Proceeding in exactly the same way as in Section 3.1 and using Proposition 5.1, we get

[TABLE]

where $C_{2}=C_{2;n_{1},\ldots,n_{d}}$ is as in Definition 1.3, $\mathcal{P}$ is as in 3.1, $Q=\begin{bmatrix}L&M\end{bmatrix}$ is as in 3.4, the volume $\mathrm{vol}$ is as in Definition 3.1, and

[TABLE]

is as in 3.12, so that $\mathpzc{A}=\lambda\mathpzc{U}+\mu\mathpzc{V}$ . Next, we relate $\mathrm{vol}(Q)$ to the volume of $(\mathrm{I}-MM^{\dagger})L$ .

Lemma 5.2.

We have $\mathrm{vol}(Q)=\mathrm{vol}(M)\,\mathrm{vol}((\mathrm{I}-MM^{\dagger})L).$

Proof.

Let $Q^{\perp}$ be a matrix whose columns contain an orthonormal basis for the orthogonal complement of the column span of $Q$ . Then, from the definition,

[TABLE]

where in the last step we just multiplied by a matrix whose determinant is $1$ . Performing the inner multiplication we then get

[TABLE]

These two blocks are mutually orthogonal, since $(\mathrm{I}-MM^{\dagger})$ is the projection on the orthogonal complement of the span of $M$ , and hence the volume is the product of the volumes corresponding to each block. The assertion follows. ∎
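Lemma 5.2 is a general linear algebra identity and can be sanity-checked on random matrices. In the sketch below, $M$ and $L$ are random stand-ins with the shapes of the matrices in the lemma, and the volume is computed as the product of singular values; both choices are assumptions made for illustration.

```python
import numpy as np

def vol(A):
    """Volume of the parallelepiped spanned by the columns of A."""
    return float(np.prod(np.linalg.svd(A, compute_uv=False)))

rng = np.random.default_rng(3)
N, m = 12, 6
M = rng.standard_normal((N, 2))
L = rng.standard_normal((N, m))

Q = np.hstack([L, M])
P = np.eye(N) - M @ np.linalg.pinv(M)  # I - M M^dagger

lhs = vol(Q)
rhs = vol(M) * vol(P @ L)
print(lhs, rhs)
```

The two printed numbers agree up to rounding, in line with the block orthogonality argument of the proof.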

We use Lemma 5.2 to rewrite 5.4 as

[TABLE]

where $q$ is as in 2.2. Recall from 3.7 that $M$ is independent of $\lambda$ and $\mu$ . We first compute the integral over $\lambda,\mu$ using the next lemma. We prove the lemma in Section C.1.

Lemma 5.3.

Let $L_{1},L_{2}$ be the matrices defined as in 3.8, such that $L=\begin{bmatrix}\lambda L_{1}&\mu L_{2}\end{bmatrix}$ . Let

[TABLE]

Then,

[TABLE]

Inserting the results from this lemma into 5.5, we get

[TABLE]

where

[TABLE]

In the remaining part of this section we show that $J_{\mathrm{outer}}$ is bounded by a constant, which would conclude the proof. We do this by giving a sequence of upper bounds. We have no hope of providing sharp bounds, so rather than keeping track of all the constants, we will exploit the following definition for streamlining the proof.

Definition 5.4.

For $A,B\in[0,\infty]$ we will write $A\preceq B$ if $B\in\mathbb{R}$ implies $A\in\mathbb{R}$ . That is, $A\preceq B$ is an equivalent statement to “ $B<\infty\Rightarrow A<\infty$ ”.

First, note that $\mathrm{vol}(M)=\sqrt{1-\langle\mathpzc{U},\mathpzc{V}\rangle^{2}}$ , so that

[TABLE]
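The formula $\mathrm{vol}(M)=\sqrt{1-\langle\mathpzc{U},\mathpzc{V}\rangle^{2}}$ follows from the $2\times 2$ Gram matrix of the unit-norm columns of $M=\begin{bmatrix}\mathpzc{U}&\mathpzc{V}\end{bmatrix}$ . The sketch below checks it, together with the factorization $\langle\mathpzc{U},\mathpzc{V}\rangle=\prod_{k}\langle\mathbf{u}^{k},\mathbf{v}^{k}\rangle$ for rank-$1$ tensors, on random unit factors; the helper names are hypothetical.

```python
import numpy as np

def kron_all(vectors):
    """Vectorized tensor product v^1 ⊗ ... ⊗ v^d."""
    out = np.ones(1)
    for v in vectors:
        out = np.kron(out, v)
    return out

def random_unit(rng, n):
    x = rng.standard_normal(n)
    return x / np.linalg.norm(x)

rng = np.random.default_rng(9)
dims = (3, 4, 2)
us = [random_unit(rng, n) for n in dims]
vs = [random_unit(rng, n) for n in dims]

U, V = kron_all(us), kron_all(vs)
inner = float(U @ V)
inner_factored = float(np.prod([u @ v for u, v in zip(us, vs)]))

M = np.column_stack([U, V])
vol_M = float(np.sqrt(np.linalg.det(M.T @ M)))
print(inner, inner_factored, vol_M, np.sqrt(1.0 - inner**2))
```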

Next, we exploit the symmetry of $\mathbb{S}(\mathbb{R}^{n_{1}})$ and transform $\mathbf{v}^{1}\mapsto-\mathbf{v}^{1}$ . This transformation flips the sign of $\mathpzc{V}$ , but the value of $q$ is not affected. Indeed, the matrix $\mathrm{I}-MM^{\dagger}$ still projects onto $\mathrm{span}(\mathpzc{U},\mathpzc{V})^{\perp}=\mathrm{span}(\mathpzc{U},-\mathpzc{V})^{\perp}$ , and $L_{2}$ is transformed into $L_{2}D$ , where $D$ is a diagonal matrix with some pattern of $\pm 1$ on the diagonal. Since $\left[\begin{smallmatrix}I&\\ &D\end{smallmatrix}\right]$ is an orthogonal transformation, the singular values do not change. Thus, we obtain

[TABLE]

The next lemma is proved in Section C.2.

Lemma 5.5.

Let $\theta\in[0,\tfrac{\pi}{2}]$ and fix $\theta,\mathpzc{u}$ and $\mathpzc{v}$ . There is a constant $K>0$ , depending only on $n_{1},\ldots,n_{d}$ and $d$ , such that

[TABLE]

The lemma implies

[TABLE]

For bounding the integral over $\theta$ we need the next lemma, which we prove in Section C.3.

Lemma 5.6.

Let $a>1,p\geq 1$ . There exists a constant $K>0$ , depending only on $a$ , such that for any unit vectors $\mathbf{x},\mathbf{y}\in\mathbb{S}{}(\mathbb{R}^{p})$ , $\mathbf{x}\neq\mathbf{y}$ , we have

[TABLE]

Applying this lemma to 5.7, we obtain

[TABLE]

Writing $\|\mathpzc{U}-\mathpzc{V}\|=\sqrt{2}\sqrt{1-\langle\mathpzc{U},\mathpzc{V}\rangle}$ , we arrive at

[TABLE]

By orthogonal invariance, we may fix $\mathbf{u}^{k}\in\mathbb{S}(\mathbb{R}^{n_{k}})$ to be $\mathbf{u}^{k}=(1,0,\ldots,0)$ , and integrate the constant function $1$ over one copy of $\mathbb{S}{}(\mathbb{R}^{n_{1}})\times\cdots\times\mathbb{S}{}(\mathbb{R}^{n_{d}})$ . Ignoring the product of volumes $\prod_{k=1}^{d}\mathrm{vol}(\mathbb{S}{}(\mathbb{R}^{n_{k}}))$ we have

[TABLE]

Now, this spherical integral is particularly simple because the integrand depends only on one of the components of each vector. One can thus transform each integral over a sphere into an integral over an interval (see, for example, [10, Lemma 1]), obtaining:

[TABLE]

For this last integral we consider the partition of the cube $[-1,1]^{d}$ into $2^{d}$ pieces corresponding to the different signs of the coordinates. In the pieces where the number of negative coordinates is odd, the denominator of the integrand is bounded below by $1$ and thus the whole integrand is also bounded above by $1$ . Hence it suffices to check that the integral in the rest of the pieces is bounded. Assume now that $t_{i_{1}},\ldots,t_{i_{k}}$ with $k\geq 2$ even are the negative coordinates in some particular piece of the partition. The mapping that leaves all coordinates fixed but maps $t_{i_{k-1}}\mapsto-t_{i_{k-1}}$ and $t_{i_{k}}\mapsto-t_{i_{k}}$ preserves the integrand and moves the domain to another piece of the partition with $k-2$ negative coordinates. This process can then be repeated until none of the coordinates is negative. All in all, we have

[TABLE]

The change of variables $t_{k}=\cos(\theta_{k})$ for $1\leq k\leq d$ converts this last integral into

[TABLE]

The next lemma is proved in Section C.4.

Lemma 5.7.

Let $d\geq 1$ and $\theta_{1},\ldots,\theta_{d}\in[0,\tfrac{\pi}{2}]$ . Then, $\cos(\theta_{1})\cdots\cos(\theta_{d})\leq 1-\frac{\theta_{1}^{2}+\cdots+\theta_{d}^{2}}{7d}.$
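Before turning to the proof in Section C.4, the inequality of Lemma 5.7 can be probed numerically. The following sketch samples angles uniformly in $[0,\tfrac{\pi}{2}]^{d}$ and records the smallest observed value of the right-hand side minus the left-hand side; it is a sanity check under random sampling, not a proof, and the function name is hypothetical.

```python
import numpy as np

def lemma_gap(thetas):
    """Right-hand side minus left-hand side of Lemma 5.7, row by row."""
    thetas = np.asarray(thetas, dtype=float)
    d = thetas.shape[-1]
    lhs = np.prod(np.cos(thetas), axis=-1)
    rhs = 1.0 - np.sum(thetas**2, axis=-1) / (7 * d)
    return rhs - lhs

rng = np.random.default_rng(4)
worst = min(
    lemma_gap(rng.uniform(0.0, np.pi / 2, size=(100_000, d))).min()
    for d in range(1, 7)
)
print(worst)  # nonnegative up to rounding if the inequality holds on all samples
```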

Using the lemma and the inequality $\sin(\theta)\leq\theta$ on $0\leq\theta\leq\tfrac{\pi}{2}$ , we find that the integral in 5.8 is bounded by a constant times the following integral:

[TABLE]

Changing the name of the variables to $x_{1},\ldots,x_{d}$ and integrating over the $d$ -dimensional ball of radius $\tfrac{\pi}{2}\sqrt{d}$ , which contains the domain $[0,\tfrac{\pi}{2}]^{d}$ , we get a new upper bound for the last integral, which implies

[TABLE]

Recall that $\Sigma=1+\sum_{j=1}^{d}(n_{j}-1)$ . By passing to polar coordinates we get

[TABLE]

This shows $J_{\mathrm{outer}}<\infty$ implying $\operatorname*{\mathbb{E}}\kappa_{\mathrm{ang}}(\mathpzc{A})<\infty$ , finishing the proof of Theorem 1.9. ∎

6. Other random tensors: proof of Theorem 1.11

We demonstrate how our main results can be extended to many other distributions as well.

Consider the first item of Theorem 1.11. We assume that $\mathpzc{A}\in\sigma_{r;n_{1},\ldots,n_{d}}$ has the density $\hat{\rho}$ and that there exist positive constants $c_{1},c_{2}$ such that $c_{1}\leq\frac{\hat{\rho}}{\rho}\leq c_{2}$ , where $\rho$ is the density of a GIT. Then, for any measurable function $f(\mathpzc{A})$ we have

[TABLE]

and

[TABLE]

Thus, $\operatorname*{\mathbb{E}}_{\mathpzc{A}\sim\hat{\rho}}\,\ f(\mathpzc{A})=\infty$ if and only if $\operatorname*{\mathbb{E}}_{\mathpzc{A}\sim\rho}\,\ f(\mathpzc{A})=\infty$ . Replacing $f$ by $\kappa$ and $\kappa_{\mathrm{ang}}$ proves the first part of Theorem 1.11.

By [20, Proposition 4.4] $\kappa$ is invariant under multiplication of $\mathpzc{A}$ by a scalar. Therefore, the expected value of $\kappa$ for the Gaussian is equal to the expected value when $\mathpzc{A}$ is chosen uniformly in the unit ball, and also when $\mathpzc{A}$ is chosen uniformly in the unit sphere of the space of tensors. Namely, we have (see, e.g., [29, Section 2.2.4])

[TABLE]

This proves the second and third item of Theorem 1.11 for $\kappa$ .

For $\kappa_{\mathrm{ang}}$ we need the following lemma.

Lemma 6.1.

If $\mathpzc{A}\in\sigma_{r;n_{1},\ldots,n_{d}}$ is an $r$ -nice tensor, then $\kappa_{\mathrm{ang}}(t\mathpzc{A})=\kappa_{\mathrm{ang}}(\mathpzc{A})/t$ for all $t>0$ .

Proof.

Since $\mathpzc{A}$ is $r$ -nice, we have $\kappa_{\mathrm{ang}}(\mathpzc{A})=\|\mathrm{d}{}_{\mathpzc{A}}(p^{\times r}\circ\Phi^{-1}_{\mathpzc{a}})\|$ . Similarly to 5.1 we can show $\|\mathrm{d}{}_{\mathpzc{A}}(p^{\times r}\circ\Phi^{-1}_{\mathpzc{a}})(\dot{\mathpzc{A}})\|=\sqrt{\sum_{i=1}^{r}\,\|\mathpzc{A}_{i}\|^{-2}\,\|\mathrm{d}_{\mathpzc{A}_{i}}p\,\dot{\mathpzc{A}}_{i}\|^{2}},$ where $\mathpzc{A}=\mathpzc{A}_{1}+\cdots+\mathpzc{A}_{r}$ is the CPD of $\mathpzc{A}$ and $\dot{\mathpzc{A}}=\dot{\mathpzc{A}}_{1}+\cdots+\dot{\mathpzc{A}}_{r}$ is the corresponding decomposition of the tangent vector. The derivative $\mathrm{d}_{\mathpzc{A}_{i}}p$ is the orthogonal projection onto $\mathpzc{A}_{i}^{\perp}$ and independent of scaling. Moreover, $\sigma_{r;n_{1},\ldots,n_{d}}$ is a cone and so $\mathrm{T}_{\mathpzc{A}}{\,\sigma}_{r;n_{1},\ldots,n_{d}}$ can be identified with $\mathrm{T}_{t\mathpzc{A}}{\,\sigma}_{r;n_{1},\ldots,n_{d}}$ . This shows that after scaling the tensor $\mathpzc{A}$ we get $\|\mathrm{d}{}_{t\mathpzc{A}}(p^{\times r}\circ\Phi^{-1}_{\mathpzc{a}})(\dot{\mathpzc{A}})\|=t^{-1}\|\mathrm{d}{}_{\mathpzc{A}}(p^{\times r}\circ\Phi^{-1}_{\mathpzc{a}})(\dot{\mathpzc{A}})\|$ and hence $\kappa_{\mathrm{ang}}(t\mathpzc{A})=\kappa_{\mathrm{ang}}(\mathpzc{A})/t$ . ∎

Now, we can prove the rest of Theorem 1.11. Recall from Definition 1.3 that the density of a GIT on $\sigma_{r;n_{1},\ldots,n_{d}}$ is $\rho(\mathpzc{A}):=(C_{r;n_{1},\ldots,n_{d}})^{-1}\,e^{-\frac{\|\mathpzc{A}\|^{2}}{2}}$ , where $C_{r;n_{1},\ldots,n_{d}}=\int_{\sigma_{r;n_{1},\ldots,n_{d}}}e^{-\frac{\|\mathpzc{A}\|^{2}}{2}}\,\mathrm{d}{}\mathpzc{A}$ . Since our results for $\kappa_{\mathrm{ang}}$ are for rank- $2$ tensors, we put $r=2$ in the following. We also abbreviate $\sigma_{2}:=\sigma_{2;n_{1},\ldots,n_{d}}$ and $C_{2}:=C_{2;n_{1},\ldots,n_{d}}$ . Then, using Lemma 6.1 we can integrate in polar coordinates to obtain

[TABLE]

It follows immediately that the last integral is finite, proving that a randomly chosen $\mathpzc{A}\in\mathbb{S}(\sigma_{2})$ has finite expected $\kappa_{\mathrm{ang}}$ . Finally, if $\mathpzc{A}$ is chosen randomly in the unit ball in $\sigma_{2}$ , the same argument shows that the expected value is again finite:

[TABLE]

This finishes the proof of Theorem 1.11.

7. Numerical experiments

Having proved that the expected value of the condition number is infinite in most cases, we provide further computational evidence in support of Conjecture 1.8. To this end, a natural idea is to perform Monte Carlo experiments in a few of the unknown cases as in [23].

Sampling GITs is hard in practice, as the defining polynomial equalities and inequalities of the semialgebraic set $\sigma_{r}=\sigma_{r;n_{1},\ldots,n_{d}}$ of tensors of rank bounded by $r$ are not known in the literature. (Footnote 5: See [57, Chapter 7] and the references therein for some results on equations of the algebraic closure of $\sigma_{r}$ .) Nevertheless, there are a few cases that we can treat numerically. If $r=\frac{\Pi}{\Sigma}$ and the algebraic closure $\overline{\sigma_{r}}(\mathbb{C})$ has $\dim\overline{\sigma_{r}}(\mathbb{C})=\Pi$ , a so-called perfect tensor space, then $\sigma_{r}$ is an open subset of the ambient $\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ ; see, e.g., [57, 15].

From Remark 1.4, we can sample from the density $\rho$ on $\sigma_{r}$ via an acceptance–rejection method: Randomly sample tensors $\mathpzc{A}$ from the density $e^{-\frac{\|\mathpzc{A}\|^{2}}{2}}$ on $\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ until we find one that belongs to $\sigma_{r}$ . While this scheme will yield tensors distributed according to the density $\rho$ on $\sigma_{r}$ , it does not yield Gaussian identifiable tensors in general. The reason is that most perfect tensor spaces are not (expected to be) generically $r$ -identifiable [47]. Fortunately, there are a few known exceptions: matrix pencils ( $\mathbb{R}^{n\times n\times 2}$ for all $n\geq 2$ ), $\mathbb{R}^{5\times 4\times 3}$ and $\mathbb{R}^{3\times 2\times 2\times 2}$ are proved to be generically complex $r$ -identifiable for $r=\frac{\Pi}{\Sigma}$ . By applying the acceptance–rejection method to these spaces, every sampled tensor is a GIT with probability $1$ .
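The acceptance–rejection scheme described above can be sketched for the matrix pencil case $\mathbb{R}^{n\times n\times 2}$ . This is an illustration only, and it deliberately swaps in a different membership test from the one used in the experiments: instead of homotopy continuation, it assumes the classical criterion that a generic real pencil with slices $A,B$ has real rank $n$ exactly when $A^{-1}B$ has only real eigenvalues. The function names and the tolerance are hypothetical.

```python
import numpy as np

def has_real_rank_n(T, tol=1e-8):
    """Eigenvalue criterion for a generic n x n x 2 tensor (assumption, see text)."""
    A, B = T[:, :, 0], T[:, :, 1]
    eigenvalues = np.linalg.eigvals(np.linalg.solve(A, B))
    return bool(np.max(np.abs(np.imag(eigenvalues))) < tol)

def sample_accept_reject(n, num_samples, rng):
    """Draw Gaussian n x n x 2 tensors and keep those passing the membership test."""
    accepted, trials = [], 0
    while len(accepted) < num_samples:
        T = rng.standard_normal((n, n, 2))  # i.i.d. standard normal entries
        trials += 1
        if has_real_rank_n(T):
            accepted.append(T)
    return accepted, trials

rng = np.random.default_rng(10)
samples, trials = sample_accept_reject(2, 1000, rng)
print(len(samples) / trials)  # empirical acceptance fraction
```

The eigenvalue test is specific to pencils; for the other perfect tensor spaces mentioned above no such shortcut is assumed here.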

For numerically checking if a random tensor $\mathpzc{A}\in\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ in a perfect tensor space lies in $\sigma_{r}$ with $r=\frac{\Pi}{\Sigma}$ , we apply a homotopy continuation method to the square system of $\Pi$ equations

[TABLE]

where the $\Pi=r\Sigma$ entries of the $\mathbf{a}_{i}^{k}$ ’s are treated as variables, and the $n_{1}\times\cdots\times n_{d}$ tensor $\mathpzc{A}$ is the tensor to decompose. We generate a start system with one solution to track by randomly sampling the entries of the vectors $\mathbf{a}_{i}^{k}$ i.i.d. from a real standard Gaussian distribution and then constructing the corresponding tensor $\mathpzc{A}_{0}$ . Since $r=\frac{\Pi}{\Sigma}$ is the so-called generic rank of tensors in perfect tensor spaces $\mathbb{C}^{n_{1}\times\cdots\times n_{d}}$ , the above system has at least one complex solution with probability $1$ as well. If we consider complex $r$ -identifiable perfect tensor spaces at the generic rank, we can thus determine if $\mathpzc{A}\in\sigma_{r}$ by solving the square system and checking whether the unique solution is real. Assuming that we use a certified homotopy method such as alphaCertified [48], this approach will correctly classify $\mathpzc{A}$ with probability $1$ , thus not impacting the overall distribution produced by the acceptance–rejection scheme.

We implemented the above scheme in Julia 1.0.3 using version 0.4.3 of the package HomotopyContinuation.jl [19], employing the solve function with default parameter settings. We deem a solution real if the norm of the imaginary part is less than $10^{-8}$ . Note that this package does not offer certified tracking; however, the failure rate observed in our experiments was very low, namely $0.0512498\%$ (see Table 7.1). For this reason, we are convinced that the distribution produced by the acceptance–rejection scheme is very close to the true distribution.

We performed the following experiment for estimating the distribution of the condition numbers of GITs of generically complex $r$ -identifiable tensors in perfect tensor spaces with $r=\frac{\Pi}{\Sigma}$ , the complex generic rank. As explained above, we randomly sampled an element $\mathpzc{A}$ of $\mathbb{R}^{n_{1}\times\cdots\times n_{d}}$ from the density $e^{-\frac{\|\mathpzc{A}\|^{2}}{2}}$ by choosing its entries i.i.d. standard normally distributed. Then, we generated one random starting system and applied the solve function from HomotopyContinuation.jl for tracking the starting solution $\mathpzc{A}_{0}$ to the target $\mathpzc{A}$ . If the final solution of the square system was real, we recorded both the regular and angular condition numbers at the CPD of $\mathpzc{A}$ computed via homotopy continuation. These computations were performed in parallel using $20$ computational threads until $100{,}000$ finite, nonsingular, real solutions and corresponding condition numbers were obtained. This experiment was performed on a computer system consisting of $2$ Intel Xeon E5-2697 v3 CPUs with $12$ cores clocked at 2.6GHz and 128GB main memory. Information about the sampling process via the acceptance–rejection method is summarized in Table 7.1, and Figure 7.1 visualizes the complementary cumulative distribution functions of the regular and angular condition numbers.

Bibliography

[1] E. S. Allman, C. Matias, and J. A. Rhodes, Identifiability of parameters in latent structure models with many observed variables, Ann. Statist. 37 (2009), no. 6A, 3099–3132.
[2] D. Amelunxen and P. Bürgisser, Probabilistic analysis of the Grassmann condition number, Found. Comput. Math. 15 (2015), no. 1, 3–51.
[3] D. Amelunxen and M. Lotz, Average-case complexity without the black swans, J. Complexity 41 (2017), 82–101.
[4] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky, Tensor decompositions for learning latent variable models, J. Mach. Learn. Res. 15 (2014), 2773–2832.
[5] E. Angelini, C. Bocci, and L. Chiantini, Real identifiability vs. complex identifiability, Linear Multilinear Algebra 66 (2017), 1257–1267.
[6] D. Armentano and C. Beltrán, The polynomial eigenvalue problem is well conditioned for random inputs, SIAM J. Matrix Anal. Appl. 40 (2019), no. 1, 175–193.
[7] D. Armentano and F. Cucker, A randomized homotopy for the Hermitian eigenpair problem, Found. Comput. Math. 15 (2015), no. 1, 281–312.
[8] C. Beltrán, P. Breiding, and N. Vannieuwenhoven, Pencil-based algorithms for tensor rank decomposition are not stable, SIAM J. Matrix Anal. Appl. 40 (2019), no. 2, 739–773.
