Fast approximation of the $p$-radius, matrix pressure or generalised Lyapunov exponent for positive and dominated matrices

Ian D. Morris

arXiv:1905.00749·math.DS·May 3, 2019


TL;DR

This paper introduces a new algorithm for efficiently approximating the p-radius, matrix pressure, or generalized Lyapunov exponent for positive or dominated matrices, with significant improvements for low-dimensional cases.

Contribution

The author develops an eigenvalue-based algorithm that approximates the p-radius of positive and dominated matrices via approximations to a Fredholm determinant, with a rate of convergence that is super-exponential in the number of steps.

Findings

01

Significant improvements over existing methods for low-dimensional positive matrix pairs.

02

The algorithm interprets the p-radius as a leading eigenvalue of a trace-class operator.

03

Applicable to matrices with positivity or domination properties, relevant in wavelet and fractal analysis.

Abstract

If $A_{1},\ldots,A_{N}$ are real square matrices then the $p$-radius, generalised Lyapunov exponent or matrix pressure is defined to be the asymptotic exponential growth rate of the sum $\sum_{i_{1},\ldots,i_{n}=1}^{N}\|A_{i_{n}}\cdots A_{i_{1}}\|^{p}$, where $p$ is a real parameter. Under its various names this quantity has been investigated for its applications to topics including wavelet regularity and refinement equations, fractal geometry and the large deviations theory of random matrix products. In this article we present a new algorithm for computing the $p$-radius under the hypothesis that the matrices are all positive, or more generally under the hypothesis that they satisfy a weaker condition called domination. This algorithm is based on interpreting the $p$-radius as the leading eigenvalue of a trace-class operator on a Hilbert space and estimating that eigenvalue via approximations to the Fredholm…

Equations

$$\varrho_{p}(A_{1},\ldots,A_{N}):=\lim_{n\to\infty}\left(\sum_{i_{1},\ldots,i_{n}=1}^{N}\|A_{i_{n}}\cdots A_{i_{1}}\|^{p}\right)^{\frac{1}{n}}$$

$$a_{n}(p):=\log\left(\sum_{i_{1},\ldots,i_{n}=1}^{N}\|A_{i_{n}}\cdots A_{i_{1}}\|^{p}\right)$$

$$\limsup_{n\to\infty}\frac{1}{n}\log\max\left\{\frac{\sigma_{2}(A_{i_{n}}\cdots A_{i_{1}})}{\sigma_{1}(A_{i_{n}}\cdots A_{i_{1}})}\colon 1\leq i_{1},\ldots,i_{n}\leq N\right\}<0$$

$$t_{n}:=\sum_{|\mathtt{i}|=n}\rho(A_{\mathtt{i}})^{p}\prod_{j=2}^{d}\left(1-\frac{\lambda_{j}(A_{\mathtt{i}})}{\lambda_{1}(A_{\mathtt{i}})}\right)^{-1}=\sum_{|\mathtt{i}|=n}\frac{\lambda_{1}(A_{\mathtt{i}})^{d-1}\rho(A_{\mathtt{i}})^{p}}{p_{A_{\mathtt{i}}}^{\prime}(\lambda_{1}(A_{\mathtt{i}}))}$$

$$a_{n}=\sum_{k=1}^{n}\frac{(-1)^{k}}{k!}\sum_{\substack{n_{1},\ldots,n_{k}\geq 1\\ n_{1}+\cdots+n_{k}=n}}\prod_{\ell=1}^{k}\frac{t_{n_{\ell}}}{n_{\ell}}$$

$$\left|\varrho_{p}(A_{1},\ldots,A_{N})-\frac{1}{r_{n}}\right|\leq K\exp\left(-\gamma n^{\frac{d}{d-1}}\right).$$

$$1-\log_{2}\left(2^{-\frac{1}{p}}\varrho_{p}(A_{1},A_{2})^{\frac{1}{p}}\right)=\frac{p+1}{p}-\frac{1}{p}\log_{2}\varrho_{p}(A_{1},A_{2})$$

$$A_{1}:=\begin{pmatrix}\frac{1}{5}&\frac{1}{5}\\ 0&\frac{3}{5}\end{pmatrix},\qquad A_{2}:=\begin{pmatrix}\frac{3}{5}&0\\ \frac{1}{5}&\frac{1}{5}\end{pmatrix}$$

$$1.953821293179325866750389914731492551138280064126997\ldots$$

$$\lim_{n\to\infty}\left(\sum_{i_{1},\ldots,i_{n}=1}^{N}p_{i_{n}}\cdots p_{i_{1}}\|A_{i_{n}}\cdots A_{i_{1}}\|^{p}\right)^{\frac{1}{n}}$$

$$t_{n}:=\sum_{|\mathtt{i}|=n}p_{\mathtt{i}}\,\rho(A_{\mathtt{i}})^{p}\prod_{j=2}^{d}\left(1-\frac{\lambda_{j}(A_{\mathtt{i}})}{\lambda_{1}(A_{\mathtt{i}})}\right)^{-1}=\sum_{|\mathtt{i}|=n}p_{\mathtt{i}}\frac{\lambda_{1}(A_{\mathtt{i}})^{d-1}\rho(A_{\mathtt{i}})^{p}}{p_{A_{\mathtt{i}}}^{\prime}(\lambda_{1}(A_{\mathtt{i}}))}$$

$$\lim_{n\to\infty}\max_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{\frac{1}{n}}$$

$$\varrho_{p}(A_{1},\ldots,A_{N})=\lim_{n\to\infty}\left(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p}\right)^{\frac{1}{n}}=\inf_{n\geq 1}\left(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p}\right)^{\frac{1}{n}},$$

$$\sum_{|\mathtt{i}|=m+n}\|A_{\mathtt{i}}\|^{p}\leq\left(\sum_{|\mathtt{i}|=m}\|A_{\mathtt{i}}\|^{p}\right)\left(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p}\right).$$

$$\varrho_{p}(A_{1},\ldots,A_{N})=\sup_{n\geq 1}\left(\frac{\sum_{|\mathtt{i}|=nd}\|A_{\mathtt{i}}\|^{p}}{K(p,d)\left(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p}\right)^{d-1}}\right)^{\frac{1}{n}}$$

$$\varrho_{p}(A_{1},\ldots,A_{N})=\rho\left(\sum_{i=1}^{N}A_{i}^{\otimes p}\right)$$

$$\varrho_{p}(A_{1},\ldots,A_{N})\leq\rho\left(\sum_{i=1}^{N}A_{i}^{\otimes p}\right)$$

$$\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{\lambda p_{1}+(1-\lambda)p_{2}}\leq\left(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p_{1}}\right)^{\lambda}\left(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p_{2}}\right)^{1-\lambda}$$

$$\log\varrho_{\lambda p_{1}+(1-\lambda)p_{2}}(A_{1},\ldots,A_{N})\leq\lambda\log\varrho_{p_{1}}(A_{1},\ldots,A_{N})+(1-\lambda)\log\varrho_{p_{2}}(A_{1},\ldots,A_{N})$$

$$\varrho_{p}(A_{1},\ldots,A_{N})\leq\rho\left(\sum_{i=1}^{N}A_{i}^{\otimes\lfloor p\rfloor}\right)^{1+\lfloor p\rfloor-p}\rho\left(\sum_{i=1}^{N}A_{i}^{\otimes(1+\lfloor p\rfloor)}\right)^{p-\lfloor p\rfloor}$$

$$\mathsf{a}_{p}(n)=\inf_{(u_{1},\ldots,u_{d})\in\mathbb{R}^{d}}\sum_{|\mathtt{i}|=n}\left(\max_{1\leq i\leq d}\sum_{j=1}^{d}(A_{\mathtt{i}})_{ij}e^{u_{j}-u_{i}}\right)^{p},$$

$$\mathsf{b}_{p}(n)=\inf_{(v_{1},\ldots,v_{d})\in\mathbb{R}^{d}}\max_{1\leq j\leq d}\sum_{|\mathtt{i}|=n}\left(\sum_{i=1}^{d}(A_{\mathtt{i}})_{ij}e^{v_{i}-v_{j}}\right)^{p},$$

$$\max\left\{d^{-\frac{p}{n}}\mathsf{a}_{p}(n)^{\frac{1}{n}},\,d^{\frac{1-p}{n}}\mathsf{b}_{p}(n)^{\frac{1}{n}}\right\}\leq\varrho_{p}(A_{1},\ldots,A_{N})\leq\mathsf{b}_{p}(n)^{\frac{1}{n}}$$

$$(\mathcal{L}_{p}f)(\overline{u}):=\sum_{i=1}^{N}\left(\frac{\|A_{i}u\|}{\|u\|}\right)^{p}f(\overline{A_{i}u})$$

$$(\mathcal{L}_{p}^{n}f)(\overline{u})=\sum_{|\mathtt{i}|=n}\left(\frac{\|A_{\mathtt{i}}u\|}{\|u\|}\right)^{p}f(\overline{A_{\mathtt{i}}u})$$

$$\lim_{n\to\infty}\left\|\mathcal{L}_{p}^{n}\right\|^{\frac{1}{n}}=\lim_{n\to\infty}\left(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p}\right)^{\frac{1}{n}},$$

$$\mathfrak{s}_{n}(\mathscr{L}):=\inf\left\{\|\mathscr{L}-F\|\colon\operatorname{rank}F<n\right\}$$

$$\det(I-z\mathscr{L}):=\prod_{n=1}^{M}(1-z\lambda_{n}),$$

$$a_{n}=(-1)^{n}\sum_{i_{1}<i_{2}<\cdots<i_{n}}\lambda_{i_{1}}\cdots\lambda_{i_{n}}=\sum_{k=1}^{n}\frac{1}{k!}\sum_{\substack{n_{1},\ldots,n_{k}\geq 1\\ n_{1}+\cdots+n_{k}=n}}\prod_{i=1}^{k}\left(-\frac{\operatorname{tr}\mathscr{L}^{n_{i}}}{n_{i}}\right)$$


Taxonomy

Topics: Mathematical Dynamics and Fractals · Mathematical Analysis and Transform Methods · Matrix Theory and Algorithms

Full text


Abstract.

If $A_{1},\ldots,A_{N}$ are real $d\times d$ matrices then the $p$-radius, generalised Lyapunov exponent or matrix pressure is defined to be the asymptotic exponential growth rate of the sum $\sum_{i_{1},\ldots,i_{n}=1}^{N}\|A_{i_{n}}\cdots A_{i_{1}}\|^{p}$, where $p$ is a real parameter. Under its various names this quantity has been investigated for its applications to topics including wavelet regularity and refinement equations, fractal geometry and the large deviations theory of random matrix products. In this article we present a new algorithm for computing the $p$-radius under the hypothesis that the matrices are all positive, or more generally under the hypothesis that they satisfy a weaker condition called domination. This algorithm is based on interpreting the $p$-radius as the leading eigenvalue of a trace-class operator on a Hilbert space and estimating that eigenvalue via approximations to the Fredholm determinant of the operator. In this respect our method is closely related to the work of Z.-Q. Bai and M. Pollicott on computing the top Lyapunov exponent of a random matrix product. For pairs of positive matrices of low dimension our method yields substantial improvements over existing methods.

1. Introduction

If $(A_{1},\ldots,A_{N})$ is a tuple of real $d\times d$ matrices and $p\in\mathbb{R}$ a real parameter, the limit

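$$\varrho_{p}(A_{1},\ldots,A_{N}):=\lim_{n\to\infty}\left(\sum_{i_{1},\ldots,i_{n}=1}^{N}\|A_{i_{n}}\cdots A_{i_{1}}\|^{p}\right)^{\frac{1}{n}}\tag{1}$$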

exists by applying Fekete’s subadditivity lemma to the sequence

$$a_{n}(p):=\log\left(\sum_{i_{1},\ldots,i_{n}=1}^{N}\|A_{i_{n}}\cdots A_{i_{1}}\|^{p}\right)$$

if $p\geq 0$ , or to the sequence $-a_{n}(p)$ if $p<0$ . The quantity (1), modulo some trivial variations in its definition, has been studied independently in at least three different contexts and literatures: under the name of generalised Lyapunov exponent the quantity $\log(N^{-1}\varrho_{p}(A_{1},\ldots,A_{N}))$ has been studied for $p\in\mathbb{R}$ in [9, 41] where its investigation is motivated by the large deviations theory of random matrix products in statistical mechanics; under the name of matrix pressure, the quantity $\varrho_{p}(A_{1},\ldots,A_{N})$ has been investigated for $p\geq 0$ in the fractal geometry literature in view of its applications to the dimension of self-similar and self-affine limit sets ([11, 12, 13, 27, 34]); and in the joint spectral radius literature, the quantity $N^{-1/p}\varrho_{p}(A_{1},\ldots,A_{N})^{1/p}$ has been investigated for $p\geq 1$ in connection with its applications to wavelet regularity [8, 26, 42] and the control theory of discrete linear inclusions [22, 30]. Across all three literatures there has arisen the problem of computing or estimating the quantity $\varrho_{p}(A_{1},\ldots,A_{N})$ – as may be seen for example in [23, 27, 31, 34, 36, 39, 41] – and it is with this that the present article is concerned. The principal result of this article is a new algorithm for the computation of $\varrho_{p}(A_{1},\ldots,A_{N})$ in the case where the matrices $A_{1},\ldots,A_{N}$ are positive and $p$ is an arbitrary real number. More generally, our method extends to the case where the matrices $A_{1},\ldots,A_{N}$ strictly preserve a cone or multicone.

2. Statement of main result

In order to state our result let us present the definition of a multicone. Let us say that a cone in $\mathbb{R}^{d}$ is a set $\mathcal{K}\subset\mathbb{R}^{d}$ which is closed, convex, has nonempty interior, satisfies $\lambda\mathcal{K}=\mathcal{K}$ for all real $\lambda>0$ and satisfies $\mathcal{K}\cap(-\mathcal{K})=\{0\}$. A multicone will be a tuple $(\mathcal{K}_{1},\ldots,\mathcal{K}_{m})$ of cones in $\mathbb{R}^{d}$ such that for some nonzero vector $w\in\mathbb{R}^{d}$ we have $\langle u,w\rangle>0$ for all nonzero $u\in\bigcup_{j=1}^{m}\mathcal{K}_{j}$, and such that $\mathcal{K}_{i}\cap\mathcal{K}_{j}=\{0\}$ for distinct $i,j\in\{1,\ldots,m\}$. The vector $w$ is called the transverse-defining vector of the multicone. We say that a matrix $A\in M_{d}(\mathbb{R})$ strictly preserves a cone $\mathcal{K}$ if $A(\mathcal{K}\setminus\{0\})\subseteq\mathrm{Int}\,\mathcal{K}$, and we say that $A$ strictly preserves a multicone $(\mathcal{K}_{1},\ldots,\mathcal{K}_{m})$ if for every $i=1,\ldots,m$ we have $A(\mathcal{K}_{i}\setminus\{0\})\subseteq\operatorname{Int}\mathcal{K}_{j}\cup(-\operatorname{Int}\mathcal{K}_{j})$ for some $j\in\{1,\ldots,m\}$ depending on $i$. If $A$ strictly preserves a multicone then a simple pigeonhole argument demonstrates that some power of $A$ strictly preserves a cone, which implies that $A$ has a simple leading eigenvalue (which might be either positive or negative). We say that $(A_{1},\ldots,A_{N})\in M_{d}(\mathbb{R})^{N}$ strictly preserves a multicone $(\mathcal{K}_{1},\ldots,\mathcal{K}_{m})$ if every $A_{i}$ strictly preserves that multicone. We say that $(A_{1},\ldots,A_{N})$ is multipositive if there exists a multicone which is strictly preserved by $(A_{1},\ldots,A_{N})$.

The property of multipositivity admits characterisations which do not overtly refer to cones or multicones: for example, if $(A_{1},\ldots,A_{N})\in M_{d}(\mathbb{R})^{N}$ is a tuple of invertible matrices then the multipositivity of $(A_{1},\ldots,A_{N})$ is equivalent to the condition

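$$\limsup_{n\to\infty}\frac{1}{n}\log\max\left\{\frac{\sigma_{2}(A_{i_{n}}\cdots A_{i_{1}})}{\sigma_{1}(A_{i_{n}}\cdots A_{i_{1}})}\colon 1\leq i_{1},\ldots,i_{n}\leq N\right\}<0\tag{2}$$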

where $\sigma_{k}(A)$ denotes the $k^{\mathrm{th}}$ singular value of the matrix $A$ , see for example [4, 5, 29]. The condition (2) above is sometimes called $1$ -domination or simply domination and has been explored in some detail in the dynamical systems literature [1, 5]; its applications to certain numerical invariants of sets of matrices have been investigated in such works as [6, 7].

For each $N\geq 1$ we let $\Sigma_{N}^{*}$ denote the set of all finite sequences $\mathtt{i}=(i_{1},\ldots,i_{n})$ such that $i_{1},\ldots,i_{n}$ are integers between $1$ and $N$ . If a tuple of matrices $(A_{1},\ldots,A_{N})\in M_{d}(\mathbb{R})^{N}$ is understood, given $\mathtt{i}=(i_{1},\ldots,i_{n})\in\Sigma_{N}^{*}$ we define $A_{\mathtt{i}}:=A_{i_{n}}\cdots A_{i_{1}}.$ If $\mathtt{i}=(i_{1},\ldots,i_{n})\in\Sigma_{N}^{*}$ then we write $|\mathtt{i}|:=n$ and call this the length of $\mathtt{i}$ . Finally we let $\rho(A)$ denote the spectral radius of the matrix $A$ , and we let $\lambda_{1}(A),\ldots,\lambda_{d}(A)$ denote the eigenvalues of $A$ listed in decreasing order of absolute value. Since our matrices $A$ will always strictly preserve a multicone the largest eigenvalue of $A$ will always be unique and the definition of $\lambda_{1}(A)$ unambiguous.

We may now state the principal result of this article, which is the following:

Theorem 1.

Let $(A_{1},\ldots,A_{N})\in M_{d}(\mathbb{R})^{N}$ be multipositive, where $N,d\geq 2$ , and let $p\in\mathbb{R}$ . For every $n\geq 1$ define

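$$t_{n}:=\sum_{|\mathtt{i}|=n}\rho(A_{\mathtt{i}})^{p}\prod_{j=2}^{d}\left(1-\frac{\lambda_{j}(A_{\mathtt{i}})}{\lambda_{1}(A_{\mathtt{i}})}\right)^{-1}=\sum_{|\mathtt{i}|=n}\frac{\lambda_{1}(A_{\mathtt{i}})^{d-1}\rho(A_{\mathtt{i}})^{p}}{p_{A_{\mathtt{i}}}^{\prime}(\lambda_{1}(A_{\mathtt{i}}))}$$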

where $p_{B}(x):=\det(xI-B)$ denotes the characteristic polynomial of the matrix $B$ and $p_{B}^{\prime}(x_{0})$ its first derivative evaluated at $x_{0}$ . Define $a_{0}:=1$ and

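$$a_{n}:=\sum_{k=1}^{n}\frac{(-1)^{k}}{k!}\sum_{\substack{n_{1},\ldots,n_{k}\geq 1\\ n_{1}+\cdots+n_{k}=n}}\prod_{\ell=1}^{k}\frac{t_{n_{\ell}}}{n_{\ell}}$$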

for every $n\geq 1$ . Then for all sufficiently large $n$ there exists a smallest positive real root $r_{n}>0$ of the polynomial $\sum_{k=0}^{n}a_{k}x^{k}$ , and there exist constants $K,\gamma>0$ such that for all large enough integers $n$

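$$\left|\varrho_{p}(A_{1},\ldots,A_{N})-\frac{1}{r_{n}}\right|\leq K\exp\left(-\gamma n^{\frac{d}{d-1}}\right).\tag{3}$$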

Theorem 1 applies in particular if the matrices $A_{i}$ are all positive matrices, or if the matrices $A_{i}$ all strictly preserve a single cone $\mathcal{K}$ . However, multipositive matrix tuples with neither of these two properties also exist: see [1]. We remark that since $\varrho_{p}(A_{1},\ldots,A_{N})=\varrho_{p}(X^{-1}A_{1}X,\ldots,X^{-1}A_{N}X)$ for every invertible matrix $X$ , a sufficient condition for the application of Theorem 1 is that the matrices $A_{i}$ be simultaneously conjugate to positive matrices.
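
The procedure in Theorem 1 can be sketched in a few lines for $2\times 2$ matrices. The following is only an illustrative sketch under stated assumptions, not the implementation used for the computations reported in this article: the function names are hypothetical, the arithmetic is ordinary double precision, the coefficients $a_{k}$ are obtained from the recursion $k\,a_{k}=-\sum_{m=1}^{k}t_{m}a_{k-m}$ (which follows from differentiating $\exp(-\sum_{m}t_{m}z^{m}/m)$ and yields the same coefficients as the composition formula), and the smallest positive root is located by a simple scan followed by bisection.

```python
import itertools
import math

def matmul2(X, Y):
    """Product of two 2x2 matrices stored as ((a, b), (c, d))."""
    return (
        (X[0][0] * Y[0][0] + X[0][1] * Y[1][0], X[0][0] * Y[0][1] + X[0][1] * Y[1][1]),
        (X[1][0] * Y[0][0] + X[1][1] * Y[1][0], X[1][0] * Y[0][1] + X[1][1] * Y[1][1]),
    )

def eigenvalues2(A):
    """Real eigenvalues (lam1, lam2) of a 2x2 matrix, with |lam1| > |lam2|."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = tr * tr - 4.0 * det
    if disc <= 0.0 or tr == 0.0:
        raise ValueError("no simple real leading eigenvalue")
    lam1 = 0.5 * (tr + math.copysign(math.sqrt(disc), tr))
    return lam1, det / lam1

def trace_sum(mats, p, n):
    """t_n: sum over words of length n of rho(A_i)^p / (1 - lam2/lam1)."""
    total = 0.0
    for word in itertools.product(mats, repeat=n):
        prod = word[0]
        for M in word[1:]:
            prod = matmul2(M, prod)          # builds A_{i_n} ... A_{i_1}
        lam1, lam2 = eigenvalues2(prod)
        total += abs(lam1) ** p / (1.0 - lam2 / lam1)
    return total

def coefficients(t):
    """[a_0, ..., a_n] from [t_1, ..., t_n] via k*a_k = -sum_m t_m a_{k-m}."""
    a = [1.0]
    for k in range(1, len(t) + 1):
        a.append(-sum(t[m - 1] * a[k - m] for m in range(1, k + 1)) / k)
    return a

def horner(a, x):
    """Evaluate sum_k a[k] x^k."""
    value = 0.0
    for c in reversed(a):
        value = value * x + c
    return value

def smallest_positive_root(a, max_steps=100000):
    """Scan from 0 (where the polynomial equals 1) to a sign change, then bisect."""
    step = 0.01 / abs(a[1])                  # heuristic step size (an assumption)
    lo = 0.0
    for _ in range(max_steps):
        hi = lo + step
        if horner(a, hi) <= 0.0:
            break
        lo = hi
    else:
        raise ValueError("no sign change found")
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if horner(a, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def p_radius_estimate(mats, p, n):
    """The reciprocal 1/r_n of the smallest positive root of sum_k a_k x^k."""
    t = [trace_sum(mats, p, k) for k in range(1, n + 1)]
    return 1.0 / smallest_positive_root(coefficients(t))

# Illustrative positive pair (not taken from the article). For p = 1 and positive
# matrices the p-radius equals rho(A1 + A2) = 5, which gives a sanity check.
A1 = ((2.0, 1.0), (1.0, 1.0))
A2 = ((1.0, 1.0), (1.0, 2.0))
estimate = p_radius_estimate([A1, A2], p=1.0, n=6)
```

The cost is dominated by the $N^{n}$ matrix products needed for $t_{n}$, so the sketch is only practical for small $N$ and moderate $n$; for larger $n$ the double-precision arithmetic used here would need to be replaced by higher-precision arithmetic.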

The reader will notice that the order of convergence in Theorem 1 is strongest when the dimension of the matrix is $2$ and becomes weaker as the dimension is increased, although it is in all cases super-exponential in $n$ . The problem of estimating the implied constants $K$ and $\gamma$ in (3) is not attempted in this article; we believe that in the case of tuples of positive matrices this should be feasible in principle, but would rely on difficult functional-analytic estimates such as an a priori bound for the cardinality of the relative covers arising in the application of [3, Theorem 4.7] to certain complex cones. In any event, convergence in Theorem 1 is fast enough to yield significant results in low dimensions. In the previous work [22], R. Jungers and V. Yu. Protasov investigated the problem of computing what in our notation corresponds to the quantity

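$$1-\log_{2}\left(2^{-\frac{1}{p}}\varrho_{p}(A_{1},A_{2})^{\frac{1}{p}}\right)=\frac{p+1}{p}-\frac{1}{p}\log_{2}\varrho_{p}(A_{1},A_{2})$$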

for the pair of matrices

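$$A_{1}:=\begin{pmatrix}\frac{1}{5}&\frac{1}{5}\\ 0&\frac{3}{5}\end{pmatrix},\qquad A_{2}:=\begin{pmatrix}\frac{3}{5}&0\\ \frac{1}{5}&\frac{1}{5}\end{pmatrix}$$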

with $p:=3.5$ , obtaining an estimate of $1.95\leq\frac{p+1}{p}-\frac{1}{p}\log_{2}\varrho_{p}(A_{1},A_{2})\leq 1.973$ . It happens that the pair $(A_{1},A_{2})$ is simultaneously conjugate to a pair of positive matrices; taking $n:=20$ in Theorem 1 yields the estimate

$$1.953821293179325866750389914731492551138280064126997\ldots$$

for the same quantity, which is empirically accurate to all decimal places shown.

We remark that in the literature on the generalised Lyapunov exponent, it is common to consider the quantity

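$$\lim_{n\to\infty}\left(\sum_{i_{1},\ldots,i_{n}=1}^{N}p_{i_{n}}\cdots p_{i_{1}}\|A_{i_{n}}\cdots A_{i_{1}}\|^{p}\right)^{\frac{1}{n}}\tag{5}$$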

in place of the quantity $\varrho_{p}(A_{1},\ldots,A_{N})$ as defined in (1), where $(p_{1},\ldots,p_{N})$ is a probability vector. The quantity (5) can easily be included within the scope of (1) and Theorem 1 by replacing each instance of a matrix $A_{i}$ with the corresponding matrix $p_{i}^{1/p}A_{i}$ . Concretely, this implies that the quantity (5) can be calculated using Theorem 1 by taking instead

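$$t_{n}:=\sum_{|\mathtt{i}|=n}p_{\mathtt{i}}\,\rho(A_{\mathtt{i}})^{p}\prod_{j=2}^{d}\left(1-\frac{\lambda_{j}(A_{\mathtt{i}})}{\lambda_{1}(A_{\mathtt{i}})}\right)^{-1}=\sum_{|\mathtt{i}|=n}p_{\mathtt{i}}\frac{\lambda_{1}(A_{\mathtt{i}})^{d-1}\rho(A_{\mathtt{i}})^{p}}{p_{A_{\mathtt{i}}}^{\prime}(\lambda_{1}(A_{\mathtt{i}}))}$$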

where $p_{\mathtt{i}}:=p_{i_{1}}\cdots p_{i_{n}}$ , and leaving the rest of the theorem unchanged. For the remainder of the article we therefore ignore the issue of giving a probability weighting to each $A_{i}$ and concentrate on the calculation of the $p$ -radius as defined in (1).
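
The substitution can be checked in one line. The following worked step is a sketch which assumes $p\neq 0$ and that every weight $p_{i}$ is strictly positive, so that $p_{i}^{1/p}$ is a positive scalar:

```latex
\left\|\bigl(p_{i_{n}}^{1/p}A_{i_{n}}\bigr)\cdots\bigl(p_{i_{1}}^{1/p}A_{i_{1}}\bigr)\right\|^{p}
  =\left(p_{\mathtt{i}}^{1/p}\,\|A_{\mathtt{i}}\|\right)^{p}
  =p_{\mathtt{i}}\,\|A_{\mathtt{i}}\|^{p},
\qquad
\rho\bigl(p_{\mathtt{i}}^{1/p}A_{\mathtt{i}}\bigr)^{p}=p_{\mathtt{i}}\,\rho(A_{\mathtt{i}})^{p},
\qquad
\frac{\lambda_{j}\bigl(p_{\mathtt{i}}^{1/p}A_{\mathtt{i}}\bigr)}{\lambda_{1}\bigl(p_{\mathtt{i}}^{1/p}A_{\mathtt{i}}\bigr)}
  =\frac{\lambda_{j}(A_{\mathtt{i}})}{\lambda_{1}(A_{\mathtt{i}})}.
```

The first identity shows that the rescaled tuple has the weighted limit as its $p$-radius; the second and third show that each summand of $t_{n}$ is simply multiplied by $p_{\mathtt{i}}$, since the eigenvalue ratios are unaffected by multiplication by a positive scalar.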

It is possible to show that the quantities $t_{n}$ defined in Theorem 1 satisfy $\lim_{n\to\infty}t_{n}\varrho_{p}(A_{1},\ldots,A_{N})^{-n}=1$ and therefore increase (or decrease) exponentially with $n$ . The efficiency of the estimate in Theorem 1 on the other hand relies on the quantities $a_{n}$ decreasing as $O(\exp(-\gamma n^{\frac{d}{d-1}}))$ . The small size of the quantities $a_{n}$ thus arises from additive cancellation among the relatively large terms in the sum defining each $a_{n}$ . In practical applications it is therefore important to compute the quantities $t_{n}$ to a precision exceeding that desired for the approximation to $\varrho_{p}(A_{1},\ldots,A_{N})$ .
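
The cancellation can be seen in a toy example. The sketch below is illustrative only: it evaluates the composition formula for $a_{n}$ in exact rational arithmetic using Python's `fractions` module, with a hypothetical input $t_{m}=3^{m}+2^{-m}$ (the power sums of the two numbers $3$ and $\tfrac{1}{2}$, chosen for illustration rather than coming from any matrix tuple). For this input the associated power series is the quadratic $(1-3z)(1-z/2)$, so $a_{3}$ and $a_{4}$ vanish exactly even though the individual terms in their defining sums are of order $10$ or more.

```python
from fractions import Fraction
from math import factorial

def compositions(n, k):
    """Yield all k-tuples of integers >= 1 whose sum is n."""
    if k == 1:
        if n >= 1:
            yield (n,)
        return
    for first in range(1, n - k + 2):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def coefficient_explicit(t, n):
    """a_n from the composition formula; t[m] holds t_m and t[0] is unused."""
    total = Fraction(0)
    for k in range(1, n + 1):
        inner = Fraction(0)
        for parts in compositions(n, k):
            term = Fraction(1)
            for m in parts:
                term *= Fraction(t[m]) / m
            inner += term
        total += Fraction((-1) ** k, factorial(k)) * inner
    return total

# Hypothetical input: power sums of the numbers 3 and 1/2.
t = [None] + [Fraction(3) ** m + Fraction(1, 2) ** m for m in range(1, 5)]
coeffs = [coefficient_explicit(t, n) for n in range(1, 5)]
```

In floating-point arithmetic the same sums would return small nonzero values for $a_{3}$ and $a_{4}$ whose size is governed by rounding error in the large terms, which is why the $t_{n}$ need more working precision than the final answer.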

The remainder of this article is structured as follows. In §3 below we review the fundamental properties of $\varrho_{p}$ and describe some existing techniques for its estimation. In §4 we describe in outline the techniques underlying the proof of Theorem 1 and in §5 the proof itself is presented. In §6 we present some examples of the computation of $\varrho_{p}$ using the algorithms described herein.

3. Methods for estimating the $p$-radius

3.1. Fundamental estimates

If $(A_{1},\ldots,A_{N})\in M_{d}(\mathbb{R})^{N}$ and $p\in\mathbb{R}$ then by elementary estimates it follows that $\varrho_{p}(A_{1},\ldots,A_{N})=0$ if and only if the joint spectral radius

$$\lim_{n\to\infty}\max_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{\frac{1}{n}}$$

is zero. It is well known that the joint spectral radius is zero if and only if all of the products $A_{i_{d}}\cdots A_{i_{1}}$ of length $d$ are zero, if and only if there exists a basis in which all of the matrices $A_{1},\ldots,A_{N}$ are simultaneously upper triangular with all diagonal entries equal to zero (for details see [21, §2.3.1]). Since the theory of the $p$ -radius is trivial in this situation we will for the remainder of this paper deal only with matrices for which the $p$ -radius is assumed to be nonzero. We remark that in the multipositive case considered in Theorem 1 every product $A_{\mathtt{i}}$ has a simple leading eigenvalue and in particular is not the zero matrix, so in this case $\varrho_{p}(A_{1},\ldots,A_{N})$ is guaranteed to be nonzero.

When $p>0$ the $p$-radius admits an elementary description as the limit of a convergent sequence of upper bounds,

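$$\varrho_{p}(A_{1},\ldots,A_{N})=\lim_{n\to\infty}\left(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p}\right)^{\frac{1}{n}}=\inf_{n\geq 1}\left(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p}\right)^{\frac{1}{n}},\tag{6}$$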

as a consequence of the submultiplicativity relation

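$$\sum_{|\mathtt{i}|=m+n}\|A_{\mathtt{i}}\|^{p}\leq\left(\sum_{|\mathtt{i}|=m}\|A_{\mathtt{i}}\|^{p}\right)\left(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p}\right).$$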

Less trivially, when $p>0$ it may also be expressed as the limit of a convergent sequence of lower bounds:

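$$\varrho_{p}(A_{1},\ldots,A_{N})=\sup_{n\geq 1}\left(\frac{\sum_{|\mathtt{i}|=nd}\|A_{\mathtt{i}}\|^{p}}{K(p,d)\left(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p}\right)^{d-1}}\right)^{\frac{1}{n}}\tag{7}$$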

where $K(p,d):=d^{2+(d+1)p}\max\{d^{1-p},1\}$ , see [27, Theorem 1.2]. In particular the $p$ -radius can in principle be approximated to within any prescribed error $\varepsilon$ by systematically computing the upper and lower bounds until they eventually agree to within the prescribed amount. However, since the computational effort involved increases exponentially with $n$ and the relative error may reasonably be presumed to be at least of the order of $K(p,d)^{1/n}$ , and since the constant $K(p,d)$ is relatively large even in the case $d=2$ , this procedure seems unlikely to have any value for practical computations. An illustration of this is presented in §6 below. We remark that an additional theoretical consequence of the above expressions is that the $p$ -radius varies continuously both in $p$ and in the matrix entries when $p$ is positive, since it is then equal to both an upper and a lower pointwise limit of sequences of continuous functions, hence continuous. When $p<0$ the computability and continuity of the $p$ -radius do not seem to have been as thoroughly investigated, but based on the related works [6, 28, 40] it seems likely that continuity should not hold and that systematic upper and lower estimation might be infeasible, at least when the matrices are not assumed to be positive or invertible.
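
To make the size of the gap concrete, the upper bound $\bigl(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p}\bigr)^{1/n}$ and the lower bound built from $K(p,d)$ can be evaluated by brute force. The following is a minimal sketch, assuming $d=2$, the Euclidean operator norm, and an illustrative pair of positive matrices that is not taken from this article; the function names are hypothetical.

```python
import itertools
import math

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    d = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def spectral_norm2(A):
    """Euclidean operator norm of a 2x2 matrix via the eigenvalues of A^T A."""
    a, b, c, d = A[0][0], A[0][1], A[1][0], A[1][1]
    s = a * a + b * b + c * c + d * d            # trace of A^T A
    det = a * d - b * c                          # det(A^T A) = det^2
    return math.sqrt(0.5 * (s + math.sqrt(max(s * s - 4.0 * det * det, 0.0))))

def norm_sum(mats, p, n):
    """Sum of ||A_i||^p over all N^n words i of length n."""
    total = 0.0
    for word in itertools.product(mats, repeat=n):
        prod = word[0]
        for M in word[1:]:
            prod = matmul(M, prod)
        total += spectral_norm2(prod) ** p
    return total

def upper_bound(mats, p, n):
    """n-th term of the sequence of upper bounds."""
    return norm_sum(mats, p, n) ** (1.0 / n)

def lower_bound(mats, p, n, d=2):
    """n-th term of the sequence of lower bounds, with K(p, d) as in the text."""
    K = d ** (2 + (d + 1) * p) * max(d ** (1 - p), 1.0)
    ratio = norm_sum(mats, p, n * d) / (K * norm_sum(mats, p, n) ** (d - 1))
    return ratio ** (1.0 / n)

# Illustrative pair; for p = 1 the exact value is rho(A1 + A2) = 5.
A1 = [[2.0, 1.0], [1.0, 1.0]]
A2 = [[1.0, 1.0], [1.0, 2.0]]
uppers = [upper_bound([A1, A2], 1.0, n) for n in range(1, 6)]
lowers = [lower_bound([A1, A2], 1.0, n) for n in range(1, 4)]
```

For this pair $K(1,2)=32$, so the first few lower bounds sit far below the true value while the upper bounds approach it only at the slow rate $C^{1/n}$, which is consistent with the remarks above about practical computation.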

When $p$ is a positive even integer, or when $p$ is a positive integer and the matrices $A_{1},\ldots,A_{N}$ preserve a cone, the identity

$$\varrho_{p}(A_{1},\ldots,A_{N})=\rho\left(\sum_{i=1}^{N}A_{i}^{\otimes p}\right)\tag{8}$$

has been discovered independently on several occasions [10, 36, 43]. (Here $A^{\otimes p}$ denotes the $p^{\mathrm{th}}$ Kronecker power of the matrix $A$ , see for example [17, §4.2].) When $p$ is a positive integer and $A_{1},\ldots,A_{N}$ are not necessarily positive, the inequality

$$\varrho_{p}(A_{1},\ldots,A_{N})\leq\rho\left(\sum_{i=1}^{N}A_{i}^{\otimes p}\right)$$

may be obtained by the same means. Whilst in principle (8) represents an easy method for computing the $p$ -radius of positive matrices, the size of the auxiliary matrix $\sum_{i=1}^{N}A_{i}^{\otimes p}$ increases exponentially with $p$ which prevents the use of the formula when $p$ is sufficiently large. For non-integer $p$ these results may nonetheless be exploited so as to yield upper bounds as follows. We observe that if $p_{1}$ and $p_{2}$ are real numbers such that $0<p_{1}<p_{2}$ , and $\lambda\in(0,1)$ , then for each $n\geq 1$

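$$\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{\lambda p_{1}+(1-\lambda)p_{2}}\leq\left(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p_{1}}\right)^{\lambda}\left(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p_{2}}\right)^{1-\lambda}$$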

using Hölder’s inequality with the conjugate exponents $\frac{1}{\lambda}$ and $\frac{1}{1-\lambda}$. It follows easily that

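$$\log\varrho_{\lambda p_{1}+(1-\lambda)p_{2}}(A_{1},\ldots,A_{N})\leq\lambda\log\varrho_{p_{1}}(A_{1},\ldots,A_{N})+(1-\lambda)\log\varrho_{p_{2}}(A_{1},\ldots,A_{N})$$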

and hence the function $p\mapsto\log\varrho_{p}(A_{1},\ldots,A_{N})$ is convex. This yields the upper bound

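$$\varrho_{p}(A_{1},\ldots,A_{N})\leq\rho\left(\sum_{i=1}^{N}A_{i}^{\otimes\lfloor p\rfloor}\right)^{1+\lfloor p\rfloor-p}\rho\left(\sum_{i=1}^{N}A_{i}^{\otimes(1+\lfloor p\rfloor)}\right)^{p-\lfloor p\rfloor}$$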

valid for all $p>0$ and $A_{1},\ldots,A_{N}\in M_{d}(\mathbb{R})$ , which does not seem to have been previously noted in the literature. We will see in §6 below that despite its crudity this estimate does not automatically provide a bad approximation and should not be discounted out of hand.
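
The Kronecker-power formula and the interpolated bound are easy to try out numerically. In the sketch below the exponents come from writing $p=\lambda\lfloor p\rfloor+(1-\lambda)(1+\lfloor p\rfloor)$ with $\lambda=1+\lfloor p\rfloor-p$ in the convexity inequality. The helper names are hypothetical, and the spectral radius is found by plain power iteration, which is only justified here because the example matrices are entrywise positive; this is an assumption of the sketch and not a requirement of the bound.

```python
import math

def kron(X, Y):
    """Kronecker product of two square matrices given as lists of rows."""
    return [[x * y for x in rx for y in ry] for rx in X for ry in Y]

def kron_power_sum(mats, k):
    """sum_i A_i^{(x) k}; for k = 0 this is the 1x1 matrix [N]."""
    size = len(mats[0]) ** k
    total = [[0.0] * size for _ in range(size)]
    for A in mats:
        P = [[1.0]]
        for _ in range(k):
            P = kron(P, A)
        for i in range(size):
            for j in range(size):
                total[i][j] += P[i][j]
    return total

def spectral_radius_positive(M, iterations=500):
    """Power iteration started from the all-ones vector (positive M assumed)."""
    v = [1.0] * len(M)
    rho = 0.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in M]
        rho = max(w)
        v = [x / rho for x in w]
    return rho

def interpolated_upper_bound(mats, p):
    """rho_floor^(1 + floor(p) - p) * rho_ceil^(p - floor(p)) for p > 0."""
    k = math.floor(p)
    lo = spectral_radius_positive(kron_power_sum(mats, k))
    hi = spectral_radius_positive(kron_power_sum(mats, k + 1))
    return lo ** (k + 1 - p) * hi ** (p - k)

# Illustrative positive pair, not taken from the article.
A1 = [[2.0, 1.0], [1.0, 1.0]]
A2 = [[1.0, 1.0], [1.0, 2.0]]
rho1 = spectral_radius_positive(kron_power_sum([A1, A2], 1))
rho2 = spectral_radius_positive(kron_power_sum([A1, A2], 2))
bound_p15 = interpolated_upper_bound([A1, A2], 1.5)
```

At integer $p$ the interpolated bound reduces to $\rho\bigl(\sum_{i}A_{i}^{\otimes p}\bigr)$, as it should; the auxiliary matrix has dimension $d^{p}$, so the sketch is limited to small $p$.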

3.2. Resampled Monte Carlo methods

In [41], J. Vanneste introduced a method based on the interpretation of the $p$-radius as an asymptotic moment of a random matrix product: given $A_{1},\ldots,A_{N}\in M_{d}(\mathbb{R})$, $n\geq 1$ and $p\in\mathbb{R}$ we may view the sum $\frac{1}{N^{n}}\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p}$ as the expectation of the random variable $\mathtt{i}\mapsto\|A_{\mathtt{i}}\|^{p}$ where each word $\mathtt{i}$ of length $n$ is chosen with probability $1/N^{n}$. This suggests the possibility of approximating $\frac{1}{N^{n}}\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p}$ for large $n$ by Monte Carlo estimation: if we choose $M$ words $\mathtt{i}_{1},\ldots,\mathtt{i}_{M}$ independently then by the law of large numbers, the average $\frac{1}{M}\sum_{k=1}^{M}\|A_{\mathtt{i}_{k}}\|^{p}$ should for large enough $M$ give a reasonable approximation to the value $\frac{1}{N^{n}}\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p}$, which is that random variable’s expectation, and hence (after multiplying by $N^{n}$ and taking the $n^{\mathrm{th}}$ root) a good approximation to $\varrho_{p}(A_{1},\ldots,A_{N})$ as long as $n$ is reasonably large. However, except when $p$ is small, the variance of this random variable will be prohibitively large (indeed exponentially large in $n$), which makes convergence in the strong law of large numbers unreasonably slow. To compensate for this Vanneste introduced a “go-with-the-winners” resampling scheme along the lines of [15], which successively modifies the distribution of the random variable $\mathtt{i}\mapsto\|A_{\mathtt{i}}\|^{p}$ so as to retain the same mean while reducing the variance; see discussion in [41, §III] for details. The particular strength of this method is that it has very limited dependence on the number of matrices and their dimension; on the other hand, the accuracy of the results is relatively low in practice. See §6 below for further discussion.
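
For orientation, the naive estimator described at the start of this subsection can be sketched as follows. This is plain Monte Carlo sampling only and deliberately omits the resampling scheme of [41]; the function names, the sample size, the seed and the example matrices are all illustrative assumptions.

```python
import math
import random

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    d = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

def spectral_norm2(A):
    """Euclidean operator norm of a 2x2 matrix via the eigenvalues of A^T A."""
    a, b, c, d = A[0][0], A[0][1], A[1][0], A[1][1]
    s = a * a + b * b + c * c + d * d
    det = a * d - b * c
    return math.sqrt(0.5 * (s + math.sqrt(max(s * s - 4.0 * det * det, 0.0))))

def monte_carlo_mean(mats, p, n, samples, rng):
    """Average of ||A_i||^p over uniformly sampled words i of length n."""
    total = 0.0
    for _ in range(samples):
        prod = rng.choice(mats)
        for _ in range(n - 1):
            prod = matmul(rng.choice(mats), prod)
        total += spectral_norm2(prod) ** p
    return total / samples

def naive_p_radius(mats, p, n, samples, rng):
    """N * mean^(1/n): the n-th root of N^n times the estimated expectation."""
    return len(mats) * monte_carlo_mean(mats, p, n, samples, rng) ** (1.0 / n)

# Illustrative positive pair; for p = 1 the exact p-radius is 5.
A1 = [[2.0, 1.0], [1.0, 1.0]]
A2 = [[1.0, 1.0], [1.0, 2.0]]
estimate = naive_p_radius([A1, A2], p=1.0, n=8, samples=2000,
                          rng=random.Random(0))
```

Even with exact expectations this quantity is only the $n^{\mathrm{th}}$ upper bound $\bigl(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p}\bigr)^{1/n}$, so the statistical error is added to a bias that decays slowly in $n$.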

3.3. The convex optimisation bounds of Jungers and Protasov

The article [23] introduced new systematic upper and lower bounds for the $p$ -radius in the case $p\geq 1$ . If $(A_{1},\ldots,A_{N})$ are non-negative matrices, Jungers and Protasov showed that the quantities

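$$\mathsf{a}_{p}(n)=\inf_{(u_{1},\ldots,u_{d})\in\mathbb{R}^{d}}\sum_{|\mathtt{i}|=n}\left(\max_{1\leq i\leq d}\sum_{j=1}^{d}(A_{\mathtt{i}})_{ij}e^{u_{j}-u_{i}}\right)^{p},$$

$$\mathsf{b}_{p}(n)=\inf_{(v_{1},\ldots,v_{d})\in\mathbb{R}^{d}}\max_{1\leq j\leq d}\sum_{|\mathtt{i}|=n}\left(\sum_{i=1}^{d}(A_{\mathtt{i}})_{ij}e^{v_{i}-v_{j}}\right)^{p},$$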

where $(B)_{ij}$ denotes the $(i,j)$ entry of the matrix $B\in M_{d}(\mathbb{R})$ , satisfy

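$$\max\left\{d^{-\frac{p}{n}}\mathsf{a}_{p}(n)^{\frac{1}{n}},\,d^{\frac{1-p}{n}}\mathsf{b}_{p}(n)^{\frac{1}{n}}\right\}\leq\varrho_{p}(A_{1},\ldots,A_{N})\leq\mathsf{b}_{p}(n)^{\frac{1}{n}}$$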

for every $n\geq 1$ . (Here we have modified the statement of their results in concordance with our definition of $\varrho_{p}$ .) The quantities $\mathsf{a}_{p}(n)$ and $\mathsf{b}_{p}(n)$ are solutions to convex optimisation problems and as such may be efficiently approximated. In the case where $(A_{1},\ldots,A_{N})$ preserves a more general cone $\mathcal{K}$ (in the weak sense that $A_{i}\mathcal{K}\subseteq\mathcal{K}$ for each $i=1,\ldots,N$ ) analogous upper and lower bounds are given, but these are not in general the solutions to convex optimisation problems and as such are more difficult to efficiently or rigorously estimate. Since the matrices $A_{1}^{\otimes 2},\ldots,A_{N}^{\otimes 2}$ always preserve a cone irrespective of the structure of the original matrices $A_{1},\ldots,A_{N}$ , and since $\varrho_{p}(A_{1},\ldots,A_{N})=\varrho_{p/2}(A_{1}^{\otimes 2},\ldots,A_{N}^{\otimes 2})$ for all $p\in\mathbb{R}$ , this more general version of their method permits the estimation of $\varrho_{p}(A_{1},\ldots,A_{N})$ for arbitrary $A_{1},\ldots,A_{N}\in M_{d}(\mathbb{R})$ and $p\geq 2$ .

As with the upper and lower bounds (6) and (7) this system of estimation requires the computation of $N^{n}$ matrix products in order to obtain the $n^{\mathrm{th}}$ approximation and as such is best suited to cases in which $N$ is small.

3.4. Eigenvalue methods

As has been previously observed by J. Vanneste [41, §II.B], the quantity $\varrho_{p}(A_{1},\ldots,A_{N})$ can be represented as the leading eigenvalue of a linear operator on an infinite-dimensional function space in the following manner. Suppose that $A_{1},\ldots,A_{N}\in M_{d}(\mathbb{R})$ are invertible matrices and let $p\in\mathbb{R}$ . Let $\mathbb{RP}^{d-1}$ denote the space of lines through the origin in $\mathbb{R}^{d}$ , with the distance between two lines defined to be the angle at which they intersect. For each nonzero $u\in\mathbb{R}^{d}$ let $\overline{u}\in\mathbb{RP}^{d-1}$ denote the line spanned by $u$ . Define an operator on the space $C^{\alpha}(\mathbb{RP}^{d-1})$ of $\alpha$ -Hölder continuous functions $f\colon\mathbb{RP}^{d-1}\to\mathbb{R}$ by

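$$(\mathcal{L}_{p}f)(\overline{u}):=\sum_{i=1}^{N}\left(\frac{\|A_{i}u\|}{\|u\|}\right)^{p}f(\overline{A_{i}u})$$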

and observe that by a simple calculation

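$$(\mathcal{L}_{p}^{n}f)(\overline{u})=\sum_{|\mathtt{i}|=n}\left(\frac{\|A_{\mathtt{i}}u\|}{\|u\|}\right)^{p}f(\overline{A_{\mathtt{i}}u})$$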

for every $n\geq 1$ , $f\in C^{\alpha}(\mathbb{RP}^{d-1})$ and $\overline{u}\in\mathbb{RP}^{d-1}$ . With only a little more work one may show that in fact

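$$\lim_{n\to\infty}\left\|\mathcal{L}_{p}^{n}\right\|^{\frac{1}{n}}=\lim_{n\to\infty}\left(\sum_{|\mathtt{i}|=n}\|A_{\mathtt{i}}\|^{p}\right)^{\frac{1}{n}},$$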

and under mild algebraic non-degeneracy conditions on the matrices $A_{i}$ , a rather longer argument shows that $\varrho_{p}(A_{1},\ldots,A_{N})$ is the largest eigenvalue of $\mathcal{L}_{p}$ acting on $C^{\alpha}(\mathbb{RP}^{d-1})$ if $\alpha>0$ is chosen sufficiently small (see for example [16, Théorème 8.8]). This suggests the idea of calculating $\varrho_{p}(A_{1},\ldots,A_{N})$ by approximating the operator $\mathcal{L}_{p}$ with a large matrix representing the action of the matrices $A_{i}$ on a discretised version of $\mathbb{RP}^{d-1}$ . This approach was previously described in [41, §IV.A] but does not seem to have been investigated in detail. A version of this method was also suggested in [29, §8] for the purpose of estimating the Hausdorff dimensions of some self-affine limit sets.

To give a concrete example, in the case $d=2$ write $u(\theta):=(\cos\theta,\sin\theta)$ for each $\theta\in[0,\pi)$ and for $\overline{u},\overline{v}\in\mathbb{RP}^{1}$ let $[\overline{u},\overline{v})$ denote the shorter of the two arcs in $\mathbb{RP}^{1}$ from $\overline{u}$ to $\overline{v}$, including the former endpoint but not the latter. Fix an integer $n\geq 1$. For each $i=1,\ldots,N$ define an $n\times n$ matrix $B_{i}=[b_{jk}^{(i)}]_{j,k=0}^{n-1}$ by $b_{jk}^{(i)}:=\|A_{i}u(j\pi/n)\|^{p}$ if $\overline{A_{i}u(j\pi/n)}\in[\overline{u(k\pi/n)},\overline{u((k+1)\pi/n)})$ and $b_{jk}^{(i)}:=0$ otherwise. Define now the matrix $B:=\sum_{i=1}^{N}B_{i}$. Since $B$ corresponds to a version of $\mathcal{L}_{p}$ acting on functions defined on a discretisation of $\mathbb{RP}^{1}$ into $n$ evenly-spaced points, we expect that for large $n$ the spectral radius of $B$ should give a reasonable approximation to $\rho(\mathcal{L}_{p})=\varrho_{p}(A_{1},\ldots,A_{N})$. In principle it may be possible to demonstrate this rigorously using the methods of [25], but this does not seem to have so far been attempted in the literature and is certainly a problem beyond the scope of this article.
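
The construction of $B$ for $d=2$ can be sketched directly. In the following illustrative code (hypothetical function names, not an implementation from this article) each row of $B$ is stored sparsely, since it has at most $N$ nonzero entries, and the spectral radius is estimated from the average growth rate of $B^{k}\mathbf{1}$ over the second half of a fixed number of iterations; the mesh size and iteration count are arbitrary choices.

```python
import math

def discretised_rows(mats, p, n):
    """Sparse rows of B: rows[j] lists the pairs (k, b_jk) with b_jk nonzero."""
    rows = []
    for j in range(n):
        theta = j * math.pi / n
        ux, uy = math.cos(theta), math.sin(theta)
        row = []
        for A in mats:
            x = A[0][0] * ux + A[0][1] * uy
            y = A[1][0] * ux + A[1][1] * uy
            angle = math.atan2(y, x) % math.pi   # direction of A u in [0, pi)
            k = min(int(angle * n / math.pi), n - 1)
            row.append((k, math.hypot(x, y) ** p))
        rows.append(row)
    return rows

def spectral_radius_estimate(rows, iterations=400):
    """Geometric-mean growth rate of B^k 1 over the last half of the iterations."""
    v = [1.0] * len(rows)
    logs = []
    for _ in range(iterations):
        w = [sum(weight * v[k] for k, weight in row) for row in rows]
        scale = max(w)
        logs.append(math.log(scale))
        v = [x / scale for x in w]
    tail = logs[iterations // 2:]
    return math.exp(sum(tail) / len(tail))

# Illustrative positive pair; for p = 1 the exact p-radius is rho(A1 + A2) = 5.
A1 = [[2.0, 1.0], [1.0, 1.0]]
A2 = [[1.0, 1.0], [1.0, 2.0]]
rho_B = spectral_radius_estimate(discretised_rows([A1, A2], p=1.0, n=400))
```

Because every image direction is rounded to the left endpoint of its mesh interval, one would expect an error roughly proportional to $1/n$; no rigorous error bound is claimed for this sketch.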

For two-dimensional matrices this method appears to yield approximations accurate to several decimal places in a tolerable amount of time (see §6 below) and it is apparent from the definition that the effect of increasing the number of matrices $N$ has at worst a polynomial effect on the running time of the algorithm. However the size of the matrix required in order to discretise $\mathbb{RP}^{d-1}$ into a mesh of prescribed size $\varepsilon$ rises exponentially with the dimension $d$ , suggesting that this method is unlikely to be very useful for matrices which are not of low dimension. The question also arises of whether better estimates may be obtained by adapting the mesh locally so as to include more mesh points in regions where the derivative of one of the maps $\overline{u}\mapsto\overline{A_{i}u}$ is large and fewer mesh points where it is small. Since the principal purpose of this article is to introduce the new algorithm given by Theorem 1, we leave these questions to other investigators.

4. Overview of the proof of Theorem 1

In the previous subsection we observed that $\varrho_{p}(A_{1},\ldots,A_{N})$ admits an interpretation as the leading eigenvalue of a linear operator on an infinite-dimensional function space and considered the possibility of approximating such an operator directly by operators on finite-dimensional spaces. This is however not the only mechanism by which the leading eigenvalue of an operator may be calculated. In order to describe our chosen alternative we will briefly and informally review some concepts from the theory of trace-class linear operators; thorough formal treatments of this topic may be found in e.g. [14, 38].

If an operator $\mathscr{L}$ on an infinite-dimensional Hilbert space has the property that the sequence of approximation numbers

[TABLE]

is summable then it is called trace-class. If this is the case then $\mathscr{L}$ is a compact operator (since it is a limit in the norm topology of a sequence of finite-rank operators) and therefore its spectrum consists of $0$ together with a finite or infinite set of eigenvalues, each of finite algebraic multiplicity, which has no nonzero accumulation points. It is not difficult to see that $\mathfrak{s}_{n}(\mathscr{L}^{k})\leq\|\mathscr{L}^{k-1}\|\mathfrak{s}_{n}(\mathscr{L})$ for every $k,n\geq 1$ by direct manipulation of the definition and consequently every power of a trace-class operator is also trace-class. If $\mathscr{L}$ is a trace-class operator on $\mathscr{H}$ with finite or infinite sequence of nonzero eigenvalues $(\lambda_{n})_{n=1}^{M}$ , it is classical that the series $\sum_{n=1}^{M}\lambda_{n}$ converges absolutely to a quantity which is called the trace of $\mathscr{L}$ and denoted $\operatorname{tr}\mathscr{L}$ . Moreover the quantity
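The inequality $\mathfrak{s}_{n}(\mathscr{L}^{k})\leq\|\mathscr{L}^{k-1}\|\mathfrak{s}_{n}(\mathscr{L})$ can be checked numerically in finite dimensions. The sketch below assumes the usual definition $\mathfrak{s}_{n}(\mathscr{L})=\inf\{\|\mathscr{L}-F\|\colon\operatorname{rank}F<n\}$, under which the argument is that $\mathscr{L}^{k-1}F$ has rank less than $n$ whenever $F$ does, while $\|\mathscr{L}^{k}-\mathscr{L}^{k-1}F\|\leq\|\mathscr{L}^{k-1}\|\,\|\mathscr{L}-F\|$. For a matrix the approximation numbers coincide with the singular values, which is what the code uses; the function names are hypothetical.

```python
import numpy as np

def approximation_numbers(L):
    """Singular values of a matrix in decreasing order. For a matrix these
    coincide with the approximation numbers s_1 >= s_2 >= ..."""
    return np.linalg.svd(L, compute_uv=False)

def check_power_bound(L, k):
    """Return True if s_n(L^k) <= ||L^(k-1)|| * s_n(L) holds for every n,
    up to a small allowance for rounding error."""
    s = approximation_numbers(L)
    s_k = approximation_numbers(np.linalg.matrix_power(L, k))
    norm_prev = np.linalg.norm(np.linalg.matrix_power(L, k - 1), 2)
    return bool(np.all(s_k <= norm_prev * s * (1 + 1e-12) + 1e-15))
```

Calling `check_power_bound(L, k)` on a random square matrix `L` for several values of `k` should return `True` each time.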

[TABLE]

called the Fredholm determinant of $\mathscr{L}$ , defines an entire holomorphic function in the variable $z$ with power series $\sum_{n=0}^{\infty}a_{n}z^{n}$ , say. It is also classical that in this case the zeros of $z\mapsto\det(I-z\mathscr{L})$ are precisely the reciprocals of the nonzero eigenvalues of $\mathscr{L}$ and that additionally

[TABLE]

for every $n\geq 1$ , where $a_{0}:=1$ and where $\lambda_{k}$ is interpreted as zero if $k>M$ . It follows that if the traces $\operatorname{tr}\mathscr{L}^{k}$ can be easily calculated for $k=1,\ldots,n$ , say, then an approximation $\sum_{k=0}^{n}a_{k}z^{k}$ to the Fredholm determinant can be constructed using (10), and it might be hoped that the smallest positive real root of the polynomial $\sum_{k=0}^{n}a_{k}z^{k}$ would provide a good estimate for the reciprocal of the leading eigenvalue $\rho(\mathscr{L})$ of $\mathscr{L}$ as long as the remainder $\sum_{k=n+1}^{\infty}a_{k}z^{k}$ is extremely small. In view of the equation (10), if the sequence $(\lambda_{n})_{n=1}^{M}$ can be shown to decay stretched-exponentially then this remainder will in fact be super-exponentially small, and this is indeed the approach which we will take in estimating $\varrho_{p}(A_{1},\ldots,A_{N})$ . This general approach to estimating dynamical quantities via operator eigenvalues has been exploited in a number of earlier articles, of which we note [2, 18, 19, 29, 32, 33, 35].
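The passage from traces to an eigenvalue estimate can be sketched in code. The version below is an illustration under assumptions rather than the procedure of the paper: it obtains the coefficients from the recursion $n\,a_{n}=-\sum_{k=1}^{n}\operatorname{tr}(\mathscr{L}^{k})\,a_{n-k}$ (Newton's identities, which produce the same coefficients as a determinant formula in the traces), and it locates the smallest positive root of $\sum_{k}a_{k}z^{k}$ as the reciprocal of the largest positive real root of the reversed polynomial, which is monic and therefore better scaled. The function names are hypothetical.

```python
import numpy as np

def fredholm_coefficients(traces):
    """Given traces[k-1] = tr L^k for k = 1..n, return [a_0, ..., a_n]
    using n * a_n = -sum_{k=1}^{n} tr(L^k) * a_{n-k} with a_0 = 1."""
    a = [1.0]
    for n in range(1, len(traces) + 1):
        a.append(-sum(traces[k - 1] * a[n - k] for k in range(1, n + 1)) / n)
    return a

def leading_eigenvalue_estimate(traces):
    """Estimate rho(L) as the reciprocal of the smallest positive root of
    sum_k a_k z^k. np.roots receives a_0 as the top-degree coefficient, so
    its roots are the reciprocals of the zeros of the truncated determinant.
    Raises ValueError if the truncation has no positive real root."""
    a = fredholm_coefficients(traces)
    roots = np.roots(a)
    real_pos = [r.real for r in roots if abs(r.imag) < 1e-9 and r.real > 0]
    return max(real_pos)
```

When $\mathscr{L}$ is replaced by a $3\times 3$ matrix as a stand-in, the first three traces determine $\det(I-z\mathscr{L})$ exactly, and the estimate should agree with the spectral radius of the matrix up to rounding error.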

The proof of Theorem 1 therefore proceeds via the introduction of a trace-class operator $\mathscr{L}$ on a Hilbert space $\mathscr{H}$ with the properties required by the argument sketched above: a stretched-exponential estimate on the singular values $\mathfrak{s}_{n}(\mathscr{L})$ (which implies a stretched-exponential estimate on the eigenvalues via Weyl’s inequality), the property $\rho(\mathscr{L})=\varrho_{p}(A_{1},\ldots,A_{N})$ , and a simple, computationally-feasible formula for the sequence of traces $\operatorname{tr}\mathscr{L}^{n}$ . The following result from [29] saves us the necessity of constructing such an operator from first principles:

Theorem 2 ([29, Corollary 5.1]).

Let $d,N\geq 2$ , let $A_{1},\ldots,A_{N}$ be real $d\times d$ matrices and suppose that $(\mathcal{K}_{1},\ldots,\mathcal{K}_{m})$ is a multicone for $(A_{1},\ldots,A_{N})$ with transverse-defining vector $w\in\mathbb{R}^{d}$ . Then there exists a nonempty bounded open subset $\Omega$ of the complex hyperplane $\{z\in\mathbb{C}^{d}\colon\langle z,w\rangle=1\}$ such that the following properties hold. Let $\mathcal{A}^{2}(\Omega)$ denote the separable complex Hilbert space of holomorphic functions $\Omega\to\mathbb{C}$ for which the integral $\int_{\Omega}|f(z)|^{2}dV(z)$ is finite, where $V$ denotes $2(d-1)$ -dimensional Lebesgue measure on $\Omega$ . For each $p\in\mathbb{C}$ define an operator $\mathscr{L}_{p}\colon\mathcal{A}^{2}(\Omega)\to\mathcal{A}^{2}(\Omega)$ by

[TABLE]

Then the operators $\mathscr{L}_{p}$ are well-defined bounded linear operators on $\mathcal{A}^{2}(\Omega)$ and:

(i) There exist $C,\kappa,\gamma>0$ such that for all $p\in\mathbb{C}$ and $n\geq 1$ we have

[TABLE]

In particular each $\mathscr{L}_{p}$ is trace-class.

(ii) For every $p\in\mathbb{C}$ and $n\geq 1$ we have

[TABLE]

(iii) For every $p\in\mathbb{R}$ the spectral radius of $\mathscr{L}_{p}$ is equal to

[TABLE]

(iv) For all $p\in\mathbb{R}$ the spectral radius of $\mathscr{L}_{p}$ is a simple eigenvalue of $\mathscr{L}_{p}$ and there are no other eigenvalues of the same modulus.

Theorem 1 can thus be seen as a version of the eigenvalue-problem approach discussed in the previous section, but one which takes advantage of the special additional structure of trace-class operators. Note that since trace-class operators are compact operators they are very far from being invertible, and indeed an important feature of the hypotheses of Theorem 2 is that the transformations $A_{i}$ map a (not necessarily connected) patch of $\mathbb{RP}^{d-1}$ strictly inside itself, which results in a non-invertible action on the associated function space, as opposed to acting transitively on $\mathbb{RP}^{d-1}$ . This feature is precisely the content of the multicone hypothesis, and indeed the non-invertibility of the action on $\mathbb{RP}^{d-1}$ is critical in constructing a space on which the operators $\mathscr{L}_{p}$ can act in a trace-class manner. Any extension of the method of Theorem 1 to families of matrices with non-real eigenvalues is therefore likely to be impossible, since such matrices would tend to act transitively on the phase space $\mathbb{RP}^{d-1}$ , preventing the construction of a suitable domain for a trace-class operator to act upon.

5. Proof of Theorem 1

The following result summarises the classical results on traces and determinants of trace-class operators on Hilbert spaces which will be required in our proof. It is a combination of several results from [38, §3], with the exception of the determinant formula for $a_{n}$ which may be found instead in, for example, [37, Theorem 6.8] or [14, Theorem IV.5.2].

Theorem 3.

Let $\mathscr{H}$ be a complex separable Hilbert space, let $\mathscr{L}$ be a trace-class operator acting on $\mathscr{H}$ , and define $a_{0}:=1$ and

[TABLE]

for every $n\geq 1$ . Then the power series $\mathscr{D}(z):=\sum_{n=0}^{\infty}a_{n}z^{n}$ converges for all $z\in\mathbb{C}$ . The function $\mathscr{D}\colon\mathbb{C}\to\mathbb{C}$ is holomorphic, the zeros of $\mathscr{D}$ are precisely the reciprocals of the nonzero eigenvalues of $\mathscr{L}$ , and the degree of each zero of $\mathscr{D}$ is equal to the algebraic multiplicity of the corresponding eigenvalue of $\mathscr{L}$ . Moreover the coefficients $a_{n}$ satisfy the estimate

[TABLE]

for every $n\geq 1$ .
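As an illustration of how a determinant formula for $a_{n}$ in terms of traces can be evaluated, the sketch below assumes the standard Plemelj–Smithies form, in which $a_{n}=\frac{(-1)^{n}}{n!}\det T_{n}$ and $T_{n}$ is the $n\times n$ matrix with $\operatorname{tr}\mathscr{L}^{i-j+1}$ in position $(i,j)$ for $j\leq i$, the integers $n-1,n-2,\ldots,1$ on the superdiagonal, and zeros elsewhere. Whether this matches the displayed formula term by term is an assumption, and the function name is hypothetical.

```python
import math
import numpy as np

def coefficient_by_determinant(traces, n):
    """Return a_n = ((-1)^n / n!) * det(T), where traces[k-1] = tr L^k.

    With 0-based indices, T[i][j] = tr L^(i-j+1) for j <= i,
    T[i][i+1] = n-1-i, and all other entries are zero.
    """
    if n == 0:
        return 1.0
    T = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            T[i, j] = traces[i - j]
        if i + 1 < n:
            T[i, i + 1] = n - 1 - i
    return (-1) ** n * float(np.linalg.det(T)) / math.factorial(n)
```

For a $3\times 3$ matrix used as a stand-in for $\mathscr{L}$, the values for $n=1,2,3$ should reproduce the coefficients of $\det(I-z\mathscr{L})$ and the value for $n=4$ should vanish up to rounding error.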

We also require the following elementary lemma:

Lemma 5.1.

For each $\gamma,\alpha>0$ there exists a constant $K=K(\alpha,\gamma)>0$ such that

[TABLE]

for all $m\geq 1$ .

Proof.

Fix $\gamma$ and $\alpha$ . By adjusting the constant $K$ if necessary we may without loss of generality assume $m\geq 2$ . Define

[TABLE]

Since clearly $e^{-\gamma n^{\alpha}}\leq\int_{n-1}^{n}e^{-\gamma t^{\alpha}}dt$ for every integer $n\geq 1$ we have

[TABLE]

for every $m\geq 2$ and the result follows. ∎
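The integral comparison used in this proof can be illustrated numerically in the case $\alpha=\frac{1}{2}$, which corresponds to $d=3$ in the later application with $\alpha=\frac{1}{d-1}$. In that case the substitution $t=s^{2}$ gives the closed form $\int_{a}^{\infty}e^{-\gamma\sqrt{t}}\,dt=\frac{2}{\gamma^{2}}(\gamma\sqrt{a}+1)e^{-\gamma\sqrt{a}}$. The sketch below only checks that the tail sum from $m$ is bounded by the integral from $m-1$; it does not attempt to reproduce the constant $K$ of the lemma, and the function names are hypothetical.

```python
import math

def tail_sum(gamma, alpha, m, terms=20000):
    """Partial sum of exp(-gamma * n**alpha) over n = m, ..., m + terms - 1."""
    return sum(math.exp(-gamma * n ** alpha) for n in range(m, m + terms))

def tail_integral_half(gamma, a):
    """Closed form of the integral of exp(-gamma * sqrt(t)) over [a, infinity)."""
    root = math.sqrt(a)
    return 2.0 * (gamma * root + 1.0) * math.exp(-gamma * root) / gamma ** 2
```

For any $m\geq 2$ and $\gamma>0$ the value `tail_sum(gamma, 0.5, m)` should not exceed `tail_integral_half(gamma, m - 1)`.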

We may now begin the proof of Theorem 1. Fix $A_{1},\ldots,A_{N}$ and $p\in\mathbb{R}$ as in Theorem 1. By Theorem 2 there exist a complex separable Hilbert space $\mathscr{H}$ and a trace-class linear operator $\mathscr{L}_{p}\colon\mathscr{H}\to\mathscr{H}$ such that $\varrho_{p}(A_{1},\ldots,A_{N})$ is a simple isolated eigenvalue of $\mathscr{L}_{p}$ , such that all other eigenvalues have absolute value strictly smaller than $\varrho_{p}(A_{1},\ldots,A_{N})$ , such that

[TABLE]

for every $n\geq 1$ and such that there exist constants $C_{1},\gamma_{1}>0$ such that $\mathfrak{s}_{n}(\mathscr{L}_{p})\leq C_{1}\exp(-\gamma_{1}n^{\frac{1}{d-1}})$ for every $n\geq 1$ . Define the sequence $(t_{n})$ in accordance with Theorem 1 and note that we have $t_{n}=\operatorname{tr}\mathscr{L}^{n}_{p}$ for every $n\geq 1$ . For each $n\geq 0$ let $a_{n}$ be as defined in Theorem 3 and note that this coincides with the definition of the sequence $a_{n}$ in Theorem 1. We claim that there exist $C_{2},\gamma_{2}>0$ such that

[TABLE]

for every $n\geq 1$ . To see this let $n\geq 1$ and observe that by Theorem 3

[TABLE]

where we have used Lemma 5.1 with $\alpha=\frac{1}{d-1}$ and have also used the elementary inequality

[TABLE]

which is valid since the series is an upper Riemann sum of the integral. The claim follows easily.

Now define a function $\mathscr{D}\colon\mathbb{C}\to\mathbb{C}$ by $\mathscr{D}(z):=\sum_{n=0}^{\infty}a_{n}z^{n}$ . It is clear from the estimate (11) that this power series has infinite radius of convergence and therefore $\mathscr{D}$ is a well-defined holomorphic function on $\mathbb{C}$ . By Theorem 3 we have $\mathscr{D}(z)=\det(I-z\mathscr{L}_{p})$ for all $z\in\mathbb{C}$ and the zeros of $\mathscr{D}$ are precisely the reciprocals of the nonzero eigenvalues of $\mathscr{L}_{p}$ with the degree of each zero being equal to the algebraic multiplicity of the corresponding eigenvalue. By Theorem 2, $\varrho_{p}(A_{1},\ldots,A_{N})$ is the largest eigenvalue of $\mathscr{L}_{p}$ in absolute value and is a simple eigenvalue. It follows that we may choose a circular contour $\Gamma$ in $\mathbb{C}$ which is centred somewhere on the real line, passes through $0$ , encloses $1/\varrho_{p}(A_{1},\ldots,A_{N})$ and does not enclose or intersect the reciprocal of any eigenvalue of $\mathscr{L}_{p}$ other than $\varrho_{p}(A_{1},\ldots,A_{N})$ . Let $c\in\mathbb{R}$ and $R>0$ denote the centre point and radius of $\Gamma$ respectively. Since $\Gamma$ does not intersect the reciprocal of any eigenvalue of $\mathscr{L}_{p}$ the function $\mathscr{D}$ does not have any zeros on $\Gamma$ , so by compactness

[TABLE]

For each $n\geq 1$ define a function $\mathscr{D}_{n}\colon\mathbb{C}\to\mathbb{C}$ by $\mathscr{D}_{n}(z):=\sum_{k=0}^{n}a_{k}z^{k}$ . Obviously each $\mathscr{D}_{n}$ is a polynomial and is therefore holomorphic on $\mathbb{C}$ . Via Lemma 5.1 the estimate (11) implies

[TABLE]

for all $n\geq 1$ and some suitable constants $C_{3},C_{4},\gamma_{3},\gamma_{4}>0$ . In particular

[TABLE]

and therefore there exists $n_{0}\geq 1$ such that for all $n\geq n_{0}$

[TABLE]

Applying Rouché’s theorem on the circular contour $\Gamma$ we deduce that for all $n\geq n_{0}$ the functions $\mathscr{D}$ and $\mathscr{D}_{n}$ have the same number of zeros inside the contour $\Gamma$ , and the total degree of the zeros inside $\Gamma$ is the same for the function $\mathscr{D}$ as it is for the function $\mathscr{D}_{n}$ . Since $\mathscr{D}$ has a unique zero inside $\Gamma$ and that zero is simple this means that $\mathscr{D}_{n}$ has a unique zero inside $\Gamma$ for all large enough $n$ , and this zero is simple. Call this zero $r_{n}$ . Since $\mathscr{D}_{n}$ is a polynomial with real coefficients its zeros are symmetrically located with respect to reflection in the real axis. Since the contour $\Gamma$ is circular with real centre, a zero of $\mathscr{D}_{n}$ is enclosed by $\Gamma$ if and only if the complex conjugate of that zero is also so enclosed. It follows that the complex conjugate of $r_{n}$ is also enclosed by the contour $\Gamma$ and is therefore also a zero of $\mathscr{D}_{n}$ . But $\mathscr{D}_{n}$ has a unique zero inside $\Gamma$ . These statements can only be compatible if $r_{n}$ is equal to its own complex conjugate, and we conclude that $r_{n}$ is real. Since $r_{n}$ is enclosed by $\Gamma$ and is real it necessarily lies on the interval $(0,2R)$ and is the unique zero of $\mathscr{D}_{n}$ on that interval. In particular it is the smallest positive zero of the polynomial $\mathscr{D}_{n}$ .

Define $r_{\infty}:=1/\varrho_{p}(A_{1},\ldots,A_{N})\in(0,2R)$ . To complete the proof of the theorem we will show that

[TABLE]

We first require a lower bound for the derivative $\mathscr{D}^{\prime}(z)$ for $z$ close to $r_{\infty}$ . Since $r_{\infty}=1/\varrho_{p}(A_{1},\ldots,A_{N})$ is a simple zero of $\mathscr{D}$ we have $\mathscr{D}^{\prime}(r_{\infty})\neq 0$ , and since it is also necessarily an isolated zero we may choose $\delta>0$ such that $|\mathscr{D}^{\prime}(z)|\neq 0$ for all $z\in\mathbb{C}$ with $|z-r_{\infty}|\leq\delta$ , such that $\mathscr{D}(z)\neq 0$ for all $z\in\mathbb{C}$ with $0<|z-r_{\infty}|\leq\delta$ , and such that the closed disc of radius $\delta$ and centre $r_{\infty}$ is enclosed by the contour $\Gamma$ . Since by compactness

[TABLE]

it follows via (13) in the same manner as before that there exists $n_{1}\geq n_{0}$ such that for all $n\geq n_{1}$

[TABLE]

Applying Rouché’s theorem again, this time to the circular contour with centre $r_{\infty}$ and radius $\delta$ , we see that for each $n\geq n_{1}$ there is a unique zero of $\mathscr{D}_{n}$ within distance $\delta$ of $r_{\infty}$ . Since the disc of radius $\delta$ and centre $r_{\infty}$ is enclosed by $\Gamma$ , and $\Gamma$ encloses a unique zero of $\mathscr{D}_{n}$ , we conclude that this zero must be $r_{n}$ and therefore $|r_{n}-r_{\infty}|<\delta$ for all $n\geq n_{1}$ .

Now define

[TABLE]

Since $\mathscr{D}_{n}$ is a polynomial with real coefficients it takes only real values when restricted to $\mathbb{R}$ and therefore the same is true of $\mathscr{D}$ since it is the pointwise limit of $\mathscr{D}_{n}$ as $n\to\infty$ . Let $n\geq n_{1}$ and suppose that $r_{n}\neq r_{\infty}$ . By the Mean Value Theorem it follows that there exists a real number $t$ in the interval from $r_{n}$ to $r_{\infty}$ such that

[TABLE]

Since clearly $|r_{\infty}-t|\leq|r_{\infty}-r_{n}|\leq\delta$ we have $|\mathscr{D}^{\prime}(t)|\geq\kappa$ and therefore

[TABLE]

This inequality is obviously also true for integers $n\geq n_{1}$ such that $r_{n}=r_{\infty}$ . In particular for all $n\geq n_{1}$ we have

[TABLE]

using the fact that $\mathscr{D}_{n}(r_{n})=0=\mathscr{D}(r_{\infty})$ . Thus

[TABLE]

for all $n\geq n_{1}$ using (12). We in particular have $\lim_{n\to\infty}r_{n}=r_{\infty}$ . If $n_{2}\geq n_{1}$ is taken large enough that for all $n\geq n_{2}$ we have $r_{n}\geq\frac{1}{2}r_{\infty}$ , then for all $n\geq n_{2}$ we have

[TABLE]

and this completes the proof of the theorem.
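The speed of convergence of $r_{n}$ to $r_{\infty}$ can be observed on a toy example. The sketch below does not use the operator $\mathscr{L}_{p}$ of Theorem 2: it assumes a hypothetical diagonal operator with eigenvalues $q^{0},q^{1},q^{2},\ldots$ where $q=e^{-1}$, so that the leading eigenvalue is $1$, the eigenvalues decay exponentially (as in the case $d=2$ of the singular value estimate), and $\operatorname{tr}\mathscr{L}^{k}=1/(1-q^{k})$. The coefficients are computed by Newton's identities and all names are illustrative.

```python
import math
import numpy as np

def toy_traces(n, q=math.exp(-1.0)):
    """tr L^k for k = 1..n when L has eigenvalues q^0, q^1, q^2, ...,
    using sum_j q^(j*k) = 1 / (1 - q^k)."""
    return [1.0 / (1.0 - q ** k) for k in range(1, n + 1)]

def smallest_positive_root(traces):
    """Smallest positive zero r_n of D_n(z) = sum_{k<=n} a_k z^k."""
    a = [1.0]
    for n in range(1, len(traces) + 1):
        a.append(-sum(traces[k - 1] * a[n - k] for k in range(1, n + 1)) / n)
    # np.roots treats a_0 as the top-degree coefficient, so it returns the
    # reciprocals of the zeros of D_n; the largest positive real one is 1/r_n.
    roots = np.roots(a)
    return 1.0 / max(r.real for r in roots if abs(r.imag) < 1e-9 and r.real > 0)

# Errors |r_n - r_infinity| with r_infinity = 1. The range starts at n = 3
# because the degree-2 truncation has no real root in this toy example.
errors = [abs(smallest_positive_root(toy_traces(n)) - 1.0) for n in range(3, 8)]
```

In this toy setting the entries of `errors` should shrink faster than any fixed geometric rate as $n$ increases, until rounding error in double precision becomes the limiting factor.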

6. Example: a pair of matrices considered by Jungers and Protasov

In the article [23] the $p$ -radius of the pair $(A_{1},A_{2})$ defined by

[TABLE]

was investigated motivated by its connection with Chaikin’s subdivision schemes and the $L^{p}$ regularity of refinable functions. The reader may easily check that if we define

[TABLE]

then the matrices $X^{-1}A_{1}X$ and $X^{-1}A_{2}X$ are both positive, so the pair $(A_{1},A_{2})$ strictly preserves a cone and Theorem 1 may be applied thereto. The results of applying the various methods of estimation to $\varrho_{3.5}(A_{1},A_{2})$ are tabulated in Figures 1–5 below. The reader will notice that by far the best results are those obtained by Theorem 1: evaluating all products $A_{\mathtt{i}}$ of length up to 12 yields the estimate $0.19773298680753190957\ldots$ , which is empirically accurate to all decimal places shown. Estimates of comparable complexity using the method of §3.3 give only the first two decimal places, albeit rigorously; the naïve upper and lower estimates described in §3.1 are not even sufficient to establish the first significant digit of $\varrho_{3.5}(A_{1},A_{2})$ . The methods of §3.2 and §3.4 perform somewhat better, being able to give non-rigorous estimates accurate to several decimal places. We also observe that the upper estimate arising from logarithmic convexity,

[TABLE]

gives a rigorous upper bound of

[TABLE]

which, remarkably, is more accurate than several of the other methods employed. Applying Theorem 1 with $n=20$ gives the estimate

[TABLE]

which is empirically accurate to all decimal places shown and provides the value of the estimate (4) mentioned in the introduction.

7. Conclusions

We have introduced a new method for estimating the $p$ -radius of low-cardinality sets of positive or dominated matrices and investigated its effectiveness in the case of a particular pair of matrices considered by Jungers and Protasov in connection with applications to Chaikin’s subdivision scheme. We have compared its results to those of a number of other estimation methods in the case of that example and obtained results apparently accurate to within an absolute error of approximately $10^{-20}$ , versus approximately $10^{-2}$ to $10^{-6}$ for rival methods.

The new method has the disadvantage that the number of matrix products which must be computed in order to obtain the $n^{\mathrm{th}}$ approximation to $\varrho_{p}(A_{1},\ldots,A_{N})$ grows approximately as $N^{n}$ . In particular if the number of matrices $N$ being considered is greater than around 4, the computational burden of producing accurate results may be prohibitively large. This disadvantage is however shared by the methods of §3.1 and §3.3. In view of this consideration, when $N$ is large the methods of §3.2 and §3.4 may be preferable. Our method also, as presently formulated, does not provide a rigorous estimate of its own accuracy, and if rigorous bounds are sought then the method of §3.3, possibly in combination with the logarithmic-convexity bound (9), may be applied instead. For two-dimensional positive matrices it seems likely that an effective bound on the error $|\varrho_{p}(A_{1},\ldots,A_{N})-1/r_{n}|$ could be given by adapting the arguments of [20, 24], but in higher dimensions this would require new technical results in order to bound the cardinality of the relative covers arising in the application of [3, Theorem 4.7] to the action of real linear maps on projective slices of complex cones.
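The growth in the number of products can be made concrete with a short count. The sketch below assumes a naive enumeration of every index word of length $1,\ldots,n$ over $N$ symbols, with no attempt to exploit repetitions or symmetries among the products; the function names are hypothetical.

```python
from itertools import product

def words_of_length(num_matrices, n):
    """Iterate over all index words (i_1, ..., i_n) with entries in
    range(num_matrices); there are num_matrices ** n of them."""
    return product(range(num_matrices), repeat=n)

def products_needed(num_matrices, n):
    """Number of words of length 1..n, namely N + N^2 + ... + N^n."""
    return sum(num_matrices ** k for k in range(1, n + 1))
```

For example `products_needed(2, 12)` evaluates to 8190, whereas `products_needed(4, 12)` evaluates to 22369620, which illustrates how quickly the cost rises with $N$.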

8. Acknowledgements

This research was supported by the Leverhulme Trust (Research Project Grant number RPG-2016-194).

Bibliography


[1] Avila, A., Bochi, J., and Yoccoz, J.-C. Uniformly hyperbolic finite-valued ${\rm SL}(2,\mathbb{R})$-cocycles. Comment. Math. Helv. 85, 4 (2010), 813–884.
[2] Bai, Z.-Q. On the cycle expansion for the Lyapunov exponent of a product of random matrices. J. Phys. A 40, 29 (2007), 8315–8328.
[3] Bandtlow, O. F., and Jenkinson, O. Explicit eigenvalue estimates for transfer operators acting on spaces of holomorphic functions. Adv. Math. 218, 3 (2008), 902–925.
[4] Barnsley, M. F., and Vince, A. Real projective iterated function systems. J. Geom. Anal. 22, 4 (2012), 1137–1172.
[5] Bochi, J., and Gourmelon, N. Some characterizations of domination. Math. Z. 263, 1 (2009), 221–231.
[6] Bochi, J., and Morris, I. D. Continuity properties of the lower spectral radius. Proc. Lond. Math. Soc. (3) 110, 2 (2015), 477–509.
[7] Brundu, M., and Zennaro, M. Invariant multicones for families of matrices. Ann. Mat. Pura Appl. (4). To appear.
[8] Cabrelli, C. A., Heil, C., and Molter, U. M. Self-similarity and multiwavelets in higher dimensions. Mem. Amer. Math. Soc. 170, 807 (2004), viii+82.
