Characteristic polynomials of p-adic matrices

Xavier Caruso (IRMAR); David Roe; Tristan Vaccon (XLIM-MATHIS; UNILIM)

arXiv:1702.01653·math.NT·February 7, 2017

TL;DR

This paper investigates the precision of characteristic polynomials of p-adic matrices, providing criteria and algorithms to determine and optimize their precision based on differential methods.

Contribution

It introduces a checkable criterion for the exact precision of characteristic polynomials and an efficient algorithm to determine optimal precision when the criterion fails.

Findings

1. A criterion for the exact precision of characteristic polynomials in p-adic matrices.

2. An O~(n^3) algorithm for computing optimal precision.

3. Examples showing cases of higher-than-expected precision.

Abstract

We analyze the precision of the characteristic polynomial $\chi(A)$ of an $n\times n$ $p$-adic matrix $A$ using differential precision methods developed previously. When $A$ is integral with precision $O(p^{N})$, we give a criterion (checkable in time $O\tilde{~}(n^{\omega})$) for $\chi(A)$ to have precision exactly $O(p^{N})$. We also give an $O\tilde{~}(n^{3})$ algorithm for determining the optimal precision when the criterion is not satisfied, and give examples when the precision is larger than $O(p^{N})$.

Equations

$$d\chi_{M}\colon dM\mapsto\operatorname{Tr}(\operatorname{Com}(X-M)\cdot dM).$$

$$\alpha=\min\Big(\prod_{i=1}^{n-1}|\sigma_{i}(M)|,\,1\Big).$$

$$\chi(M+H)=\chi(M)+d\chi_{M}(H).$$

$$\mathscr{C}=\begin{pmatrix}0&1&0&\cdots&0\\0&0&1&\ddots&\vdots\\\vdots&\vdots&\ddots&\ddots&0\\0&0&\cdots&0&1\\-a_{0}&-a_{1}&\cdots&\cdots&-a_{n-1}\end{pmatrix}$$

$$\operatorname{Com}(X-M)=\alpha\cdot PV^{t}\cdot VQ^{t}\mod\chi_{M}$$

$$M_{i,j}=0\quad\text{for }j\leq i-2.$$

$$M_{i,j}=O(\pi^{n_{i,j}})\quad\text{for }j\leq i-2.$$

$$\operatorname{Com}(X-M_{1})=P\operatorname{Com}(X-M_{2})P^{-1}.$$

$$\operatorname{Com}(1-XM)^{\operatorname{rec},n-1}=\operatorname{Com}(X-M),\qquad(\chi_{M}I_{n})^{\operatorname{rec},n}=(1-XM)\cdot\operatorname{Com}(1-XM).$$

$$d\chi_{M}(dM)$$

$$\operatorname{Com}(X-M)$$

$$\mathscr{C}^{i}e=(0,\dots,0,-a_{0},\star,\dots,\star)$$

$$\operatorname{Com}(X-\mathscr{C})=\alpha\cdot V^{t}\cdot VR^{t}\mod\chi_{M}$$

$$\alpha=a_{1}+a_{2}X+\dots+a_{n-1}X^{n-2}+X^{n-1}.$$

$$N_{k}^{\prime}=\min_{1\leq i,j\leq n}\big(N_{j,i}+\operatorname{val}(\pi_{k}(C_{i,j}))\big),$$

$$\operatorname{Tr}(\operatorname{Com}(X-M)\cdot dM)=\operatorname{Tr}(\operatorname{Com}(X-H)\cdot P^{-1}\,dM\,P)$$

$$x=p^{v}\cdot\big(a+O\big(p^{N+v_{p}(a)}\big)\big)$$

$$\mathbb{P}[v=0]=\frac{1}{5};\qquad\mathbb{P}[v=n]=\frac{2}{5\cdot|n|\cdot(|n|+1)}\quad\text{for }|n|\geq 1$$

$$dM\mapsto d\lambda=-\frac{\operatorname{Tr}(\operatorname{Com}(\lambda-M)\cdot dM)}{\chi_{M}^{\prime}(\lambda)}$$

$$N^{\prime}=\min_{1\leq i,j\leq n}\big(N_{j,i}+\operatorname{val}(C_{i,j}(\lambda))-\operatorname{val}(\chi^{\prime}_{M}(\lambda))\big)$$

$$N^{\prime}=\cdots+\min_{1\leq i,j\leq n}\big(N_{j,i}+\operatorname{val}(P_{i}V(\lambda)^{t})+\operatorname{val}(V(\lambda)Q_{j}^{t})\big)$$

$$N^{\prime}=\cdots+\min_{1\leq i\leq n}\operatorname{val}(P_{i}V(\lambda)^{t})+\min_{1\leq j\leq n}\operatorname{val}(V(\lambda)Q_{j}^{t})$$

$$N_{k}^{\prime}=\cdots+\min_{1\leq i,j\leq n}\big(N_{j,i}+\operatorname{val}(P_{i}V(\lambda_{k})^{t})+\operatorname{val}(V(\lambda_{k})Q_{j}^{t})\big).$$

$$P\cdot\begin{pmatrix}\lambda_{1}&\cdots&\lambda_{s}\\\lambda_{1}^{2}&\cdots&\lambda_{s}^{2}\\\vdots&&\vdots\\\lambda_{1}^{n-1}&\cdots&\lambda_{s}^{n-1}\end{pmatrix}.$$

Taxonomy

Topics: advanced mathematical theories · Polynomial and algebraic computation · Iterative Methods for Nonlinear Equations

Full text

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Xavier Caruso, Université Rennes 1

David Roe, University of Pittsburgh

Tristan Vaccon, Université de Limoges

Keywords: Algorithms, $p$-adic precision, characteristic polynomial, eigenvalue

Conference: ISSAC ’17, July 25-28, 2017, Kaiserslautern, Germany

CCS Concepts: Computing methodologies → Algebraic algorithms

1 Introduction

The characteristic polynomial is a fundamental invariant of a matrix: its roots give the eigenvalues, and the trace and determinant can be extracted from its coefficients. In fact, the best known division-free algorithm for computing determinants over arbitrary rings [kaltofen:92a] does so using the characteristic polynomial. Over $p$-adic fields, computing the characteristic polynomial is a key ingredient in algorithms for counting points of varieties over finite fields (see [kedlaya:01a, harvey:07a, harvey:14a]).

When computing with $p$ -adic matrices, the lack of infinite memory implies that the entries may only be approximated at some finite precision $O(p^{N})$ . As a consequence, in designing algorithms for such matrices one must analyze not only the running time of the algorithm but also the accuracy of the result.

Let $M\in M_{n}(\mathbb{Q}_{p})$ be known at precision $O(p^{N})$. The simplest approach for computing the characteristic polynomial of $M$ is to compute $\det(X-M)$, either using recursive row expansion or using various division-free algorithms [seifullin:02a, kaltofen:92a]. There are two issues with these methods. First, they are slower than alternatives that allow division, requiring $O(n!)$, $O(n^{4})$ and $O\tilde{~}(n^{2+\omega/2})$ operations respectively. Second, while the lack of division implies that the result is accurate modulo $p^{N}$ as long as $M\in M_{n}(\mathbb{Z}_{p})$, they still do not yield the optimal precision.

A faster approach over a field is to compute the Frobenius normal form of $M$ , which is achievable in running time $O\tilde{~{}}(n^{\omega})$ [storjohann:01a]. However, the fact that it uses division frequently leads to catastrophic losses of precision. In many examples, no precision remains at the end of the calculation.

Instead, we separate the computation of the precision of $\chi_{M}$ from the computation of an approximation to $\chi_{M}$. Given some precision on $M$, we use [caruso-roe-vaccon:14a, Lem. 3.4] to find the best possible precision for $\chi_{M}$. The analysis of this precision is the subject of much of this paper. With this precision known, the actual calculation of $\chi_{M}$ may proceed by lifting $M$ to a temporarily higher precision and then using a sufficiently stable algorithm (see Remark 5.3).

One benefit of this approach is that we may account for diffuse precision: precision that is not localized on any single coefficient of $\chi_{M}$. For example, let $0\leq\alpha_{1}\leq\alpha_{2}\leq\dots\leq\alpha_{n}$, consider a diagonal matrix $D$ with diagonal entries $(p^{\alpha_{1}},\dots,p^{\alpha_{n}})$, let $P,Q\in\operatorname{GL}_{n}(\mathbb{Z}_{p})$ and set $M=PDQ$. The valuation of the coefficient of $X^{n-k}$ in $\chi_{M}$ will be $\sum_{i=1}^{k}\alpha_{i}$, and if $\alpha_{n-1}>0$ and $M$ is known with precision $O(p^{N})$ then the constant term of $\chi_{M}$ will be known with precision larger than $O(p^{N})$ (see [caruso-roe-vaccon:15a, Prop. 3.2]).

As long as none of the eigenvalues of $M$ are congruent to $-1$ modulo $p$, none of the coefficients of the characteristic polynomial of $1+M$ will have precision larger than $O(p^{N})$. But $\chi_{1+M}(X)=\chi_{M}(X-1)$, so the precision content of these two polynomials should be equivalent. The resolution is that the extra precision in $\chi_{1+M}$ is diffuse and not visible on any individual coefficient. We formalize this phenomenon using lattices; see Section 2.1 for further explanation, and [caruso:17a, §3.2.2] for a specific example of the relationship between $\chi_{M}$ and $\chi_{1+M}$.

Previous contributions.

Since the description of Kedlaya’s algorithm in [kedlaya:01a], the computation of characteristic polynomials over $p$-adic numbers has become a crucial ingredient in many point-counting algorithms. For example, [harvey:07a, harvey:14a] use $p$-adic cohomology and the characteristic polynomial of Frobenius to compute zeta functions of hyperelliptic curves.

In most of these papers, the precision analysis deals in great detail with how to obtain the matrices (e.g. of the action of Frobenius) that are involved in the point-counting schemes. However, the computation of their characteristic polynomials is often less thoroughly studied: some refer to fast algorithms (using division), while others apply division-free algorithms.

In [caruso-roe-vaccon:15a], the authors have begun the application of the theory of differential precision of [caruso-roe-vaccon:14a] to the stable computation of characteristic polynomials. They have obtained a way to express the optimal precision on the characteristic polynomial, but have not given practical algorithms to attain this optimal precision.

The contribution of this paper.

Thanks to the application of the framework of differential precision of [caruso-roe-vaccon:14a] in [caruso-roe-vaccon:15a], we know that the precision of the characteristic polynomial $\chi_{M}$ of a matrix $M\in M_{n}(\mathbb{Q}_{p})$ is determined by the comatrix $\operatorname{Com}(X-M)$. In this article, we provide:

1. Proposition 2.7: a factorization of $\operatorname{Com}(X-M)$ as a product of two rank-$1$ matrices (when $M$ has a cyclic vector), computable in $O\tilde{~}(n^{\omega})$ operations by Theorem 4.1.

2. Corollary 2.4: a simple, $O\tilde{~}(n^{\omega})$ criterion to decide whether $\chi_{M}$ is defined at precision higher than the precision of $M$ (when $M\in M_{n}(\mathbb{Z}_{p})$).

3. Theorem 3.11: an $O\tilde{~}(n^{3})$ algorithm with operations in $\mathbb{Z}_{p}$ to compute the optimal precision on each coefficient of $\chi_{M}$ (when $M$ is given with uniform precision on its entries).

4. Proposition 5.6: an $O\tilde{~}(n^{\omega})$ algorithm to compute the optimal precision on each eigenvalue of $M$.

Organization of the article.

In Section 2, we review the differential theory of precision developed in [caruso-roe-vaccon:14a] and apply it to the specific case of the characteristic polynomial, giving conditions under which the differential will be surjective (and thus provide a good measure of precision). We also give a condition based on reduction modulo $p$ that determines whether the characteristic polynomial will have a higher precision than the input matrix, and show that the image of the set of integral matrices has the structure of an $\mathcal{O}_{K}[X]$-module when $M$ is itself integral. Finally, we give a compact description of $\operatorname{Com}(X-M)$, the main ingredient in the differential.

In Section 3, we develop $O\tilde{~{}}(n^{3})$ algorithms to approximate the Hessenberg form of $M$ , and through it to find $\operatorname{Com}(X-M)$ and thus find the precision of the characteristic polynomial of $M$ . In Section 4, we give a $O\tilde{~{}}(n^{\omega})$ algorithm to compute the compact description of $\operatorname{Com}(X-M)$ .

Finally, we propose in Section 5 algorithms to compute the optimal coefficient-wise precision for the characteristic polynomial. We also give the results of some experiments demonstrating that these methods can lead to dramatic gains in precision over standard interval arithmetic. We close with results describing the precision associated to eigenvalues of a matrix.

Notation

Throughout the paper, $K$ will refer to a complete, discrete valuation field, $\operatorname{val}:K\twoheadrightarrow\mathbb{Z}\cup\{+\infty\}$ to its valuation, $\mathcal{O}_{K}$ its ring of integers and $\pi$ a uniformizer. We will write that $f(n)=O\tilde{~{}}(g(n))$ if there exists some $k\in\mathbb{N}$ such that $f(n)=O(g(n)\log(n)^{k}).$ We will write $M$ for an $n\times n$ matrix over $K$ , and $\chi$ the characteristic polynomial map, $\chi_{M}\in K[X]$ for the characteristic polynomial of $M$ and $d\chi_{M}$ for the differential of $\chi$ at $M$ , as a linear map from $M_{n}(K)$ to the space of polynomials of degree less than $n$ . We fix an $\omega\in\mathbb{R}$ such that the multiplication of two matrices over a ring is in $O(n^{\omega})$ operations in the ring. Currently, the smallest known $\omega$ is less than $2.3728639$ thanks to [legall:14a]. We will denote by $I_{n}$ the identity matrix of rank $n$ in $M_{n}(K).$ When there is no ambiguity, we will drop this $I_{n}$ for scalar matrices, e.g. for $\lambda\in K$ and $M\in M_{n}(K),$ $\lambda-M$ denotes $\lambda I_{n}-M.$ Finally, we write $\sigma_{1}(M),\dots,\sigma_{n}(M)$ for the elementary divisors of $M$ , sorted in increasing order of valuation.

2 Theoretical study

2.1 The theory of p-adic precision

We recall some of the definitions and results of [caruso-roe-vaccon:14a] as a foundation for our discussion of the precision for the characteristic polynomial of a matrix. We will be concerned with two $K$ -manifolds in what follows: the space $M_{n}(K)$ of $n\times n$ matrices with entries in $K$ and the space $K_{n}[X]$ of monic degree $n$ polynomials over $K$ . Given a matrix $M\in M_{n}(K)$ , the most general kind of precision structure we may attach to $M$ is a lattice $H$ in the tangent space at $M$ . However, representing an arbitrary lattice requires $n^{2}$ basis vectors, each with $n^{2}$ entries. We therefore frequently work with certain classes of lattices, either jagged lattices where we specify a precision for each matrix entry or flat lattices where every entry is known to a fixed precision $O(p^{N})$ . Similarly, precision for monic polynomials can be specified by giving a lattice in the tangent space at $f(X)\in K_{n}[X]$ , or restricted to jagged or flat precision in the interest of simplicity.

Let $\chi:M_{n}(K)\to K_{n}[X]$ be the characteristic polynomial map. Our analysis of the precision behavior of $\chi$ rests upon the computation of its derivative $d\chi$, using [caruso-roe-vaccon:14a, Lem. 3.4]. For a matrix $M\in M_{n}(K)$, we identify the tangent space $V$ at $M$ with $M_{n}(K)$ itself, and the tangent space $W$ at $\chi_{M}$ with the space $K_{<n}[X]$ of polynomials of degree less than $n$. Let $\operatorname{Com}(M)$ denote the comatrix of $M$ (when $M\in\operatorname{GL}_{n}(K)$, we have $\operatorname{Com}(M)=\det(M)M^{-1}$) and $d\chi_{M}$ the differential at $M$. Recall [caruso-roe-vaccon:14a, Appendix B] and [caruso-roe-vaccon:15a, §3.3] that $d\chi_{M}$ is given by

$$d\chi_{M}\colon dM\mapsto\operatorname{Tr}(\operatorname{Com}(X-M)\cdot dM).\tag{1}$$
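For small experiments, the map $dM\mapsto\operatorname{Tr}(\operatorname{Com}(X-M)\cdot dM)$ can be evaluated exactly over $\mathbb{Q}$. The sketch below is only an illustration and not the method of this paper: it computes $\operatorname{Com}(X-M)$ with the classical Faddeev-LeVerrier recursion, which divides by integers and is therefore not suited to $p$-adic inputs. The function names `comatrix_and_charpoly` and `trace_com_times` are hypothetical.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def comatrix_and_charpoly(M):
    """Return (N, c) where N[k] is the matrix coefficient of X^k in Com(X - M)
    and c[k] is the coefficient of X^k in chi_M (Faddeev-LeVerrier over Q)."""
    n = len(M)
    M = [[Fraction(x) for x in row] for row in M]
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    c = [Fraction(0)] * n + [Fraction(1)]   # chi_M is monic of degree n
    N = [None] * n
    cur = identity                          # coefficient of X^(n-1)
    for k in range(1, n + 1):
        N[n - k] = cur
        prod = matmul(M, cur)
        c[n - k] = -sum(prod[i][i] for i in range(n)) / k
        cur = [[prod[i][j] + c[n - k] * identity[i][j] for j in range(n)]
               for i in range(n)]
    return N, c


def trace_com_times(M, dM):
    """Coefficients (lowest degree first) of Tr(Com(X - M) * dM)."""
    n = len(M)
    N, _ = comatrix_and_charpoly(M)
    return [sum(N[k][i][j] * dM[j][i] for i in range(n) for j in range(n))
            for k in range(n)]
```

For the $2\times 2$ matrix with rows $(0,1)$ and $(-2,-3)$, the perturbations supported on the two entries of the bottom row are sent to $1$ and $X$ respectively, so the image is all of $K_{<2}[X]$.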

Proposition 2.1.

For $M\in M_{n}(K)$ , the following conditions are equivalent:

(i) the differential $d\chi_{M}$ is surjective;

(ii) the matrix $M$ has a cyclic vector (i.e. $M$ is similar to a companion matrix);

(iii) the eigenspaces of $M$ over the algebraic closure $\bar{K}$ of $K$ all have dimension $1$;

(iv) the characteristic polynomial of $M$ is equal to the minimal polynomial of $M$.

Proof.

The equivalence of (ii), (iii), and (iv) is standard; see [hoffman-kunze:LinearAlgebra, §7.1] for example. We now show (ii) $\Rightarrow$ (i) and (i) $\Rightarrow$ (iii).

For any $A\in\operatorname{GL}_{n}(K)$ , the image of $d\chi$ at $M$ will be the same as the image of $d\chi$ at $AMA^{-1}$ , so we may assume that $M$ is a companion matrix. For a companion matrix, the bottom row of $\operatorname{Com}(X-M)$ consists of $1,X,X^{2},\dots,X^{n-1}$ so $d\chi_{M}$ is surjective.

Now suppose that (iii) fails, so that some eigenvalue $\lambda$ of $M$ over $\bar{K}$ occurs in at least two Jordan blocks. After conjugating into Jordan normal form over $\bar{K}$, the matrix $\operatorname{Com}(X-M)$ is also block diagonal, and its entries are divisible within each block by the product of the $(X-\mu)^{d_{\mu}}$, where $(\mu,d_{\mu})$ ranges over the eigenvalues and dimensions of the other Jordan blocks. Since $\lambda$ occurs in two Jordan blocks, $X-\lambda$ divides every entry of $\operatorname{Com}(X-M)$, and $d\chi_{M}$ is not surjective. ∎

We also have an analogue of Proposition 2.1 for integral matrices.

Proposition 2.2.

For $M\in M_{n}(\mathcal{O}_{K})$ , the following conditions are equivalent:

(i) the image of $M_{n}(\mathcal{O}_{K})$ under $d\chi_{M}$ is $\mathcal{O}_{K}[X]\cap K_{<n}[X]$;

(ii) the reduction of $M$ modulo $\pi$ has a cyclic vector.

Proof.

The condition (i) is equivalent to the surjectivity of $d\chi_{M}$ modulo $\pi$ . The equivalence with (ii) follows the same argument as Proposition 2.1, but over the residue field of $K$ . ∎

Write $B^{-}_{V}(r)$ (resp. $B_{V}(r)$ ) for the open (resp. closed) ball of radius $r$ in $V$ , and let $\sigma_{1}(M),\dots,\sigma_{n}(M)$ denote the elementary divisors of $M$ .

Proposition 2.3.

Suppose that $M\in M_{n}(K)$ satisfies one of the conditions in Proposition 2.1, and let

$$\alpha=\min\Big(\prod_{i=1}^{n-1}|\sigma_{i}(M)|,\,1\Big).$$

Then, for all $\rho\in(0,1]$ and all $r\in(0,\alpha^{-1}\cdot\rho^{-1})$ , any lattice $H$ such that $B_{V}^{-}(\rho r)\subset H\subset B_{V}(r)$ satisfies:

$$\chi(M+H)=\chi(M)+d\chi_{M}(H).$$

Proof.

Recall [caruso-roe-vaccon:15a, Def. 3.3] that the precision polygon of $M$ is the lower convex hull of the Newton polygons of the entries of $\operatorname{Com}(X-M)$. By [caruso-roe-vaccon:15a, Prop. 3.4], the endpoints of the precision polygon occur at height $0$ and $\sum_{i=1}^{n-1}\operatorname{val}(\sigma_{i}(M))$. By convexity, $B_{W}(1)\subset d\chi_{M}(B_{V}(\alpha^{-1}))$.

Since the coefficients of $\chi_{M}$ are given by polynomials in the entries of $M$ with integral coefficients, [caruso-roe-vaccon:15a, Prop. 2.2] implies the conclusion. ∎
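The quantity $\sum_{i=1}^{n-1}\operatorname{val}(\sigma_{i}(M))$ can be explored on small integer matrices. The sketch below relies on a standard fact that is not stated in the text: over a principal ideal domain, the product of the first $k$ elementary divisors is the gcd of the $k\times k$ minors, so the sum above is the minimal valuation of the $(n-1)\times(n-1)$ minors. It is a brute-force illustration with hypothetical function names, restricted to matrices with entries in $\mathbb{Z}\subset\mathbb{Z}_{p}$, and it is exponentially slow in $n$.

```python
from itertools import combinations


def val_int(x, p):
    """p-adic valuation of a nonzero integer; None stands for +infinity."""
    if x == 0:
        return None
    v = 0
    while x % p == 0:
        x //= p
        v += 1
    return v


def det(A):
    """Determinant of an integer matrix by Laplace expansion (small sizes only)."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def sum_val_first_divisors(M, p):
    """Minimal valuation of the (n-1)x(n-1) minors of M, that is the sum of
    val(sigma_i(M)) for i < n; None if every such minor vanishes."""
    n = len(M)
    vals = []
    for rows in combinations(range(n), n - 1):
        for cols in combinations(range(n), n - 1):
            minor = det([[M[i][j] for j in cols] for i in rows])
            if minor != 0:
                vals.append(val_int(minor, p))
    return min(vals) if vals else None
```

For an integral matrix, $\alpha$ is then $p$ raised to minus this value when the norm is normalized by $|p|=p^{-1}$ (a normalization assumed here for illustration).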

The relationship between precision and the images of lattices under $d\chi_{M}$ allows us to apply Proposition 2.2 to determine when the precision of the characteristic polynomial is the minimum possible.

Corollary 2.4.

Suppose that $M\in\operatorname{GL}_{n}(\mathcal{O}_{K})$ is known with precision $O(\pi^{m})$ . Then the characteristic polynomial of $M$ has precision lattice strictly contained in $O(\pi^{m})$ if and only if the reduction of $M$ modulo $\pi$ does not have a cyclic vector.

Note that this criterion is checkable using $O\tilde{~{}}(n^{\omega})$ operations in the residue field [storjohann:01a].
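A naive version of this test can be sketched as follows. It uses condition (iv) of Proposition 2.1 over the residue field $\mathbb{F}_{p}$: the reduction $\bar{M}$ has a cyclic vector exactly when $I,\bar{M},\dots,\bar{M}^{n-1}$ are linearly independent. This is not the $O\tilde{~}(n^{\omega})$ algorithm of [storjohann:01a], only a plain Gaussian elimination meant for illustration, and it assumes $K=\mathbb{Q}_{p}$ with integer entries; the function names are hypothetical.

```python
def rank_mod_p(rows, p):
    """Rank over F_p of a list of integer row vectors."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        piv = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][col], -1, p)       # modular inverse, p prime
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def has_cyclic_vector_mod_p(M, p):
    """True when the reduction of the integer matrix M modulo p has a cyclic
    vector, i.e. when its minimal polynomial over F_p has degree n."""
    n = len(M)
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    flattened_powers = []
    for _ in range(n):
        flattened_powers.append([x for row in power for x in row])
        power = [[sum(power[i][k] * M[k][j] for k in range(n)) % p
                  for j in range(n)] for i in range(n)]
    return rank_mod_p(flattened_powers, p) == n
```

For instance, with $p=3$ the matrix with rows $(1,3)$ and $(0,1)$ reduces to the identity, so the test returns `False` and Corollary 2.4 predicts a precision lattice strictly contained in $O(\pi^{m})$.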

2.2 Stability under multiplication by $X$

By definition, the codomain of $d\chi_{M}$ is $K_{<n}[X]$. However, when $M$ is given, $K_{<n}[X]$ is canonically isomorphic to $K[X]/\chi_{M}(X)$ as a $K$-vector space. For our purpose, it will often be convenient to view $d\chi_{M}$ as a $K$-linear mapping $M_{n}(K)\to K[X]/\chi_{M}(X)$.

Proposition 2.5.

Let $A$ be the subring of $K[X]$ consisting of polynomials $P$ for which $P(M)\in M_{n}(\mathcal{O}_{K})$ , and $V=d\chi_{M}\big{(}M_{n}(\mathcal{O}_{K})\big{)}$ as a submodule of $K[X]/\chi_{M}(X)$ . Then $V$ is stable under multiplication by $A$ .

Proof.

Let $C=\operatorname{Com}(X-M)$ and $P\in A$. By (1), $V$ is given by the $\mathcal{O}_{K}$-span of the entries of $C$. Using the fact that the product of a matrix with its comatrix is the determinant, $(X-M)\cdot C=\chi_{M}$ and thus $P(X)\cdot C\equiv P(M)\cdot C\pmod{\chi_{M}(X)}$. The span of the entries of the left-hand side is precisely $P(X)\cdot V$, while the span of the entries of the right-hand side is contained within $V$ since $P(M)\in M_{n}(\mathcal{O}_{K})$. ∎

Corollary 2.6.

If $M\in M_{n}(\mathcal{O}_{K})$ , then $d\chi_{M}\big{(}M_{n}(\mathcal{O}_{K})\big{)}$ is stable under multiplication by $X$ and hence is a module over $\mathcal{O}_{K}[X]$ .

2.3 Compact form of $d\chi_{M}$

Let $\mathscr{C}$ be the companion matrix associated to $\chi_{M}$ :

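$$\mathscr{C}=\begin{pmatrix}0&1&0&\cdots&0\\0&0&1&\ddots&\vdots\\\vdots&\vdots&\ddots&\ddots&0\\0&0&\cdots&0&1\\-a_{0}&-a_{1}&\cdots&\cdots&-a_{n-1}\end{pmatrix}$$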

with $\chi_{M}=a_{0}+a_{1}X+\cdots+a_{n-1}X^{n-1}+X^{n}$ . By Proposition 2.1, there exists a matrix $P\in\operatorname{GL}_{n}(K)$ such that $M=P\mathscr{C}P^{-1}$ . Applying the same result to the transpose of $M$ , we find that there exists another invertible matrix $Q\in\operatorname{GL}_{n}(K)$ such that $M^{t}=Q\mathscr{C}Q^{-1}$ .

Proposition 2.7.

We keep the previous notations and assumptions. Let $V$ be the row vector $(1,X,\ldots,X^{n-1})$ . Then

$$\operatorname{Com}(X-M)=\alpha\cdot PV^{t}\cdot VQ^{t}\mod\chi_{M}$$

for some $\alpha\in K[X]$ .

Proof.

Write $C=\operatorname{Com}(X{-}M)$ . From $(X{-}M)\cdot C\equiv 0\pmod{\chi_{M}}$ , we deduce $(X{-}\mathscr{C})\cdot P^{-1}C\equiv 0\pmod{\chi_{M}}$ . Therefore each column of $P^{-1}C$ lies in the right kernel of $X{-}\mathscr{C}$ modulo $\chi_{M}$ . On the other hand, a direct computation shows that every column vector $W$ lying in the right kernel of $X{-}\mathscr{C}$ modulo $\chi_{M}$ can be written as $W=w\cdot V^{t}$ for some $w\in K[X]/\chi_{M}$ . We deduce that $C\equiv P\cdot V^{t}B\pmod{\chi_{M}}$ for some row vector $B$ . Applying the same reasoning with $M^{t}$ , we find that $B$ can be written $B=\alpha VQ^{t}$ for some $\alpha\in K[X]/\chi_{M}$ and we are done. ∎
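As a sanity check, the factorization can be verified by hand for $n=2$ and $M=\mathscr{C}$, so that $P=I_{2}$. The computation below is an added illustration; it assumes the companion matrix is written with ones on the superdiagonal and the coefficients $-a_{i}$ in the last row, and the particular matrix $Q$ is one possible choice satisfying $\mathscr{C}^{t}Q=Q\mathscr{C}$.

```latex
% n = 2, M = \mathscr{C}, P = I_2, V = (1, X).
\[
\mathscr{C}=\begin{pmatrix}0&1\\-a_0&-a_1\end{pmatrix},\qquad
\chi_M=X^2+a_1X+a_0,\qquad
\operatorname{Com}(X-\mathscr{C})=\begin{pmatrix}X+a_1&1\\-a_0&X\end{pmatrix}.
\]
% One choice of Q with \mathscr{C}^t Q = Q \mathscr{C} (both sides equal diag(-a_0, 1)):
\[
Q=\begin{pmatrix}a_1&1\\1&0\end{pmatrix},\qquad
VQ^{t}=(1,\,X)\,Q^{t}=(X+a_1,\;1).
\]
% Multiply by V^t and reduce X^2 + a_1 X to -a_0 modulo \chi_M:
\[
V^{t}\cdot VQ^{t}
=\begin{pmatrix}X+a_1&1\\X^2+a_1X&X\end{pmatrix}
\equiv\begin{pmatrix}X+a_1&1\\-a_0&X\end{pmatrix}
=\operatorname{Com}(X-\mathscr{C})\pmod{\chi_M}.
\]
% Hence the factorization holds with \alpha = 1 for this choice of Q.
```

A different choice of $Q$ rescales $VQ^{t}$ by a unit of $K[X]/\chi_{M}$ and changes $\alpha$ accordingly.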

Proposition 2.7 shows that $\operatorname{Com}(X{-}M)$ can be encoded by the datum of the quadruple $(\alpha,P,Q,\chi_{M})$ whose total size stays within $O(n^{2})$ : the polynomials $\alpha$ and $\chi_{M}$ are determined by $2n$ coefficients while we need $2n^{2}$ entries to write down the matrices $P$ and $Q$ . We shall see moreover in Section 4 that interesting information can be read off of this short form $(\alpha,P,Q,\chi_{M})$ .

Remark 2.8.

With the previous notation, if $U\in GL_{n}(K),$ the quadruple for $UMU^{-1}$ is $(\alpha,UP,(U^{t})^{-1}Q,\chi_{M}),$ which can be computed in $O(n^{\omega})$ operations in $K.$ This is faster than computing $U\operatorname{Com}(X-M)U^{-1},$ which is, at first sight, in $O(n^{4})$ operations in $K.$

3 Differential via Hessenberg form

In this section, we combine the computation of a Hessenberg form of a matrix and the computation of the inverse through the Smith normal form (SNF) over a complete discrete valuation field (CDVF) to compute $\operatorname{Com}(X-M)$ and $d\chi$ . If $M\in M_{n}(\mathcal{O}_{K})$ , then only division by invertible elements of $\mathcal{O}_{K}$ will occur.

3.1 Hessenberg form

We begin with the computation of an approximate Hessenberg form.

Definition 3.1.

A Hessenberg matrix is a matrix $M\in M_{n}(K)$ with

$$M_{i,j}=0\quad\text{for }j\leq i-2.$$

Given integers $n_{i,j}$ , an approximate Hessenberg matrix is a matrix $M\in M_{n}(K)$ with

$$M_{i,j}=O(\pi^{n_{i,j}})\quad\text{for }j\leq i-2.$$

If $M\in M_{n}(K)$ and $H\in M_{n}(K)$ is an (approximate) Hessenberg matrix similar to $M$, we say that $H$ is an (approximate) Hessenberg form of $M$.

It is not hard to prove that every matrix over a field admits a Hessenberg form. We prove here that over $K,$ if a matrix is known at finite (jagged) precision, we can compute an approximate Hessenberg form of it. Moreover, we can provide an exact change of basis matrix. It relies on the following algorithm.

Algorithm 1: Approximate Hessenberg form computation

Input: a matrix $M$ in $M_{n}(K).$

0. $P:=I_{n}.$ $H:=M.$

1. for $j=1,\dots,n-1$ do

2. swap the row $j+1$ with a row $i_{min}$ ( $i_{min}\geq j+1$ ) s.t. $\operatorname{val}(H_{i_{min},j})$ is minimal.

3. for $i=j+2,\dots,n$ do

4. Eliminate the significant digits of $H_{i,j}$ by pivoting with row $j+1$ using a matrix $T.$

5. $H:=H\times T^{-1}.$ $P:=T\times P.$

6. Return $H,P.$
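The structure of this reduction can be sketched in a few lines. The version below is a simplified illustration under stated assumptions and not the algorithm as analyzed in the paper: it works with exact rationals instead of $p$-adic approximations, it takes the pivot among rows $j+1,\dots,n$, it applies every transformation as a similarity (row operation together with the inverse column operation), and the function names are hypothetical. It returns $H$ and $P$ with $H=PMP^{-1}$.

```python
from fractions import Fraction


def val(x, p):
    """p-adic valuation of a rational number; None stands for +infinity."""
    x = Fraction(x)
    if x == 0:
        return None
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def hessenberg(M, p):
    """Return (H, P) with H = P * M * P^(-1) and H[i][j] == 0 for j <= i - 2
    (0-based indices), pivoting on entries of minimal p-adic valuation."""
    n = len(M)
    H = [[Fraction(x) for x in row] for row in M]
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n - 1):
        candidates = [i for i in range(j + 1, n) if H[i][j] != 0]
        if not candidates:
            continue                      # column already reduced
        i_min = min(candidates, key=lambda i: val(H[i][j], p))
        r = j + 1
        if i_min != r:
            # Conjugate by a transposition: swap rows, then swap columns.
            H[r], H[i_min] = H[i_min], H[r]
            for row in H:
                row[r], row[i_min] = row[i_min], row[r]
            P[r], P[i_min] = P[i_min], P[r]
        for i in range(j + 2, n):
            c = H[i][j] / H[r][j]         # valuation >= 0 by choice of pivot
            if c == 0:
                continue
            for k in range(n):            # T * H: row_i -= c * row_r
                H[i][k] -= c * H[r][k]
            for k in range(n):            # (T * H) * T^(-1): col_r += c * col_i
                H[k][r] += c * H[k][i]
            for k in range(n):            # P := T * P
                P[i][k] -= c * P[r][k]
    return H, P
```

Choosing the pivot of minimal valuation keeps every multiplier `c` in $\mathbb{Z}_{p}$, which is the reason no precision is lost through division in the approximate setting.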

Proposition 3.2.

Algorithm 1 computes $H$ and $P$ realizing an approximate Hessenberg form of $M$. The matrix $P$ is exact over finite extensions of $\mathbb{Q}_{p}$ and $k(\mkern-2.5mu(X)\mkern-2.5mu)$, and the computation takes $O(n^{3})$ operations in $K$ at a precision equal to the maximum precision of a coefficient of $M$.

Proof.

Let us assume that $K$ is a finite extension of $\mathbb{Q}_{p}$ or $k(\mkern-2.5mu(X)\mkern-2.5mu)$. Inside the nested for loop, if we want to eliminate $\pi^{u_{y}}\varepsilon_{y}+O(\pi^{n_{y}})$ with pivot $\pi^{u_{x}}\varepsilon_{x}+O(\pi^{n_{x}})$, with the $\varepsilon$’s being units, the corresponding coefficient of the corresponding shear matrix is the lift (in $\mathbb{Z}$, $\mathbb{F}_{q}[X]$, $\mathbb{Q}[X]$ or an adequate extension) of $\pi^{u_{y}-u_{x}}\varepsilon_{y}\varepsilon_{x}^{-1}\mod\pi^{u_{y}-u_{x}+\min(n_{x}-u_{x},n_{y}-u_{y})}$. Exactness follows directly. Over other fields, we cannot lift, but the computations are still valid. The rest is clear. ∎

Remark 3.3.

From a Hessenberg form of $M$, it is well known that one can compute the characteristic polynomial of $M$ in $O(n^{3})$ operations in $K$ [Cohen:2013, pp. 55–56]. However, this computation involves division, and its precision behavior is not easy to quantify.

3.2 Computation of the inverse

In this section, we prove that to compute the inverse of a matrix over a CDVF $K$ , the Smith normal form is precision-wise optimal in the flat-precision case. We first recall the differential of matrix inversion.

Lemma 3.4.

Let $u\colon\operatorname{GL}_{n}(K)\rightarrow\operatorname{GL}_{n}(K)$, $M\mapsto M^{-1}$. Then for $M\in\operatorname{GL}_{n}(K)$, $du_{M}(dM)=M^{-1}dMM^{-1}$. It is always surjective.

We then have the following result about the loss in precision when computing the inverse.

Proposition 3.5.

Let $\operatorname{cond}(M)=\operatorname{val}(\sigma_{n}(M))$. If $dM$ is a flat precision of $O(\pi^{m})$ on $M$, then $M^{-1}$ can be computed at precision $O(\pi^{m-2\operatorname{cond}(M)})$ by an SNF computation, and this lower bound is optimal, at least when $m$ is large.

Proof.

The smallest valuation of a coefficient of $M^{-1}$ is $-\operatorname{cond}(M)$. It is $-2\operatorname{cond}(M)$ for $M^{-2}$, and it is then clear that $m-2\operatorname{cond}(M)$ is attained as the valuation of a coefficient of $du_{M}(dM)$ and is the smallest valuation that can be achieved this way for $dM$ in a precision lattice of flat precision. Hence the optimality of the bound given, at least when $m$ is large [caruso-roe-vaccon:14a, Lem. 3.4].

Now, the computation of the Smith normal form was described in [Vaccon-these]. From $M$ known at flat precision $O(\pi^{m})$, we can obtain an exact $\Delta$, and $P$ and $Q$ known at precision at least $O(\pi^{m-\operatorname{cond}(M)})$, with coefficients in $\mathcal{O}_{K}$ and determinant in $\mathcal{O}_{K}^{\times}$, realizing a Smith normal form of $M$. There is no loss in precision when computing $P^{-1}$ and $Q^{-1}$. Since the smallest valuation occurring in $\Delta^{-1}$ is $-\operatorname{cond}(M)$, we see that $M^{-1}=Q^{-1}\Delta^{-1}P^{-1}$ is known at precision at least $O(\pi^{m-2\operatorname{cond}(M)})$, which concludes the proof. ∎
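The loss of $2\operatorname{cond}(M)$ digits can be observed on a diagonal example. The sketch below is an added illustration with hypothetical function names: it takes $M=\operatorname{diag}(1,p^{c})$, perturbs it by $p^{m}$ in the bottom-right entry, and measures the smallest valuation of an entry of the difference of the two exact inverses over $\mathbb{Q}$.

```python
from fractions import Fraction


def val(x, p):
    """p-adic valuation of a rational number; None stands for +infinity."""
    x = Fraction(x)
    if x == 0:
        return None
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def inverse_2x2(M):
    """Exact inverse of an invertible 2x2 matrix over Q."""
    (a, b), (c, d) = [[Fraction(x) for x in row] for row in M]
    determinant = a * d - b * c
    return [[d / determinant, -b / determinant],
            [-c / determinant, a / determinant]]


def observed_precision_of_inverse(M, dM, p):
    """Smallest valuation of an entry of (M + dM)^(-1) - M^(-1),
    or None when the two inverses coincide."""
    A = inverse_2x2(M)
    B = inverse_2x2([[M[i][j] + dM[i][j] for j in range(2)] for i in range(2)])
    diffs = [B[i][j] - A[i][j] for i in range(2) for j in range(2)]
    return min((val(x, p) for x in diffs if x != 0), default=None)
```

With $p=3$, $c=2$ and $m=7$, the inverse moves at valuation $m-2c=3$, in agreement with the bound $O(\pi^{m-2\operatorname{cond}(M)})$.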

3.3 The comatrix of $X{-}H$

In this section, we compute $\operatorname{Com}(X-H)$ for a Hessenberg matrix $H$ using the Smith normal form computation of the previous section. The entries of $\operatorname{Com}(X-H)$ lie in $K[X]$ , which is not a CDVF, so we may not directly apply the methods of the previous section. However, we may relate $\operatorname{Com}(X-H)$ to $\operatorname{Com}(1-XH)$ , whose entries lie in the CDVF $K(\mkern-2.5mu(X)\mkern-2.5mu)$ . In this way, we compute $\operatorname{Com}(X-H)$ using an SNF method, with no division in $K$ .

First, we need a lemma relating comatrices of similar matrices:

Lemma 3.6.

If $M_{1},M_{2}\in M_{n}(K)$ and $P\in GL_{n}(K)$ are such that $M_{1}=PM_{2}P^{-1},$ then:

$$\operatorname{Com}(X-M_{1})=P\operatorname{Com}(X-M_{2})P^{-1}.$$

The second ingredient we need is reciprocal polynomials. We extend their definition to matrices of polynomials.

Definition 3.7.

Let $d\in\mathbb{N}$ and $P\in K[X]$ of degree at most $d$. We define the reciprocal polynomial of order $d$ of $P$ as $P^{\operatorname{rec},d}=X^{d}P\left(1/X\right)$. Let $A\in M_{n}(K[X])$ be a matrix of polynomials of degree at most $d$. We denote by $A^{\operatorname{rec},d}$ the matrix with $(A^{\operatorname{rec},d})_{i,j}=(A_{i,j})^{\operatorname{rec},d}$.

We then have the following result:

Lemma 3.8.

Let $M\in M_{n}(K).$ Then:

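$$\operatorname{Com}(1-XM)^{\operatorname{rec},n-1}=\operatorname{Com}(X-M),$$

$$(\chi_{M}I_{n})^{\operatorname{rec},n}=(1-XM)\cdot\operatorname{Com}(1-XM).$$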

Proof.

It all comes down to the following result: let $A\in M_{d}(K[X])$ be a matrix of polynomials of degree at most $1$; then $\det(A^{\operatorname{rec},1})=\det(A)^{\operatorname{rec},d}$. Indeed, one can use multilinearity of the determinant on $X^{d}\det(A(1/X))$ to prove this result. It directly implies the second part of the lemma; the first part follows from the fact that the entries of $\operatorname{Com}(X-M)$ and of $\operatorname{Com}(1-XM)$ are determinants of size $n-1$. ∎
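On coefficient lists, taking the reciprocal polynomial of order $d$ amounts to padding to length $d+1$ and reversing. The sketch below is an added illustration with hypothetical function names; it checks the determinant identity of the proof on a $2\times 2$ example by expanding $\det(1-XM)$ directly and comparing with the reversed coefficients of $\chi_{M}$.

```python
def rec(coeffs, d):
    """Reciprocal polynomial of order d, X^d * P(1/X).
    Polynomials are coefficient lists, lowest degree first."""
    if len(coeffs) > d + 1:
        raise ValueError("polynomial has degree larger than d")
    padded = list(coeffs) + [0] * (d + 1 - len(coeffs))
    return padded[::-1]


def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def poly_sub(f, g):
    return [a - b for a, b in zip(f, g)]


def charpoly_2x2(M):
    """chi_M = X^2 - tr(M) X + det(M)."""
    (a, b), (c, d) = M
    return [a * d - b * c, -(a + d), 1]


def det_one_minus_xm_2x2(M):
    """det(1 - X M) expanded as (1 - aX)(1 - dX) - (bX)(cX)."""
    (a, b), (c, d) = M
    return poly_sub(poly_mul([1, -a], [1, -d]), poly_mul([0, -b], [0, -c]))
```

Since the operation of order $d$ is an involution on polynomials of degree at most $d$, applying `rec` twice with the same `d` returns the padded input.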

This lemma allows us to compute $\operatorname{Com}(1-XM)$ instead of $\operatorname{Com}(X-M).$ This has a remarkable advantage: the pivots during the computation of the SNF of $\operatorname{Com}(1-XM)$ are units of $\mathcal{O}_{K}[\mkern-2.5mu[X]\mkern-2.5mu],$ and are known in advance to be on the diagonal. This leads to a very smooth precision and complexity behaviour when the input matrix lives in $M_{n}(\mathcal{O}_{K}).$

Algorithm 2: Approximate $\operatorname{Com}(X-H)$

Input: an approximate Hessenberg matrix $H$ in $M_{n}(\mathcal{O}_{K}).$

0. $U:=1-XH.$ $U_{0}:=1-XH.$

1. While updating $U$, track $P$ and $Q$ so that $U_{0}=PUQ$ is always satisfied.

2. for $i=1,\dots,n-1$ do

3. Eliminate, modulo $X^{n+1}$, the coefficients $U_{i,j}$ for $j\geq i+1$, using the invertible pivot $U_{i,i}=1+XL_{i,i}\mod X^{n+1}$ (with $L_{i,i}\in\mathcal{O}_{K}[X]$).

4. for $i=1,\dots,n-1$ do

5. Eliminate, modulo $X^{n+1}$, the coefficient $U_{i+1,i}$, using the invertible pivot $U_{i,i}.$

6. $\psi:=\prod_{i}U_{i,i}.$

7. Rescale to get $U=I_{n}\mod X^{n+1}.$

8. $V:=\psi\times P\times Q\mod X^{n+1}.$ (See footnote 1.)

9. Return $V^{\operatorname{rec},n-1},\psi^{\operatorname{rec},n}.$

Footnote 1: The product $P\times Q$ should be implemented by sequential row operations corresponding to the eliminations in Step 5 in order to avoid a product of two matrices in $M_{n}(\mathcal{O}_{K}[X])$.

Theorem 3.9.

Let $H\in M_{n}(\mathcal{O}_{K})$ be an approximate Hessenberg matrix. Then, using Algorithm 2, one can compute $\operatorname{Com}(X-H)$ and $\chi_{H}$ in $O\tilde{~{}}(n^{3})$ operations in $\mathcal{O}_{K}$ at the precision given by $H.$

Proof.

First, the operations of lines 2 and 3 use $O\tilde{~{}}(n^{3})$ operations in $\mathcal{O}_{K}$ at the precision given by $H.$ Indeed, since $H$ is an approximate Hessenberg matrix, when we use $U_{i,i}$ as pivot the only other nonzero coefficient in its column is $U_{i+1,i}$ . As a consequence, when performing this column-pivoting, only two rows ( $i$ and $i+1$ ) lead to operations in $\mathcal{O}_{K}[\mkern-2.5mu[X]\mkern-2.5mu]$ other than checking precision. Hence, line 3 costs $O\tilde{~{}}(n^{2})$ for the computation of $U.$ Following line 1, the computation of $Q$ is done by operations on rows, starting from the identity matrix. The order in which the entries of $U$ are cleared implies that $Q$ is just filled in as an upper triangular matrix: no additional operations in $\mathcal{O}_{K}[\mkern-2.5mu[X]\mkern-2.5mu]$ are required. Thus the total cost for lines 2 and 3 is indeed $O\tilde{~{}}(n^{3})$ operations.

For lines 4 and 5, there are only $n-1$ eliminations, resulting in a $O\tilde{~{}}(n^{2})$ cost for the computation of $U.$ Rather than actually constructing $P$ , we just track the eliminations performed in order to do the corresponding row operations on $Q$ , since we only need the product $P\times Q$ .

Line 6 is in $O\tilde{~{}}(n^{2})$ and line 7 is in $O\tilde{~{}}(n^{3}).$

Since $P$ is a product of only $n-1$ shear matrices, the product on line 8 is in $O\tilde{~{}}(n^{3}).$ We emphasize that no division has been done throughout the algorithm. Line 9 is costless, and the result is then proved. ∎

Remark 3.10.

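If $M\in M_{n}(K)$ does not have coefficients in $\mathcal{O}_{K},$ we may apply Algorithms 1 and 2 to $p^{v}M\in M_{n}(\mathcal{O}_{K})$ in $O\tilde{~{}}(n^{3})$ operations in $\mathcal{O}_{K}$ , and then divide the coefficient of $X^{k}$ in the resulting characteristic polynomial by $p^{(n-k)v}$ , since $\chi_{p^{v}M}(X)=p^{nv}\chi_{M}(X/p^{v})$ (equivalently, divide the coefficient of $X^{k}$ in $\psi=\det(1-Xp^{v}M)$ by $p^{kv}$ ).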
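The rescaling in this remark rests on the identity $\chi_{cM}(X)=c^{n}\chi_{M}(X/c)$: the coefficient of $X^{k}$ in $\chi_{cM}$ is $c^{n-k}$ times that of $\chi_{M}$ (for the reciprocal polynomial $\det(1-XcM)$ the factor is $c^{k}$). The sketch below checks this over the rationals with $c=p^{v}$. The characteristic polynomial is computed by the Faddeev-LeVerrier recursion, chosen here only for brevity and not taken from the paper; the function names are hypothetical, and integrality over $\mathbb{Z}$ stands in for integrality over $\mathcal{O}_{K}$.

```python
from fractions import Fraction

def charpoly(M):
    """Coefficients [a_0, ..., a_{n-1}, 1] of det(X - M), by Faddeev-LeVerrier."""
    n = len(M)
    A = [[Fraction(x) for x in row] for row in M]
    B = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = [Fraction(1)]                       # coefficients from X^n downwards
    for k in range(1, n + 1):
        AB = [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(AB[i][i] for i in range(n)) / k
        coeffs.append(c)
        B = [[AB[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return coeffs[::-1]

def charpoly_via_scaling(M, p, v):
    """Compute chi_M from chi_{p^v M}, assuming p^v * M has integral entries."""
    n = len(M)
    scaled = [[x * p ** v for x in row] for row in M]
    assert all(Fraction(x).denominator == 1 for row in scaled for x in row)
    chi_scaled = charpoly(scaled)
    # Undo the scaling: the coefficient of X^k carries a factor p^((n-k)v).
    return [chi_scaled[k] / Fraction(p) ** ((n - k) * v) for k in range(n + 1)]

M = [[Fraction(1, 2), 1], [3, Fraction(1, 4)]]
chi = charpoly_via_scaling(M, 2, 2)              # uses 4*M = [[2, 4], [12, 1]]
```

Here $\chi_{4M}=X^{2}-3X-46$ and the rescaled result is $X^{2}-\tfrac{3}{4}X-\tfrac{23}{8}$, which agrees with a direct computation of $\chi_{M}$.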

We will see in Section 5 that for an input matrix with coefficients known at flat precision, Algorithms 1 and 2 are enough to know the optimal jagged precision on $\chi_{M}.$

3.4 The comatrix of $X{-}M$

In this section, we combine Proposition 2.7 with Algorithm 2 to compute the comatrix of $X-M$ when $\chi_{M}$ is squarefree. Note that this condition on $\chi_{M}$ is equivalent to $M$ being diagonalizable under the assumption that $d\chi_{M}$ is surjective. The result is the following $O\tilde{~{}}(n^{3})$ algorithm, where the only divisions are for gcd and modular inverse computations.

Algorithm 3: Approximate $\operatorname{Com}(X{-}M)$

Input: an approx. $M\in M_{n}(\mathcal{O}_{K}),$ with $\operatorname{Disc}(\chi_{M})\neq 0.$

0. Find $P\in GL_{n}(\mathcal{O}_{K})$ and $H\in M_{n}(\mathcal{O}_{K}),$ approximate Hessenberg, such that $M=PHP^{-1},$ using Algorithm 1.

1. Compute $A=\operatorname{Com}(X-H)$ and $\chi_{M}=\chi_{H}$ using Algorithm 2.

2. Do $\operatorname{row}(A,1)\leftarrow\operatorname{row}(A,1)+\sum_{i=2}^{n}\mu_{i}\operatorname{row}(A,i),$ for random $\mu_{i}\in\mathcal{O}_{K},$ by doing $T\times A$ for some $T\in GL_{n}(\mathcal{O}_{K}).$ Compute $B:=TAT^{-1}.$

3. Similarly compute $C:=S^{-1}BS$ for $S\in GL_{n}(\mathcal{O}_{K})$ corresponding to adding a random linear combination of the columns of index $j\geq 2$ to the first column of $B.$

4. If $\gcd(C_{1,1},\chi_{M})\neq 1,$ then go to 2.

5. Let $F$ be the inverse of $C_{1,1}\mod\chi_{M}$ .

6. Let $U:=\operatorname{col}(C,1)$ and $V:=F\cdot\operatorname{row}(C,1)\mod\chi_{M}$ .

7. Return $\operatorname{Com}(X-M):=(PT^{-1}SU\times VS^{-1}TP^{-1})\mod\chi_{M}.$
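Steps 5 to 7 rely on the fact that the comatrix has rank one modulo each factor $X-\lambda$ of $\chi_{M}$, so that it is determined by one column, one row and the inverse of their common entry. The following sketch illustrates this after evaluation at an eigenvalue $\lambda$, rather than modulo $\chi_{M}$ as in the algorithm, and over the rationals rather than $K$. It assumes that $\operatorname{Com}$ denotes the adjugate, so that $A\cdot\operatorname{Com}(A)=\det(A)$; the function names are hypothetical.

```python
from fractions import Fraction

def det(A):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def adjugate(A):
    """Adjugate of A, so that A * adjugate(A) = det(A) * I."""
    n = len(A)
    def minor(i, j):
        return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]
    return [[(-1) ** (i + j) * det(minor(j, i)) for j in range(n)] for i in range(n)]

def rank_one_factors(M, lam):
    """Return (C, U, V) with C = Com(lam - M), U its first column, V = row_1 / C_11."""
    n = len(M)
    C = adjugate([[Fraction(lam * (i == j) - M[i][j]) for j in range(n)]
                  for i in range(n)])
    if C[0][0] == 0:
        # This is the situation that the random matrices T and S are meant to avoid.
        raise ValueError("C[0][0] vanishes at lam: a random change of basis is needed")
    F = 1 / C[0][0]
    U = [C[i][0] for i in range(n)]
    V = [F * C[0][j] for j in range(n)]
    return C, U, V

M = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]            # chi_M = (X - 1)(X - 2)(X + 1)
C, U, V = rank_one_factors(M, 2)
```

For each of the three eigenvalues of this matrix, every entry `C[i][j]` equals `U[i] * V[j]`. For the triangular matrix with rows $(2,1,0)$, $(0,3,1)$, $(0,0,5)$ the top left entry of the comatrix vanishes at $\lambda=3$, which is exactly the case handled by the randomization in Steps 2 and 3.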

Theorem 3.11.

For $M\in M_{n}(\mathcal{O}_{K})$ such that $\operatorname{Disc}(\chi_{M})\neq 0,$ Algorithm 3 computes $\operatorname{Com}(X-M)\pmod{\chi_{M}}$ in average complexity $O\tilde{~{}}(n^{3})$ operations in $K$ . The only divisions occur in taking gcds and inverses modulo $\chi_{M}$ .

Proof.

As we have already seen, completing Steps 0 and 1 is in $O\tilde{~{}}(n^{3}).$ Multiplying by $T$ or $S$ or their inverses corresponds to $n$ operations on rows or columns over a matrix with coefficients in $\mathcal{O}_{K}[X]$ of degree at most $n.$ Thus, it is in $O\tilde{~{}}(n^{3}).$ Step 5 is in $O\tilde{~{}}(n),$ Step 6 in $O\tilde{~{}}(n^{2})$ and Step 7 in $O\tilde{~{}}(n^{3})$ . All that remains to prove is that the set of $T$ and $S$ to avoid is of dimension at most $n-1.$ The idea is to work modulo $X-\lambda$ for $\lambda$ a root of $\chi_{M}$ (in an algebraic closure) and then apply the Chinese Remainder Theorem. The goal of Step 2 is to ensure that the first row of $B$ contains an invertible entry modulo $\chi_{M}.$ Since $A(\lambda)$ is of rank one, the $\mu_{i}$ ’s have to avoid an affine hyperplane so that $\operatorname{row}(B,1)\mod(X-\lambda)$ is a non-zero vector. Hence, for $\operatorname{row}(B,1)\mod\chi_{M}$ to contain an invertible coefficient, a finite union of affine hyperplanes must be avoided. Similarly, the goal of Step 3 is to put an invertible coefficient (modulo $\chi_{M}$ ) on $C_{1,1},$ and again, only a finite union of affine hyperplanes must be avoided. Hence, the set that the $\mu_{i}$ ’s have to avoid is a finite union of hyperplanes, and is therefore of dimension at most $n-1.$ Thus, almost any choice of $\mu_{i}$ leads to a matrix $C$ passing the test in Step 4. This concludes the proof. ∎

Remark 3.12.

As in the previous section, it is possible to scale $M\in M_{n}(K)$ so as to get coefficients in $\mathcal{O}_{K}$ and apply the previous algorithm.

Remark 3.13.

We refer to [caruso:15a] for the handling of the precision of gcd and modular inverse computations. That article explores ways to tame the loss of precision coming from divisions, following the methods of [caruso-roe-vaccon:14a].

4 Differential via Frobenius form

The algorithm designed in the previous section computes the differential $d\chi_{M}$ of $\chi$ at a given matrix $M\in M_{n}(K)$ for a cost of $O(n^{3})$ operations in $K$ . This seems to be optimal given that the (naive) size of $d\chi_{M}$ is $n^{3}$ : it is a matrix of size $n\times n^{2}$ . It turns out however that improvements are still possible. Indeed, thanks to Proposition 2.7, the matrix of $d\chi_{M}$ admits a compact form which can be encoded using only $O(n^{2})$ coefficients. The aim of this short section is to design a fast algorithm (with complexity $O\tilde{~{}}(n^{\omega})$ ) for computing this short form. The price to pay is that divisions in $K$ appear, which can be an issue for precision in particular cases. In this section, we only estimate the number of operations in $K$ and not their behaviour on precision.

From now on, we fix a matrix $M\in M_{n}(K)$ for which $d\chi_{M}$ is surjective. Let $(\alpha,P,Q,\chi_{M})$ be the quadruple encoding the short form of $d\chi_{M}$ ; we recall that they are related by the relations:

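$$d\chi_{M}(dM)=\operatorname{Tr}(\operatorname{Com}(X-M)\cdot dM),\qquad\operatorname{Com}(X-M)=\alpha\cdot PV^{t}\cdot VQ^{t}\mod\chi_{M}.\qquad(4)$$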

An approximation to $\chi_{M}$ can be computed in $O\tilde{~{}}(n^{\omega})$ operations in $K$ (e.g. as a by-product of [storjohann:01a]).

The matrix $P$ can be computed as follows. Pick $c\in K^{n}$ . Define $c_{i}=M^{i}c$ for all $i\geq 0$ (so that $c_{0}=c$ ). The $c_{i}$ ’s can be computed in $O\tilde{~{}}(n^{\omega})$ operations in $K,$ e.g. using the first algorithm of [keller-gehrig:85a]. Let $P_{\text{\rm inv}}$ be the $n\times n$ matrix whose rows are the $c_{i}$ ’s for $0\leq i\leq n-1$ . Note that $P_{\text{\rm inv}}$ is invertible if and only if $(c_{0},c_{1},\ldots,c_{n-1})$ is a basis of $K^{n}$ if and only if $c$ is a cyclic vector. Moreover after base change to the basis $(c_{0},\ldots,c_{n-1})$ , the matrix $M$ takes the shape (3). In other words, if $P_{\text{\rm inv}}$ is invertible, then $P=P_{\text{\rm inv}}^{-1}$ is a solution of $M=P\mathscr{C}P^{-1}$ , where $\mathscr{C}$ is the companion matrix similar to $M$ . Moreover, observe that the condition “ $P_{\text{\rm inv}}$ is invertible” is open for the Zariski topology. It therefore holds with high probability as soon as the open set it defines is nonempty, that is as soon as $M$ admits a cyclic vector, which holds by assumption.

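The characteristic polynomial $\chi_{M}=X^{n}+a_{n-1}X^{n-1}+\dots+a_{0}$ can be recovered thanks to the relation $a_{0}c_{0}+a_{1}c_{1}+\dots+a_{n-1}c_{n-1}=-c_{n}$ (a consequence of the Cayley–Hamilton theorem), that is, $(a_{0},\ldots,a_{n-1})=-c_{n}\cdot P$ .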
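The recovery of $\chi_{M}$ from the iterates of a cyclic vector can be sketched as follows. With $c_{0}=c$ and $c_{i}=M^{i}c$, the Cayley-Hamilton theorem gives $a_{0}c_{0}+\dots+a_{n-1}c_{n-1}=-c_{n}$, which is a linear system in the $a_{i}$ ’s. The sketch works over the rationals and uses naive iterated products and Gauss-Jordan elimination in place of the fast algorithms cited in the text; the function names are hypothetical.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b over the rationals by Gauss-Jordan elimination."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if piv is None:
            raise ValueError("singular system: the starting vector is not cyclic")
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [x / pivot for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def krylov_charpoly(M, c):
    """Coefficients [a_0, ..., a_{n-1}, 1] of chi_M from the iterates c_i = M^i c."""
    n = len(M)
    vectors = [list(c)]
    for _ in range(n):
        prev = vectors[-1]
        vectors.append([sum(M[i][j] * prev[j] for j in range(n)) for i in range(n)])
    # Columns of the system are c_0, ..., c_{n-1}; the right-hand side is -c_n.
    system = [[vectors[j][i] for j in range(n)] for i in range(n)]
    a = solve(system, [-x for x in vectors[n]])
    return a + [1]

chi = krylov_charpoly([[1, 2], [3, 4]], [1, 0])
```

With $c=(1,0)$ the iterates are $(1,0)$, $(1,3)$ and $(7,15)$, and the system returns $\chi_{M}=X^{2}-5X-2$. When the starting vector is not cyclic (for instance for the identity matrix) the system is singular and another vector has to be drawn.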

Now, instead of directly computing $Q$ , we first compute a matrix $R$ with the property that $\mathscr{C}^{t}=R\mathscr{C}R^{-1}$ . To do so, we apply the same strategy as above except that we start with the vector $e=(1,0,\ldots,0)$ (and not with a random vector). A simple computation shows that, for $1\leq i\leq n{-}1$ , the vector $\mathscr{C}^{i}e$ has the shape:

[TABLE]

with $n{-}i$ starting zeros. Therefore the $\mathscr{C}^{i}e$ ’s form a basis of $K^{n}$ , i.e. $e$ is always a cyclic vector of $\mathscr{C}$ . Once $R$ has been computed, we recover $Q$ using the relation $Q=P_{\text{\rm inv}}^{t}R$ .

It remains to compute the scaling factor $\alpha$ . For this, we write the relation:

[TABLE]

which comes from Eq. (4) after multiplication on the left by $P^{-1}$ and multiplication on the right by $P$ . We observe moreover that the first row of $R$ is $(1,0,\ldots,0)$ . Evaluating the top left entry of Eq. (5), we end up with the relation:

[TABLE]

No further computations are then needed to derive the value of $\alpha$ . We summarize this section with the following theorem:

Theorem 4.1.

Given $M\in M_{n}(K)$ such that $d\chi_{M}$ is surjective, one can compute $(\alpha,P,Q,\chi_{M})$ such that $\operatorname{Com}(X{-}M)=\alpha\cdot PV^{t}\cdot VQ^{t}\mod\chi_{M}$ in $O\tilde{~{}}(n^{\omega})$ operations in $K.$

5 Optimal jagged precision

In Sections 3 and 4, we have proposed algorithms to obtain the comatrix of $X-M.$ Our motivation for these computations is to understand the optimal precision on $\chi_{M}.$ In this section, we provide some answers to this question, along with numerical evidence. We also show that it is then possible to derive the optimal precision on the eigenvalues of $M.$

5.1 On the characteristic polynomial

For $0\leq k<n$ , let $\pi_{k}:K[X]\to K$ be the mapping taking a polynomial to its coefficient of $X^{k}$ . By applying [caruso-roe-vaccon:14a]*Lem. 3.4 to the composite $\pi_{k}\circ\chi_{M}$ , one can figure out the optimal precision on the $k$ -th coefficient of the characteristic polynomial of $M$ (at least if $M$ is given at enough precision).

Let us consider more precisely the case where $M$ is given at jagged precision: the $(i,j)$ entry of $M$ is given at precision $O(\pi^{N_{i,j}})$ for some integers $N_{i,j}$ . Lemma 3.4 of [caruso-roe-vaccon:14a] then shows that the optimal precision on the $k$ -th coefficient of $\chi_{M}$ is $O(\pi^{N^{\prime}_{k}})$ where $N^{\prime}_{k}$ is given by the formula:

$$N^{\prime}_{k}=\min_{1\leq i,j\leq n}\big(N_{j,i}+\operatorname{val}(\pi_{k}(C_{i,j}))\big)\qquad(6)$$

where $C_{i,j}$ is the $(i,j)$ entry of the comatrix $\operatorname{Com}(X{-}M)$ .
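The computation of the $N^{\prime}_{k}$ ’s can be sketched on a small example, assuming that Eq. (6) reads $N^{\prime}_{k}=\min_{i,j}(N_{j,i}+\operatorname{val}(\pi_{k}(C_{i,j})))$, which is the form suggested by $d\chi_{M}(dM)=\operatorname{Tr}(\operatorname{Com}(X-M)\cdot dM)$. The comatrix is obtained here from the Faddeev-LeVerrier recursion over the rationals, a stand-in that uses divisions by $1,\ldots,n$ and is not one of the algorithms of Sections 3 and 4; the function names are hypothetical.

```python
import math
from fractions import Fraction

def val(x, p):
    """p-adic valuation of a rational number (math.inf for 0)."""
    x = Fraction(x)
    if x == 0:
        return math.inf
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def comatrix_coeffs(M):
    """List whose k-th item is the matrix coefficient of X^k in Com(X - M)."""
    n = len(M)
    A = [[Fraction(x) for x in row] for row in M]
    B = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    out = [None] * n
    for k in range(1, n + 1):
        out[n - k] = B                           # Com(X - M) = sum_k B_k X^(n-k)
        AB = [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
              for i in range(n)]
        c = -sum(AB[i][i] for i in range(n)) / k
        B = [[AB[i][j] + (c if i == j else 0) for j in range(n)] for i in range(n)]
    return out

def optimal_precision(M, N, p):
    """N'_k for k = 0..n-1, where N[i][j] is the precision on the entry M[i][j]."""
    n = len(M)
    C = comatrix_coeffs(M)
    return [min(N[j][i] + val(C[k][i][j], p) for i in range(n) for j in range(n))
            for k in range(n)]

M = [[2, 4], [6, 8]]
flat = optimal_precision(M, [[10, 10], [10, 10]], 2)
jagged = optimal_precision(M, [[10, 30], [10, 30]], 2)
```

Here $\operatorname{Com}(X-M)$ has rows $(X-8,\,4)$ and $(6,\,X-2)$. With flat precision $O(2^{10})$ the determinant is known at precision $O(2^{11})$ and the trace at $O(2^{10})$; with the jagged precision above the determinant is known at $O(2^{12})$.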

Proposition 5.1.

If $M\in M_{n}(\mathcal{O}_{K})$ is given at (high enough) jagged precision, then we can compute the optimal jagged precision on $\chi_{M}$ in $O\tilde{~{}}(n^{3})$ operations in $K$ .

Proof.

We have seen in §3 and §4 that the computation of the matrix $C=\operatorname{Com}(X{-}M)$ can be carried out within $O\tilde{~{}}(n^{3})$ operations in $K$ (either with the Hessenberg method or the Frobenius method). We conclude by applying Eq. (6) which requires no further operation in $K$ (but $n^{3}$ evaluations of valuations and $n^{3}$ manipulations of integers). ∎

Remark 5.2.

If $M\in M_{n}(\mathcal{O}_{K})$ is given at (high enough) flat precision, then we can avoid the final base change step in the Hessenberg method. Indeed, observe that, thanks to Lemma 3.6, we can write:

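$$\operatorname{Tr}(\operatorname{Com}(X-M)\cdot dM)=\operatorname{Tr}(\operatorname{Com}(X-H)\cdot P^{-1}dMP)$$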

where $P$ lies in $\operatorname{GL}_{n}(\mathcal{O}_{K})$ . Moreover, the latter condition implies that $P^{-1}dMP$ runs over $M_{n}(\mathcal{O}_{K})$ when $dM$ runs over $M_{n}(\mathcal{O}_{K})$ . As a consequence, the integer $N^{\prime}_{k}$ giving the optimal precision on the $k$ -th coefficient of $\chi_{M}$ is also equal to $N+\min_{1\leq i,j\leq n}\operatorname{val}(\pi_{k}(C^{H}_{i,j}))$ where $C^{H}_{i,j}$ is the $(i,j)$ entry of $\operatorname{Com}(X{-}H)$ , where $H$ is the Hessenberg form of $M$ .

Remark 5.3.

As a consequence of the previous discussion, once the optimal jagged precision is known, it is possible to lift the entries of $M$ to a sufficiently large precision, rescale them to have entries in $\mathcal{O}_{K}$ and then use Algorithm 2 to compute the characteristic polynomial. The output might then need to be rescaled and truncated at the optimal precision. This requires $O\tilde{~{}}(n^{3})$ operations in $\mathcal{O}_{K}$ and unfortunately, in several instances, may require a large increase in the working precision.

Numerical experiments. We have made numerical experiments in SageMath [sage] in order to compare the optimal precision obtained with the methods explained above with the actual precision obtained by the software. For doing so, we picked a sample of $1000$ random matrices $M$ in $M_{9}(\mathbb{Q}_{2})$ where all the entries are given at the same relative precision. We recall that, in SageMath, random elements $x\in\mathbb{Q}_{p}$ are generated as follows. We fix an integer $N$ — the so-called relative precision — and generate elements of $\mathbb{Q}_{p}$ of the shape

[TABLE]

where $v$ is a random integer generated according to the distribution:

[TABLE]

and $a$ is an integer in the range $[0,p^{N})$ , selected uniformly at random.

Once this sample has been generated, we computed, for each $k\in\{0,1,\ldots,8\}$ , the three following quantities:

$\bullet$ the optimal precision on the $k$ -th coefficient of the characteristic polynomial of $M$ given by Eq. (6);

$\bullet$ in the capped relative model (in which each coefficient carries its own precision, updated after each elementary arithmetical operation), the precision gotten on the $k$ -th coefficient of the characteristic polynomial of $M$ computed via the call $M\texttt{.charpoly(algorithm="df")}$ ;

$\bullet$ in the model of floating-point arithmetic (see [caruso:17a]*§2.3), the number of correct digits of the $k$ -th coefficient of the characteristic polynomial of $M$ .

Remark 5.4.

The keyword algorithm="df" forces SageMath to use the division free algorithm of [seifullin:02a]. It is likely that, by proceeding this way, we limit the loss of precision.

The table of Figure 1 summarizes the results obtained.

It should be read as follows. First, the acronyms CR and FP refer to “capped relative” and “floating-point” respectively. The numbers displayed in the table are the average loss of relative precision. More precisely, if $N$ is the relative precision at which the entries of the input random matrix $M$ have been generated and $v$ is the valuation of the $k$ -th coefficient of $\chi_{M}$ , then:

$\bullet$ the column “Optimal” is the average of the quantities $(N^{\prime}_{k}{-}v)-N$ (where $N^{\prime}_{k}$ is defined by Eq. (6)): $N^{\prime}_{k}{-}v$ is the optimal relative precision, so that the difference $(N^{\prime}_{k}{-}v)-N$ is the loss of relative precision;

$\bullet$ the column “CR” is the average of the quantities $(\text{CR}_{k}{-}v)-N$ where $\text{CR}_{k}$ is the computed (absolute) precision on the $k$ -th coefficient of $\chi_{M}$ ;

$\bullet$ the column “FP” is the average of the quantities $(\text{FP}_{k}{-}v)-N$ where $\text{FP}_{k}$ is the first position of an incorrect digit on the $k$ -th coefficient of $\chi_{M}$ .

We observe that the loss of relative accuracy stays under control in the “Optimal” column whereas it has a very erratic behavior — very large values and very large deviation as well — in the two other columns. These experiments thus demonstrate the utility of the methods developed in this paper.

5.2 On eigenvalues

Let $M\in M_{n}(K)$ and let $\lambda\in K$ be a simple eigenvalue of $M$ (that is, the corresponding generalized eigenspace has dimension $1$ ). We are interested in quantifying the optimal precision on $\lambda$ when $M$ is given with some uncertainty.

To do so, we fix an approximation $M_{\text{\rm app}}\in M_{n}(K)$ of $M$ and suppose that the uncertainty of $M$ is “jagged” in the sense that each entry of $M$ is given at some precision $O(\pi^{N_{i,j}})$ . Let $\lambda_{\text{\rm app}}$ be the relevant eigenvalue of $M_{\text{\rm app}}$ . We remark that it is possible to follow the eigenvalue $\lambda_{\text{\rm app}}$ on a small neighborhood $\mathcal{U}$ of $M$ . More precisely, there exists a unique continuous function $f:\mathcal{U}\to K$ such that:

$\bullet$ $f(M_{\text{\rm app}})=\lambda_{\text{\rm app}}$ , and

$\bullet$ $f(M^{\prime})$ is an eigenvalue of $M^{\prime}$ for all $M^{\prime}\in\mathcal{U}$ .

Lemma 5.5.

The function $f$ is strictly differentiable on a neighborhood of $M_{\text{\rm app}}$ . The differential of $f$ at $M$ is the linear mapping:

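$$dM\mapsto d\lambda=-\frac{\operatorname{Tr}(\operatorname{Com}(\lambda-M)\cdot dM)}{\chi^{\prime}_{M}(\lambda)}$$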

where $\chi^{\prime}_{M}$ is the usual derivative of $\chi_{M}$ .

Proof.

The first assertion follows from the implicit function Theorem. Differentiating the relation $\chi_{M}(\lambda)=0$ , we get $\chi^{\prime}_{M}(\lambda)\cdot d\lambda+\operatorname{Tr}(\operatorname{Com}(X-M)\cdot dM)(\lambda)=0$ , from which the Lemma follows. ∎

Lemma 3.4 of [caruso-roe-vaccon:14a] now implies that, if the $N_{i,j}$ ’s are large enough and sufficiently well balanced, the optimal precision on the eigenvalue $\lambda$ is $O(\pi^{N^{\prime}})$ with:

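$$N^{\prime}=\min_{1\leq i,j\leq n}\big(N_{j,i}+\operatorname{val}(C_{i,j}(\lambda))\big)-\operatorname{val}(\chi^{\prime}_{M}(\lambda))$$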

where $C_{i,j}$ denotes as above the $(i,j)$ entry of $\operatorname{Com}(X{-}M)$ . Writing $\operatorname{Com}(X{-}M)=\alpha\cdot PV^{t}\cdot VQ^{t}\text{ mod }\chi_{M}$ as in Proposition 2.7, we find:

[TABLE]

where $P_{i}$ denotes the $i$ -th row of $P$ and, similarly, $Q_{j}$ denotes the $j$ -th row of $Q$ . Note moreover that $V(\lambda)$ is the row vector $(1,\lambda,\ldots,\lambda^{n-1})$ . By the discussion of §4, the exact value of $N^{\prime}$ can be determined for a cost of $O\tilde{~{}}(n^{\omega})$ operations in $K$ and $O(n^{2})$ operations on integers.

When $M$ is given at flat precision, i.e. the $N_{i,j}$ ’s are all equal to some $N$ , the formula for $N^{\prime}$ may be rewritten:

[TABLE]

and can therefore now be evaluated for a cost of $O\tilde{~{}}(n^{\omega})$ operations in $K$ and only $O(n)$ operations with integers.
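The precision on an eigenvalue can be illustrated on a small $2$-adic example. The sketch assumes that the optimal precision is $N^{\prime}=\min_{i,j}(N_{j,i}+\operatorname{val}(C_{i,j}(\lambda)))-\operatorname{val}(\chi^{\prime}_{M}(\lambda))$, as follows from Lemma 5.5, and it evaluates the comatrix directly by cofactors instead of using the compact form $(\alpha,P,Q,\chi_{M})$. It also uses the identity $\chi^{\prime}_{M}(X)=\operatorname{Tr}(\operatorname{Com}(X-M))$. The function names are hypothetical.

```python
import math
from fractions import Fraction

def val(x, p):
    """p-adic valuation of a rational number (math.inf for 0)."""
    x = Fraction(x)
    if x == 0:
        return math.inf
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def det(A):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def adjugate(A):
    """Adjugate of A, so that A * adjugate(A) = det(A) * I."""
    n = len(A)
    def minor(i, j):
        return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]
    return [[(-1) ** (i + j) * det(minor(j, i)) for j in range(n)] for i in range(n)]

def eigenvalue_precision(M, N, lam, p):
    """N' for the simple eigenvalue lam; N[i][j] is the precision on M[i][j]."""
    n = len(M)
    C = adjugate([[lam * (i == j) - M[i][j] for j in range(n)] for i in range(n)])
    dchi = sum(C[i][i] for i in range(n))        # chi'_M(lam) = Tr(Com(lam - M))
    best = min(N[j][i] + val(C[i][j], p) for i in range(n) for j in range(n))
    return best - val(dchi, p)

M = [[1, 1], [0, 5]]
flat = eigenvalue_precision(M, [[10, 10], [10, 10]], 1, 2)
jagged = eigenvalue_precision(M, [[10, 10], [20, 10]], 1, 2)
```

The eigenvalues $1$ and $5$ are $2$-adically close ( $\chi^{\prime}_{M}(1)=-4$ ), so at flat precision $O(2^{10})$ the eigenvalue $1$ is only known at precision $O(2^{8})$. If the lower left entry is known at precision $O(2^{20})$ instead, the loss disappears and the eigenvalue is known at $O(2^{10})$.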

To conclude, let us briefly discuss the situation where we want to figure out the optimal jagged precision on a tuple $(\lambda_{1},\ldots,\lambda_{s})$ of simple eigenvalues. Applying (7), we find that the optimal precision on $\lambda_{k}$ is

[TABLE]

Proposition 5.6.

The $N^{\prime}_{k}$ ’s can be all computed in $O\tilde{~{}}(n^{\omega})$ operations in $K$ and $O(n^{2}s)$ operations with integers.

If the $N_{i,j}$ ’s are all equal, the above complexity can be lowered to $O\tilde{~{}}(n^{\omega})$ operations in $K$ and $O(ns)$ operations with integers.

Proof.

The $\alpha(\lambda_{k})$ ’s and the $\chi^{\prime}_{M}(\lambda_{k})$ ’s can be computed for a cost of $O\tilde{~{}}(ns)$ operations in $K$ using fast multipoint evaluation methods (see 10.7 of [gathen-gerhard:13a]). On the other hand, we observe that $P_{i}V(\lambda_{k})^{t}$ is nothing but the $(i,k)$ entry of the matrix:

[TABLE]

The latter product can be computed in $O\tilde{~{}}(n^{\omega})$ operations in $K$ . (In fact $O\tilde{~{}}(n^{2})$ is also possible because the right factor is a structured matrix, namely a truncated Vandermonde: computing the above product reduces to evaluating a polynomial at the points $\lambda_{1},\ldots,\lambda_{s}$ .) Therefore all the $P_{i}V(\lambda_{k})^{t}$ ’s (for $i$ and $k$ varying) can be determined with the same complexity. Similarly all the $V(\lambda_{k})Q_{j}^{t}$ ’s are computed for the same cost. The first assertion of Proposition 5.6 follows. The second assertion is now proved similarly to the case of a unique eigenvalue. ∎
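The reduction to polynomial evaluation can be made concrete: since $V(\lambda)=(1,\lambda,\ldots,\lambda^{n-1})$, the quantity $P_{i}V(\lambda_{k})^{t}$ is the value at $\lambda_{k}$ of the polynomial whose coefficients are the entries of the row $P_{i}$. The sketch below uses Horner's rule point by point, which costs $O(n^{2}s)$ operations; it is only meant to show the structure, not the fast multipoint evaluation referred to in the proof. The function names are hypothetical.

```python
def horner(coeffs, x):
    """Evaluate sum_j coeffs[j] * x**j by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def times_truncated_vandermonde(P, lambdas):
    """Matrix whose (i, k) entry is P_i V(lambda_k)^t, computed row by row."""
    return [[horner(row, lam) for lam in lambdas] for row in P]

def explicit_product(P, lambdas):
    """Same matrix, computed as the product of P with the n x s Vandermonde block."""
    n = len(P)
    vandermonde = [[lam ** j for lam in lambdas] for j in range(n)]
    return [[sum(P[i][j] * vandermonde[j][k] for j in range(n))
             for k in range(len(lambdas))] for i in range(n)]

P = [[1, 2, 3], [0, 1, 0], [4, 0, 1]]
lambdas = [2, -1]
product = times_truncated_vandermonde(P, lambdas)
```

Both functions return the same $3\times 2$ matrix, with rows $(17,2)$, $(2,-1)$ and $(8,5)$ for this input.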
