On Parametric Linear System Solving

Robert M. Corless; Mark Giesbrecht; Leili Rafiee Sevyeri; and B. David Saunders

arXiv:2508.21629·math.RA·September 1, 2025


TL;DR

This paper presents a polynomial-time method for solving parametric linear systems with up to three parameters, leveraging Hermite and Smith normal forms to efficiently analyze solution regimes and singularities.

Contribution

It introduces a novel approach that reduces the complexity of solving parametric linear systems by exploiting algebraic normal forms, improving over previous exponential methods.

Findings

01. Polynomial-time solution method for systems with up to three parameters.

02. Effective identification of singularities and structural changes in the system.

03. Reduction of regimes needed from exponential to polynomial in system size.


Equations

V(Z)\cap D(N)\subseteq\bigcup_{i=1}^{k}V(Z_{i})\cap D(N_{i}).

\mathbf{A}=\begin{bmatrix}x&0\\ 0&0\end{bmatrix},\qquad\mathbf{b}=\begin{bmatrix}0\\ y\end{bmatrix},

\mathbf{M}\mathbf{v}+\mathbf{B}\mathbf{w}=\mathbf{c}

\mathbf{C}\mathbf{v}+\mathbf{D}\mathbf{w}=\mathbf{d}.

\left(\begin{bmatrix}\mathbf{v}&\mathbf{w}\end{bmatrix}^{T},\begin{bmatrix}-\mathbf{M}^{-1}\mathbf{B}\\ \mathbf{I}\end{bmatrix},N,Z\right),

\mathbf{H}(a,\xi)\mathbf{B}(a,\xi)=\begin{bmatrix}\mathbf{H}_{r}&\mathbf{H}_{12}\\ 0&0\end{bmatrix}\begin{bmatrix}-\mathbf{H}_{r}^{-1}\mathbf{H}_{12}\\ \mathbf{I}_{n-r}\end{bmatrix}=\begin{bmatrix}0\\ 0\end{bmatrix},

d\geq\sum_{j=1}^{t}(r_{0}-r_{j})\geq 1+2+\dots+t=t(t+1)/2,

(\mathbf{U}_{f},\mathbf{H}_{f},\mathbf{R}_{f},N_{f}=(N_{0}\setminus\{f\})\cup\mathrm{Fac}(\delta_{f}),Z_{f}=\{f\}),

\mathbf{Y}=\begin{bmatrix}z^{-1}&z^{-1}&z^{-1}&z^{-1}&z^{-1}&z^{-1}&z^{-1}&0\\ 1&1&1&1&1&1&0&1\\ 1&1&1&1&1&0&1&1\\ 1&1&1&1&0&1&1&1\\ 1&1&1&0&1&1&1&1\\ 1&1&0&1&1&1&1&1\\ 1&0&1&1&1&1&1&1\\ 0&z&z&z&z&z&z&z\end{bmatrix}.

\mathbf{A}=\begin{bmatrix}a&1\\ 0&a\end{bmatrix}.

\mathbf{X}_{A}=\begin{bmatrix}\ln(a)&a^{-1}\\ 0&\ln(a)\end{bmatrix}.

\mathbf{B}=\begin{bmatrix}a&0\\ 0&a\end{bmatrix}

\mathbf{X}_{B}=\begin{bmatrix}\ln(a)&0\\ 0&\ln(a)\end{bmatrix}.

\mathbf{X}_{D}=\begin{bmatrix}\ln(a)+2i\pi&a^{-1}\\ 0&\ln(a)-2i\pi\end{bmatrix}

\mathbf{M}=\begin{bmatrix}a&b\\ 0&a\end{bmatrix}

\mathbf{H}_{b\neq 0}=\begin{bmatrix}1&\lambda/b\\ 0&(\lambda-a)^{2}\end{bmatrix},\qquad\mathbf{H}_{b=0}=\begin{bmatrix}\lambda-a&0\\ 0&\lambda-a\end{bmatrix},

A = λ I - J = λ - w 000 0 λ - x 00 - a - b λ - a - y b - c - d c λ - d - z .


H = λ + w 000 0 λ + x 00 0010 - (a d - aλ - b c - a /10) / c λ + 1/10 (d - λ - 1/10) / c λ^{2} + (1/5 - a - d) λ + a d - c b - (1/10) d - (1/10) a + 1/100 .


\mathrm{KMS}_{n}=\begin{bmatrix}1&-\rho&&&\\ -\rho&\rho^{2}+1&-\rho&&\\ &\ddots&\ddots&\ddots&\\ &&-\rho&\rho^{2}+1&-\rho\\ &&&-\rho&1\end{bmatrix}.


Full text

Robert M. Corless, Mark Giesbrecht [2], Leili Rafiee Sevyeri, and B. David Saunders

[1] Ontario Research Centre for Computer Algebra, School of Mathematical and Statistical Sciences, University of Western Ontario, London, ON, Canada

[2] Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, N2L 3G1, Canada

[3] Department of Computer and Information Sciences, University of Delaware, Newark, Delaware, USA

Abstract

Parametric linear systems are linear systems of equations in which some symbolic parameters, that is, symbols that are not considered to be candidates for elimination or solution in the course of analyzing the problem, appear in the coefficients of the system. In this paper we assume that the symbolic parameters appear polynomially in the coefficients and that the only variables to be solved for are those of the linear system. The consistency of the system and expression of the solutions may vary depending on the values of the parameters. It is well-known that it is possible to specify a covering set of regimes, each of which is a Zariski-constructible condition on the parameters together with a solution description valid under that condition.

We provide a method of solution that requires time polynomial in the matrix dimension and the degrees of the polynomials when there are up to three parameters. We also discuss examples suggesting how the method may be useful beyond the formal three-parameter setting.

In previous methods the number of regimes needed is exponential in the system dimension and polynomial degree of the parameters. Our approach exploits the Hermite and Smith normal forms that may be computed when the system coefficient domain is mapped to the univariate polynomial domain over suitably constructed fields. Our method identifies intrinsic singularities and ramification points where the algebraic and geometric structure of the matrix changes.

Parametric eigenvalue problems are addressed as well: we treat $\lambda$ as a parameter in addition to those in $\mathbf{A}$ and solve the parametric system $(\lambda\mathbf{I}-\mathbf{A})\mathbf{u}=0$ . The algebraic conditions on $\lambda$ required for a nontrivial nullspace define the eigenvalues. We do not directly address the problem of computing the Jordan form, but our approach allows the construction of the algebraic and geometric eigenvalue multiplicities revealed by the Frobenius form, which is a key step in the construction of the Jordan form of a matrix.

1 Introduction

In broad generality, symbolic computation is concerned with mathematical equations that contain symbols. Symbols are used both for variables, which are typically to be solved for, and for parameters, which are typically carried through and appear in the solutions. The solutions are then interpreted as formulae: that is, objects that can be studied further, perhaps by varying the parameters. One prominent early researcher said that the difference between symbolic and numeric computation was merely a matter of when numerical values were inserted into the parameters: before the computation meant you were going to do things numerically, and after the computation meant you had done symbolic computation. The words “parameters” and “variables” are therefore not precisely descriptive, and can often be used interchangeably. Indeed, as a matter of practice, one can often choose quite strategically which subset of the symbols in a system of polynomial equations to treat as variables: it may be better to solve for $x$ as a function of $y$ than to solve for $y$ as a function of $x$.

In this paper we are concerned with systems of equations containing several symbols, some of which we take to be variables, and all the rest as parameters. In addition, we restrict our attention to problems in which the variables appear only linearly. Parameters are allowed to appear polynomially, of whatever degree.

Parametric Linear Systems (PLS) arise in many contexts, for instance in the analysis of the stability of equilibria in dynamical systems models such as occur in mathematical biology and other areas. Understanding the different potential kinds of dynamical behavior can be important for model selection as well as analysis. In particular, parametric linear systems play a role in analyzing the stability of the equilibria of parametric autonomous systems of ordinary differential equations (see [26] and [12]). One particularly famous example is the Lotka-Volterra system, which arises naturally from predator-prey equations. See also [24] and [25]. Other uses of parametric linear systems in science and engineering include computing the characteristic solutions of differential equations [9], dealing with colored Petri nets [14], and applications in operations research and engineering [10], [18], [22], [32]. Some problems in robotics [2] and certain modelling problems in mathematical biology (see, e.g., [30]) can also benefit from the ability to solve parametric linear systems effectively.

After some discussion of prior comprehensive solving work in Section 2, we proceed with formal problem and solution definitions for parametric linear systems in Section 3. Our primary tool for solving these is by way of Comprehensive Triangular Smith Normal Form (CTSNF), which is introduced in Section 5, where we also reduce PLS to CTSNF. Section 6 describes the solution of CTSNF problems for the case of up to three parameters.

An application that seems at first to be of only theoretical interest is the computation of the matrix logarithm, or indeed any of several other matrix functions such as matrix square root. We briefly discuss this example in more detail with a pair of small matrices in Section 7.2. We also give other examples in Section 7.

A preliminary conference version of this work appeared in [5]. The present version expands the definitions, proofs, and treatment of constructible regimes.

2 Previous Work on Parameterized Linear Systems

Interest in computation of the solution of parameterized linear systems dates back to the beginning of symbolic computation. For instance, one of the first things users have requested of computer algebra systems is the explicit form of the inverse of a matrix containing only symbolic entries (this is merely an anecdote, but one of the present authors attests that it really has happened): the user is then typically quite dissatisfied at the complexity of the answer if the dimension is greater than, say, three. Of course, the determinant itself, which must appear in such an answer, has a factorial number of terms in it, and thus growth in the size of the answer must be more than exponential. Therefore the complexity of any algorithm to solve parameterized linear systems must be at least exponential in the number of parameters.

An interesting pair of papers addressing the case of only one parameter is [1] and [16]. These papers assume full rank of the linear system, and thus compute the “generic” case when in fact there are isolated values of the parameter for which the rank drops, and use rational interpolation of the numerical solutions of specialized linear systems to recover this generic solution.

Many authors have sought comprehensive solutions, by which is meant complete coverage of all parametric regimes, through various means. One of the first explicit methods was the matrix-minor based approach of [26], which enables practical solution of many problems of interest. Recently, the problem of computing the Jordan form of a parametric matrix once the Frobenius form is known has been approached using Regular Chains [7], and this has been moderately successful in practice. Simple methods and heuristics for linear systems containing parameters continue to generate interest, even when Regular Chains are used, such as in [3].

Other authors such as [31], [19], [21], [17], and [20] have tackled the even more difficult problem of computing the comprehensive solution of systems of polynomial equations containing parameters, and of course their methods can be applied to the linear equations being considered here.

By restricting our attention in this paper to linear problems and to those of a constant number of parameters (e.g., three or fewer) we are able to guarantee better worst case performance (polynomially many solution regimes) and hope to provide better efficiency in many instances than is possible using those general-purpose approaches.

3 Definitions and Notation

Let ${\mathsf{F}}$ be a field and $Y=(y_{1},\ldots,y_{s})$ a list of parameters, and fix an algebraic closure $\overline{{\mathsf{F}}}$ of ${\mathsf{F}}$ . Then ${\mathsf{F}}[Y]$ is the ring of polynomials and ${\mathsf{F}}(Y)$ is the field of rational functions in $Y$ . For each tuple $a=(a_{1},\ldots,a_{s})$ in $\overline{{\mathsf{F}}}^{s}$ , evaluation at $a$ is a mapping ${\mathsf{F}}[Y]\rightarrow\overline{{\mathsf{F}}}$ . We will extend this mapping componentwise to polynomials, vectors, matrices, and sets thereof over ${\mathsf{F}}[Y]$ (i.e., in ${\mathsf{F}}[Y][x]={\mathsf{F}}[Y,x]$ ). Equivalently, one may interpret evaluation in arbitrary field extensions of ${\mathsf{F}}$ ; we fix $\overline{{\mathsf{F}}}$ for notational convenience.

In many of our constructions an additional symbol $x$ will play a special role: it is a parameter of the PLS problems but, for purposes of computing Hermite/Smith forms, we treat $x$ as the univariate polynomial variable over ${\mathsf{F}}(Y)$ . Accordingly, evaluation at $Y=a$ extends to a partial evaluation map ${\mathsf{F}}[Y,x]\rightarrow\overline{{\mathsf{F}}}[x]$ by substituting $Y=a$ and leaving $x$ unbound. For $f\in{\mathsf{F}}[Y,x]$ we write $f(a,x)$ for the resulting polynomial in $x$ , and for $\xi\in\overline{{\mathsf{F}}}$ we write $f(a,\xi)$ (equivalently $f(a,x)|_{x=\xi}$ ) for its value under the further specialization $x=\xi$ .

We use the Householder convention, typesetting matrices in upper case bold, e.g. $\mathbf{A}$ , and lower case bold for vectors, e.g. $\mathbf{b}$ .

For the most part, for such objects over ${\mathsf{F}}[Y,x]$ , we know $Y$ and $x$ from context and write $\mathbf{A}$ rather than $\mathbf{A}(Y,x)$ , but write $\mathbf{A}(a)\in\overline{{\mathsf{F}}}[x]^{m\times n}$ for the partial evaluation at $Y=a$ , and $\mathbf{A}(a,\xi)\in\overline{{\mathsf{F}}}^{m\times n}$ for the full evaluation at $(Y,x)=(a,\xi)$ .

For a set of polynomials $S\subseteq{\mathsf{F}}[Y]$ , we will denote by $V(S)$ the variety of the ideal generated by $S$ in $\overline{{\mathsf{F}}}^{s}$ . This is the set of tuples $a\in\overline{{\mathsf{F}}}^{s}$ such that $f(a)=0$ , for all $f\in S$ . For a polynomial $g\in{\mathsf{F}}[Y]$ let $D(g)$ denote its principal Zariski-open set, $D(g)=\{a\in\overline{{\mathsf{F}}}^{s}\mid g(a)\neq 0\}=\overline{{\mathsf{F}}}^{s}\setminus V(g)$ . For a finite set $N\subseteq{\mathsf{F}}[Y]$ define $D(N)=\bigcap_{n\in N}D(n)=D\big(\prod_{n\in N}n\big)$ .

When polynomial sets also involve $x$ , we use the same notation with respect to the extended parameter space $\overline{{\mathsf{F}}}^{s}\times\overline{{\mathsf{F}}}$ . For example, for $S\subseteq{\mathsf{F}}[Y,x]$ we write $V(S)$ for the set of pairs $(a,\xi)$ such that $f(a,\xi)=0$ for all $f\in S$ , and for $g\in{\mathsf{F}}[Y,x]$ we write $D(g)=\{(a,\xi)\mid g(a,\xi)\neq 0\}$ and $D(N)=D\big(\prod_{n\in N}n\big)$ for finite $N\subseteq{\mathsf{F}}[Y,x]$ . We will be concerned with basic Zariski-constructible (locally closed) sets of the form $V(Z)\cap D(N)$ . We use Zariski-constructible rather than “semi-algebraic” since we do not assume that ${\mathsf{F}}$ is an ordered field.
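The membership test behind a set $V(Z)\cap D(N)$ can be sketched in a few lines. This is only an illustration under stated assumptions: polynomials are modelled as plain Python callables, points are taken over the rationals rather than over $\overline{{\mathsf{F}}}$, and the helper name `in_constructible_set` is hypothetical, not notation from the paper.

```python
from fractions import Fraction

def in_constructible_set(point, Z, N):
    """True when every polynomial in Z vanishes at point
    and every polynomial in N is nonzero at point."""
    return (all(f(*point) == 0 for f in Z)
            and all(g(*point) != 0 for g in N))

# Hypothetical example in the two symbols (y, x): the set V({y}) ∩ D({x}).
Z = [lambda y, x: y]
N = [lambda y, x: x]

inside = in_constructible_set((Fraction(0), Fraction(3)), Z, N)     # y = 0, x != 0
outside_v = in_constructible_set((Fraction(1), Fraction(3)), Z, N)  # y != 0
outside_d = in_constructible_set((Fraction(0), Fraction(0)), Z, N)  # x = 0

# Empty Z and N impose no condition, so every point qualifies.
everywhere = in_constructible_set((Fraction(5), Fraction(7)), [], [])
```

With empty $Z$ and $N$ the predicate accepts every point, matching the convention that an empty constraint describes the whole parameter space.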

Our inputs are polynomial in the parameters but the output coefficients in general are rational functions. The evaluation mapping extends partially to ${\mathsf{F}}(Y,x)$ : for a rational function $r(Y,x)=n(Y,x)/d(Y,x)$ in lowest terms ( $n$ and $d$ relatively prime), define $\mathrm{den}(r)=d$ ; then $r(a,\xi)$ is well defined for $(a,\xi)\in D(\mathrm{den}(r))$ . We extend $\mathrm{den}(\cdot)$ componentwise to vectors and matrices over ${\mathsf{F}}(Y,x)$ (e.g., by taking the least common multiple of entry denominators). Throughout, whenever regime data contains rational function coefficients, we explicitly adjoin the relevant denominator factors to the nonvanishing constraint set $N$ so that all expressions are well defined on the corresponding constructible set.

When a condition is imposed on an element $q$ of a rational function field such as ${\mathsf{F}}_{Y}(x)$ , we use the following convention. After writing $q=\mathrm{num}(q)/\mathrm{den}(q)$ in lowest terms, the condition $q=0$ means $\mathrm{num}(q)=0$ together with $\mathrm{den}(q)\neq 0$ , and the condition $q\neq 0$ means $\mathrm{num}(q)\neq 0$ together with $\mathrm{den}(q)\neq 0$ . Thus writing $q$ in a vanishing set is shorthand for adjoining $\mathrm{num}(q)$ to $Z$ and the factors of $\mathrm{den}(q)$ to $N$ , while writing $q$ in a nonvanishing set is shorthand for adjoining the factors of both $\mathrm{num}(q)$ and $\mathrm{den}(q)$ to $N$ . The same convention is applied componentwise to vectors and matrices. For regimes over an algebraic parameterized extension, these conditions are interpreted in the corresponding quotient coordinate ring; equivalently, after choosing a basis for the quotient, one may clear denominators and equate the corresponding coordinate polynomials over the base parameter ring.

Definition 3.1.

For a polynomial $\Delta$ in a polynomial ring over ${\mathsf{F}}$ (e.g., in ${\mathsf{F}}[Y]$ or ${\mathsf{F}}[Y,x]$ ) we write $\mathrm{Fac}(\Delta)$ for the (finite) set of monic irreducible factors of $\Delta$ in that ring. For rational matrices/vectors with entries in ${\mathsf{F}}(Y,x)$ we write $\Delta(\mathbf{M}):=\mathrm{den}(\mathbf{M})$ and $\mathrm{Fac}(\mathbf{M}):=\mathrm{Fac}(\Delta(\mathbf{M}))$ .

Definition 3.2.

A PLS instance $(\mathbf{A},\mathbf{b},N,Z)$ is well defined if all entries of $\mathbf{A}$ and $\mathbf{b}$ are well defined on $V(Z)\cap D(N)$ ; equivalently, $V(Z)\cap D(N)\subseteq D(\mathrm{den}(\mathbf{A},\mathbf{b}))$ .

Definition 3.3.

The data for a PLS problem is a matrix $\mathbf{A}$ and right hand side vector $\mathbf{b}$ over ${\mathsf{F}}[Y,x]$ , together with a Zariski-constructible constraint, $V(Z)\cap D(N)$ , with $N,Z\subseteq{\mathsf{F}}[Y,x]$ . We are only interested in those parameter value tuples $(a,\xi)\in\overline{{\mathsf{F}}}^{s}\times\overline{{\mathsf{F}}}$ in $V(Z)\cap D(N)$ , i.e., on which the polynomials in $N$ are nonzero and the polynomials in $Z$ are zero.

For the PLS problem $(\mathbf{A},\mathbf{b},N,Z)$ :

• A solution regime is a tuple $(\mathbf{u},\mathbf{B},N^{\prime},Z^{\prime})$ , with entries of $\mathbf{u}$ and $\mathbf{B}$ in ${\mathsf{F}}_{Y}(x)$ for some parameterized extension ${\mathsf{F}}_{Y}$ of ${\mathsf{F}}$ (Definition 5.1), such that all denominator factors occurring in $\mathbf{u}$ and $\mathbf{B}$ are contained in $N^{\prime}$ , and such that, for all $(a,\xi)\in V(Z^{\prime})\cap D(N^{\prime})$ , $\mathbf{u}(a,\xi)$ is a solution vector and $\mathbf{B}(a,\xi)$ is a matrix whose columns form a nullspace basis for $\mathbf{A}(a,\xi)$ .

• An inconsistency regime is a triple $(\bot,N^{\prime},Z^{\prime})$ such that, for all $(a,\xi)\in V(Z^{\prime})\cap D(N^{\prime})$ , the specialized system $\mathbf{A}(a,\xi)\mathbf{u}=\mathbf{b}(a,\xi)$ has no solution. The $\bot$ symbol is to remind that no solution vector is on offer in this case.

• A PLS solution is a set of regimes (solution regimes and inconsistency regimes) that covers $V(Z)\cap D(N)$ , which means every parameter value assignment that satisfies the problem Zariski-constructible constraint $N,Z$ also satisfies at least one regime Zariski-constructible constraint. In other words, for regimes with constraints $(N_{i},Z_{i})$ , $i=1,\ldots,k$ ,

$$V(Z)\cap D(N)\subseteq\bigcup_{i=1}^{k}V(Z_{i})\cap D(N_{i}).$$

We call entries that must occur in any $Z$ in the solution an intrinsic restriction, or singularity. We call the differing sets $V(Z_{i})\cap D(N_{i})$ that may occur in covers of $V(Z)\cap D(N)$ the ramifications of the cover.

The following examples illustrate the PLS definition and also sketch the prior approach to PLS given by [26].

Example 3.4.

Let ${\mathsf{F}}$ be a field, $Y=(y)$ , and consider the PLS instance with

$$\mathbf{A}=\begin{bmatrix}x&0\\ 0&0\end{bmatrix},\qquad\mathbf{b}=\begin{bmatrix}0\\ y\end{bmatrix},$$

and with empty initial constraints $N=Z=\emptyset$ . If $y\neq 0$ then the second equation reads $0=y$ and the system is inconsistent; thus we have inconsistency at the points of $D(y)$ , i.e., we have the inconsistency regime $(\bot,\{y\},\emptyset)$ . On $V(y)\cap D(x)$ the matrix $\mathbf{A}$ has rank $1$ and the solution set is $\{(0,t)^{T}\mid t\in\overline{{\mathsf{F}}}\}$ , so one solution regime is $(\mathbf{u}=\mathbf{0},\,\mathbf{B}=[0,1]^{T},\,N^{\prime}=\{x\},\,Z^{\prime}=\{y\})$ . On $V(y,x)$ the matrix $\mathbf{A}$ is zero and every vector solves; one may take $(\mathbf{u}=\mathbf{0},\,\mathbf{B}=\mathbf{I}_{2},\,N^{\prime}=\emptyset,\,Z^{\prime}=\{y,x\})$ .
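The three regimes of this example can be spot-checked numerically. The sketch below is an illustration only, assuming the instance $\mathbf{A}=\begin{bmatrix}x&0\\ 0&0\end{bmatrix}$ , $\mathbf{b}=(0,y)^{T}$ and sample points over the rationals; the helpers `rank`, `classify`, and `check` are hypothetical names, and a finite sample is of course not a proof of coverage.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a small rational matrix by Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def classify(x, y):
    """Name the regime of the example that contains the point (x, y)."""
    if y != 0:
        return "inconsistent"   # regime (bottom, N'={y}, Z'={})
    if x != 0:
        return "rank 1"         # regime with N'={x}, Z'={y}
    return "rank 0"             # regime with N'={}, Z'={y, x}

def check(x, y):
    """Compare the regime's claim with a direct rank computation."""
    A = [[x, 0], [0, 0]]
    Ab = [[x, 0, 0], [0, 0, y]]          # augmented matrix [A | b]
    consistent = rank(A) == rank(Ab)
    label = classify(x, y)
    if label == "inconsistent":
        return not consistent
    expected_rank = 1 if label == "rank 1" else 0
    return consistent and rank(A) == expected_rank

samples = [(x, y) for x in (-2, 0, 1, 5) for y in (-1, 0, 3)]
all_ok = all(check(x, y) for x, y in samples)
```

Every sample point falls in exactly one of the three regimes here, which is stronger than the covering condition requires: regimes in a PLS solution are allowed to overlap.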

We now sketch the minor-based regime construction from [26]. If, for $\mathbf{M}$ of size $r\times r~$ , $\mathbf{A}$ is $\begin{bmatrix}\mathbf{M}&\mathbf{B}\\ \mathbf{C}&\mathbf{D}\end{bmatrix}$ , and conformally $\mathbf{b}=\begin{bmatrix}\mathbf{c}&\mathbf{d}\end{bmatrix}^{T}$ , then a solution $\mathbf{u}=\begin{bmatrix}\mathbf{v}&\mathbf{w}\end{bmatrix}^{T}$ satisfies

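$$\mathbf{M}\mathbf{v}+\mathbf{B}\mathbf{w}=\mathbf{c}\tag{3.1}$$

and

$$\mathbf{C}\mathbf{v}+\mathbf{D}\mathbf{w}=\mathbf{d}.\tag{3.2}$$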

Under the condition that $\det(\mathbf{M})$ is nonzero and all larger minors of $\mathbf{A}$ are zero, equation (3.1) can be solved with specific solution $\mathbf{w}=0$ and $\mathbf{v}=\mathbf{M}^{-1}\mathbf{c}$ . Provided the system is consistent (equation (3.2) holds), we have the regime

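$$\left(\begin{bmatrix}\mathbf{v}&\mathbf{w}\end{bmatrix}^{T},\begin{bmatrix}-\mathbf{M}^{-1}\mathbf{B}\\ \mathbf{I}\end{bmatrix},N,Z\right),$$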

where $N=\{\det(\mathbf{M})\}$ and $Z=\{\mbox{all }(r+1)\times(r+1)\mbox{ minors of }\mathbf{A}\}$ .

Definition 3.5.

A solution regime obtained in this way (from a choice of nonsingular $r\times r$ submatrix $\mathbf{M}$ ) is called a minor-defined regime.
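A minor-defined regime can be illustrated at a single specialization. The sketch below assumes a hypothetical $3\times 3$ rational matrix of rank $2$ whose leading $2\times 2$ block plays the role of $\mathbf{M}$ ; it forms the particular solution $\mathbf{v}=\mathbf{M}^{-1}\mathbf{c}$ , $\mathbf{w}=0$ and the nullspace vector built from $-\mathbf{M}^{-1}\mathbf{B}$ stacked on the identity, then checks both against $\mathbf{A}$ . The matrix, right hand side, and helper names are made up for the illustration.

```python
from fractions import Fraction

def inv2(M):
    """Inverse of a 2x2 rational matrix; requires det(M) != 0."""
    (p, q), (r, s) = M
    det = Fraction(p * s - q * r)
    if det == 0:
        raise ValueError("det(M) = 0: the point lies outside D(N)")
    return [[s / det, -q / det], [-r / det, p / det]]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

# Hypothetical specialized system: row 3 = row 1 + row 2, so rank(A) = 2.
A = [[1, 2, 3], [0, 1, 4], [1, 3, 7]]
b = [1, 2, 3]

M = [row[:2] for row in A[:2]]     # leading 2x2 block, det = 1
B_col = [row[2] for row in A[:2]]  # the block B (a single column here)
c = b[:2]

Minv = inv2(M)
v = matvec(Minv, c)
u = v + [Fraction(0)]                                          # particular solution, w = 0
null_vec = [-t for t in matvec(Minv, B_col)] + [Fraction(1)]   # [-M^{-1}B ; I]

residual = [lhs - rhs for lhs, rhs in zip(matvec(A, u), b)]
null_image = matvec(A, null_vec)
```

The third equation is satisfied only because this particular right hand side is consistent; for an inconsistent $\mathbf{b}$ the same construction would leave a nonzero residual in the last row, which is the role of the consistency condition (3.2).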

Since an $n\times n$ matrix has $\sum_{k=0}^{n}\binom{n}{k}^{2}=\binom{2n}{n}$ minors, there are exponentially many minor-defined regimes. However, some of these regimes may not be solutions due to inconsistency, or it may be possible to combine several regimes into one. For instance, if $\det(\mathbf{M})$ is a constant and $\mathbf{b}=0$ , then all rank $r$ solutions are covered by this one regime. [26] has made a thorough study of minor-defined regimes and their simplifications.
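The minor count quoted above is easy to tabulate. The following short sketch, with the hypothetical helper `count_minors`, sums the number of $k\times k$ minors over all $k$ (including the empty minor at $k=0$ , as in the formula) and compares the total with the central binomial coefficient.

```python
from math import comb

def count_minors(n):
    """Number of square minors of an n x n matrix, summed over all sizes k:
    choose k rows and k columns independently."""
    return sum(comb(n, k) ** 2 for k in range(n + 1))

table = {n: count_minors(n) for n in (1, 2, 3, 10)}
matches = all(table[n] == comb(2 * n, n) for n in table)
```

Already for $n=10$ the count is $184756$ , which gives a sense of how quickly a minor-by-minor enumeration grows.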

Another approach is to base solution regimes on the pivot choices in an LU decomposition. The simplest thing to do is to leave it to the user, although one has to also inform the user through a proviso when this might be necessary [6]. That is, provide the generic answer, but also provide a description of the set $N$ . A more sophisticated approach is developed by [7, 3] using the theory of regular chains and its implementation in Maple [19] to manage the algebraic conditions. For example a given matrix entry may be used as a pivot, with validity dependent on adding the polynomial to the non-zero part, $N$ , of the Zariski-constructible set. For a comprehensive solution the case that that entry is zero must also be pursued. In the worst case, this leads to a tree of zero/nonzero choices of depth $n$ and branching factor $n$ .

4 Triangular Smith forms and degree bounds

In this paper we take a different approach, with the solution regimes arising from Hermite normal forms, of which triangular Smith forms are a special case. We give a system of solution regimes of polynomial size in the matrix dimension, $n$ , and polynomial degree, $d$ . Each regime is computed in polynomial time and the regime count is exponential only in the number of parameters. To use Hermite forms we will need to work over a principal ideal domain such as, for parameters $x,y$ , ${\mathsf{F}}(y)[x]$ . We will restrict our input matrix to be polynomial in the parameters. This first lemma shows it is not a severe constraint.

Lemma 4.1.

Let $(\mathbf{A},\mathbf{b},N,Z)$ be a well defined PLS over field ${\mathsf{F}}(Y)$ , for parameter set $Y$ , with $\mathbf{A}\in{\mathsf{F}}(Y)^{m\times n}$ and $\mathbf{b}\in{\mathsf{F}}(Y)^{m}$ with numerator and denominator degrees bounded by $d$ in each parameter of $Y$ and well defined in the sense of Definition 3.2. The problem is equivalent (same solutions) to one in which the entries of the matrix and vector are polynomial in the parameters $Y$ , the dimension is the same, and the degrees are bounded by $(n+1)d$ .

Proof.

Because the PLS is well defined, it is specified by $N$ that all denominator factors of $\mathbf{A}(a),\mathbf{b}(a)$ are nonzero for $a\in V(Z)\cap D(N)$ . Let $\mathbf{L}$ be a diagonal matrix with the $i$ -th diagonal entry being the least common multiple ( $\mathrm{lcm}$ ) of the denominators in row $i$ of $\mathbf{A}$ , $\mathbf{b}$ . These $\mathrm{lcm}$ s also evaluate to nonzero on $V(Z)\cap D(N)$ . It follows that $\mathbf{L}(a)\mathbf{A}(a)\mathbf{u}(a)=\mathbf{L}(a)\mathbf{b}(a)$ if and only if $\mathbf{A}(a)\mathbf{u}(a)=\mathbf{b}(a)$ . Thus the PLS $(\mathbf{L}\mathbf{A},\mathbf{L}\mathbf{b},N,Z)$ is equivalent and its matrix and vector have polynomial entries of degrees bounded by $(n+1)d$ . ∎
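The row-scaling argument can be checked pointwise on a small case. The sketch below assumes a hypothetical one-parameter system with rows $(1/y,\,1\mid 1/y)$ and $(y,\,1/(y-1)\mid 2)$ , so that $\mathbf{L}=\mathrm{diag}(y,\,y-1)$ ; it compares the solutions of the original and the cleared systems at a few rational values of $y$ away from $0$ and $1$ . All names are illustrative, and the $2\times 2$ solver uses Cramer's rule, which assumes a nonzero determinant at the sampled points.

```python
from fractions import Fraction

def solve2(A, b):
    """Solve a nonsingular 2x2 rational system by Cramer's rule."""
    (p, q), (r, s) = A
    det = p * s - q * r
    return [(b[0] * s - q * b[1]) / det, (p * b[1] - r * b[0]) / det]

def original(y):
    """System with rational-function entries, valid for y not in {0, 1}."""
    A = [[1 / y, Fraction(1)], [y, 1 / (y - 1)]]
    b = [1 / y, Fraction(2)]
    return A, b

def cleared(y):
    """Row 1 scaled by y and row 2 scaled by (y - 1): polynomial entries."""
    A = [[Fraction(1), y], [y * (y - 1), Fraction(1)]]
    b = [Fraction(1), 2 * (y - 1)]
    return A, b

samples = [Fraction(2), Fraction(3), Fraction(-1), Fraction(1, 2)]
agree = all(solve2(*original(y)) == solve2(*cleared(y)) for y in samples)
```

The agreement relies on the row multipliers being nonzero at every sampled point, which is exactly what the constraint set $N$ guarantees in the lemma.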

We will reduce PLS to triangular Smith normal form computations. The rest of this section concerns computation of triangular Smith normal form and bounds for the degrees of the form and its unimodular cofactor.

Definition 4.2.

Given field ${\mathsf{K}}$ and variable $x$ , a matrix $\mathbf{H}$ over ${\mathsf{K}}[x]$ is in (reduced) Hermite normal form if it is upper triangular, its diagonal entries are monic, and, for each column in which the diagonal entry is nonzero, the off-diagonal entries are of lower degree than the diagonal entry.

If each diagonal entry of $\mathbf{H}$ exactly divides all those below and to the right, then $\mathbf{H}$ is column equivalent to a diagonal matrix with the same diagonal entries (its Smith normal form). An equivalent condition is that, for each $i$ , the greatest common divisor of the $i\times i$ minors in the leading $i$ columns equals the greatest common divisor of all $i\times i$ minors. Following [28, Section 8, Definition 8.2] we call such a Hermite normal form a triangular Smith normal form. It will be the central tool in our PLS solution.
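The two definitions above can be turned into a small checker. This is a sketch under stated assumptions: matrices are square, polynomials over the rationals are stored as coefficient lists with the lowest degree first, and "below and to the right" is read as "every entry of the trailing block with row and column index at least $i$ ". The function names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Copy p (coefficients, lowest degree first) without trailing zeros."""
    q = [Fraction(c) for c in p]
    while q and q[-1] == 0:
        q.pop()
    return q

def deg(p):
    return len(trim(p)) - 1  # the zero polynomial gets degree -1

def poly_rem(a, b):
    """Remainder of a divided by a nonzero polynomial b."""
    a, b = trim(a), trim(b)
    while len(a) >= len(b):
        factor = a[-1] / b[-1]
        shift = len(a) - len(b)
        for i, coeff in enumerate(b):
            a[i + shift] -= factor * coeff
        a = trim(a)
    return a

def divides(d, p):
    if deg(d) < 0:               # zero divides only zero
        return deg(p) < 0
    return poly_rem(p, d) == []

def is_hermite(H):
    n = len(H)
    for j in range(n):
        if any(deg(H[i][j]) >= 0 for i in range(j + 1, n)):
            return False         # nonzero entry below the diagonal
        diag = trim(H[j][j])
        if diag:
            if diag[-1] != 1:
                return False     # diagonal entry is not monic
            if any(deg(H[i][j]) >= deg(diag) for i in range(j)):
                return False     # entry above the diagonal is not reduced
    return True

def is_triangular_smith(H):
    n = len(H)
    return is_hermite(H) and all(
        divides(H[i][i], H[k][l])
        for i in range(n) for k in range(i, n) for l in range(i, n)
    )

# [[1, x], [0, (x-2)^2]]: Hermite form and triangular Smith form.
H1 = [[[1], [0, 1]], [[0], [4, -4, 1]]]
# [[x-2, 1], [0, x-2]]: Hermite form, but x-2 does not divide the entry 1.
H2 = [[[-2, 1], [1]], [[0], [-2, 1]]]
```

The second matrix shows why the extra divisibility condition matters: it is a valid Hermite form, yet its diagonal $(x-2,\,x-2)$ differs from its Smith form diagonal $(1,\,(x-2)^{2})$ .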

For notational simplicity, we’ve left out the possibility of echelon structure in a Hermite normal form. We will talk of Hermite normal forms only for matrices having leading columns independent up to the rank of the matrix. In our algorithms we assume this hypothesis (or enforce it by a generic right multiplication by a constant nonsingular matrix, as in Fact 4.4). Every such matrix over ${\mathsf{K}}[x]$ is row equivalent to a unique matrix in Hermite form as defined above. For given $\mathbf{A}$ we have $\mathbf{U}\mathbf{A}=\mathbf{H}$ , with $\mathbf{U}$ unimodular, i.e. $\det(\mathbf{U})\in{\mathsf{K}}^{*}$ , and $\mathbf{H}$ in Hermite form. If $\mathbf{A}$ is nonsingular, the unimodular cofactor $\mathbf{U}$ is unique and has determinant $1/c$ , where $c$ is the leading coefficient of $\det(\mathbf{A})$ . This follows since $\det(\mathbf{U})\det(\mathbf{A})=\det(\mathbf{H})$ , which is monic.

The next definition and lemma concern assurance that Hermite form computation will yield a triangular Smith form.

Definition 4.3.

Call a matrix well-tempered if its Hermite form is a triangular Smith form (each diagonal entry exactly divides those below and to the right). In particular, a well-tempered matrix has leading columns independent up to the rank.

There is always a column transform (unimodular matrix $\mathbf{R}$ applied from the right) such that $\mathbf{A}\mathbf{R}$ is well-tempered. The following fact, proven by [15], shows that a random transform over ${\mathsf{F}}$ suffices with high probability.

Fact 4.4.

Let $\mathbf{A}$ be a $m\times n$ matrix over ${\mathsf{K}}[x]$ of degree in $x$ at most $d$ . Let $\mathbf{R}$ be a unit lower triangular matrix with below diagonal elements chosen from subset $S$ of ${\mathsf{K}}$ uniformly at random. Then $\mathbf{A}\mathbf{R}$ is well-tempered over ${\mathsf{K}}[x]$ with probability at least $1-4n^{3}d/|S|$ .

Note that $\deg_{x}(\mathbf{A}\mathbf{R})=\deg_{x}(\mathbf{A})$ and, for ${\mathsf{K}}={\mathsf{F}}(y),\mathbf{A}\in{\mathsf{F}}[y,x]^{m\times n}$ and $S\subseteq{\mathsf{F}}$ we also have $\deg_{y}(\mathbf{A}\mathbf{R})=\deg_{y}(\mathbf{A})$ .
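To get a feel for the probability bound in Fact 4.4, the sketch below evaluates $1-4n^{3}d/|S|$ exactly, computes how large $S$ must be for a target failure probability, and draws a random unit lower triangular matrix. The helper names and the choice of an integer range for $S$ are assumptions made for the illustration.

```python
import random
from fractions import Fraction
from math import ceil

def success_lower_bound(n, d, s_size):
    """Lower bound 1 - 4 n^3 d / |S| from Fact 4.4, as an exact fraction."""
    return 1 - Fraction(4 * n**3 * d, s_size)

def sample_set_size_for(n, d, failure):
    """Smallest |S| for which 4 n^3 d / |S| <= failure."""
    return ceil(Fraction(4 * n**3 * d) / Fraction(failure))

def random_unit_lower_triangular(n, S, rng):
    """Ones on the diagonal, zeros above, uniform draws from S below."""
    return [[1 if i == j else (rng.choice(S) if i > j else 0)
             for j in range(n)] for i in range(n)]

needed = sample_set_size_for(3, 2, Fraction(1, 100))
bound = success_lower_bound(3, 2, needed)

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
R = random_unit_lower_triangular(3, range(needed), rng)
```

The bound becomes vacuous when $|S|\leq 4n^{3}d$ , which is why a small ground field may need to be extended before sampling.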

We continue with analysis of degree bounds for Hermite forms of matrices, particularly degree bounds for triangular Smith forms of well-tempered matrices. The first result needed is the following fact from [11]. Through the remainder of this paper we will employ “soft O” notation, where, for functions $f,g:\mathbb{R}^{k}\to\mathbb{R}$ we write $f={O\,\tilde{}\,}(g)$ if and only if $f=O(g\cdot\log^{c}|g|)$ for some constant $c>0$ .

Fact 4.5.

Let ${\mathsf{F}}$ be a field, $x,y$ parameters, and let $\mathbf{A}$ be in ${\mathsf{F}}[y,x]^{n\times n}$ , nonsingular, with $\deg_{x}(\mathbf{A})\leq d$ , $\deg_{y}(\mathbf{A})\leq e$ . Over ${\mathsf{F}}(y)[x]$ , let $\mathbf{H}$ be the unique Hermite form row equivalent to $\mathbf{A}$ and $\mathbf{U}$ be the unique unimodular cofactor such that $\mathbf{U}\mathbf{A}=\mathbf{H}$ . The coefficients of the entries of $\mathbf{H}$ , $\mathbf{U}$ are rational functions of $y$ . Let $\Delta$ be the least common multiple of the denominators of the coefficients in $\mathbf{H}$ , $\mathbf{U}$ , as expressed in lowest terms.

(a) $\deg_{x}(\mathbf{U})\leq(n-1)d$ and $\deg_{x}(\mathbf{H})\leq nd$ .

(b) $\deg_{y}(\mathrm{num}(\mathbf{H})),\deg_{y}(\mathrm{num}(\mathbf{U}))\leq n^{2}de$ (bounds both numerator and denominator degrees).

(c) $\deg_{y}(\Delta)\leq n^{2}de$ .

(d) $\mathbf{H}$ and $\mathbf{U}$ can be computed in polynomial time: deterministically in ${O\,\tilde{}\,}(n^{9}d^{4}e)$ time and Las Vegas probabilistically (never returns incorrect result) in ${O\,\tilde{}\,}(n^{7}d^{3}e)$ expected time.

Proof.

This is [11, Summary Theorem]. The situation there is more abstract, more involved. We offer this tip to the reader: their $\partial,z,\sigma,\delta$ correspond respectively to our $x,y$ , identity, zero.

Item (c) is not stated explicitly in a theorem of [11] but is evident from the proofs of Theorems 5.2 and 5.6 there. The common denominator is the determinant of a matrix over ${\mathsf{K}}[z]$ of dimension $n^{2}d$ and with entries of degree in $z$ at most $e$ . ∎

We will generalize this fact to singular and non-square matrices in Theorem 4.6. In that case the unimodular cofactor, $\mathbf{U}$, is not unique and may have arbitrarily large degree entries. The following algorithm is designed to produce a $\mathbf{U}$ with bounded degrees.

Theorem 4.6.

Let ${\mathsf{F}}$ be a field, $x,y$ parameters, and let $\mathbf{A}$ be in ${\mathsf{F}}[y,x]^{m\times n}$ of rank $r$ , $\deg_{x}(\mathbf{A})\leq d$ , and $\deg_{y}(\mathbf{A})\leq e$ . Let $\mathbf{R}$ be a random unit lower triangular matrix chosen as in Fact 4.4, so that $\mathbf{AR}$ is well-tempered with the stated probability. Then, for the triangular Smith form $\mathbf{U}\mathbf{A}\mathbf{R}=\mathbf{H}$ computed as $\mathbf{U}$ , $\mathbf{H}$ = HermiteForm( $\mathbf{A}\mathbf{R}$ ), we have

(a) Algorithm HermiteForm is (Las Vegas) correct and runs in expected time ${O\,\tilde{}\,}(m^{7}d^{3}e)$;

(b) $\deg_{x}(\mathbf{U},\mathbf{H})\leq md$;

(c) $\deg_{y}(\mathbf{U},\mathbf{H})={O\,\tilde{}\,}(m^{2}de)$.

Proof.

Let $\mathbf{R}$ be as in Fact 4.4 with ${\mathsf{K}}={\mathsf{F}}(y)$ and $S\subseteq{\mathsf{F}}$ . If the field ${\mathsf{F}}$ is small, an extension field can be used to provide large enough $S$ .

We apply HermiteForm to $\mathbf{A}\mathbf{R}$ to obtain $\mathbf{U}$ , $\mathbf{H}$ , and use the notation of Algorithm 1 in this proof. We see by construction that $\mathbf{B}$ is nonsingular, from which it follows that $\mathbf{U}_{1}$ and $\mathbf{T}$ are uniquely determined (since $\mathbf{T}$ is the unique Hermite form of $\mathbf{B}$ ).

By construction the first $r$ columns of $\mathbf{B}$ equal the first $r$ columns of $\mathbf{U}_{0}\mathbf{A}\mathbf{R}$ , hence the first $r$ columns of $\mathbf{T}=\mathbf{U}_{1}\mathbf{B}$ equal the first $r$ columns of $\mathbf{H}:=\mathbf{U}_{1}\mathbf{U}_{0}\mathbf{A}\mathbf{R}$ . Since $\mathbf{T}$ is upper triangular, all entries of $\mathbf{T}$ below row $r$ in its first $r$ columns are zero, and therefore the same holds for $\mathbf{H}$ . Because $\mathrm{rank}(\mathbf{A}\mathbf{R})=r$ and the leading $r\times r$ block $\mathbf{H}_{1}$ is nonsingular, the top $r$ rows of $\mathbf{H}$ are linearly independent and span the row space of $\mathbf{H}$ . Any linear combination of these rows that is zero in the first $r$ columns must therefore be trivial (since the restriction to the first $r$ columns has nonsingular coefficient matrix $\mathbf{H}_{1}$ ). It follows that the last $m-r$ rows of $\mathbf{H}$ are zero, so $\mathbf{H}$ has the block form $\begin{bmatrix}\mathbf{H}_{1}&\mathbf{H}_{2}\\ 0&0\end{bmatrix}$ .

Moreover, the first $r$ columns of $\mathbf{H}$ coincide with those of the Hermite form $\mathbf{T}$ of $\mathbf{B}$ , so they satisfy the reduced degree conditions required in Hermite normal form. Thus $\mathbf{H}$ is in Hermite normal form and is row equivalent to $\mathbf{A}\mathbf{R}$ . By uniqueness of Hermite normal form for matrices whose leading columns are independent up to rank, $\mathbf{H}$ is the Hermite form of $\mathbf{A}\mathbf{R}$ . Since $\mathbf{A}\mathbf{R}$ is well-tempered (Fact 4.4), this Hermite form is a triangular Smith form, as required. The runtime is dominated by computation of $\mathbf{U}_{1}$ and $\mathbf{T}$ for $\mathbf{B}$ , so Fact 4.5 provides the bound in (a).

For the degree in $x$, applying Fact 4.5, we have $\deg_{x}(\mathbf{U}_{1})\leq(m-1)d$. Noting that $\mathbf{U}_{0}$ has degree zero, we have $\deg_{x}(\mathbf{U})=\deg_{x}(\mathbf{U}_{1})$ and $\deg_{x}(\mathbf{H})\leq\deg_{x}(\mathbf{U})+\deg_{x}(\mathbf{A})\leq(m-1)d+d=md$.

For the degree in $y$, note first that the bounds $d,e$ for degrees in $\mathbf{A}$ apply as well to $\mathbf{B}$. We have, by Fact 4.5, that $\deg_{y}(\mathrm{num}(\mathbf{U}_{1}))={O\,\tilde{}\,}(m^{2}de)$ and the same bound for $\deg_{y}(\mathrm{den}(\mathbf{U}_{1}))$. For $\mathbf{H}$, note that $\mathrm{num}(\mathbf{H})/\mathrm{den}(\mathbf{H})=\mathrm{num}(\mathbf{U})\mathbf{A}\mathbf{R}/\mathrm{den}(\mathbf{U})$, where $\mathbf{R}$ is constant in $y$, so that $\deg_{y}(\mathrm{den}(\mathbf{H}))\leq\deg_{y}(\mathrm{den}(\mathbf{U}))={O\,\tilde{}\,}(m^{2}de)$, and $\deg_{y}(\mathrm{num}(\mathbf{H}))\leq\deg_{y}(\mathrm{num}(\mathbf{U})\mathbf{A})={O\,\tilde{}\,}(m^{2}de)+e={O\,\tilde{}\,}(m^{2}de)$. ∎

5 Reduction of PLS to triangular Smith forms

In this section we define the Comprehensive Triangular Smith Normal form problem and solution and show that PLS can be reduced to it. The next section addresses the solution of CTSNF itself.

Definition 5.1.

For field ${\mathsf{F}}$ and parameters $Y=(y_{1},\ldots,y_{s})$, ${\mathsf{F}}_{Y}$ is a parameterized extension of ${\mathsf{F}}$ if ${\mathsf{F}}_{Y}$ is the top of a tower obtained by adjoining the parameters of $Y$ in some order, each either as a rational function parameter or as algebraic over the field constructed so far. Thus, at an algebraic step $j$ adjoining $y_{i}$, the field ${\mathsf{F}}_{j-1}$ constructed so far is extended to ${\mathsf{F}}_{j-1}[y_{i}]/\langle f_{i}\rangle$, where $f_{i}$ is irreducible over ${\mathsf{F}}_{j-1}$ as a polynomial in $y_{i}$. When a solution regime to a PLS or CTSNF problem is over a parameterized extension ${\mathsf{F}}_{Y}$, we record the irreducible polynomials defining the algebraic steps of the tower in the vanishing constraint set $Z^{\prime}$ of that regime.

A comprehensive triangular Smith normal form problem (CTSNF problem) is a triple $(\mathbf{A},N,Z)$ of a matrix $\mathbf{A}$ over ${\mathsf{F}}[Y,x]$ together with polynomial sets $N,Z\subseteq{\mathsf{F}}[Y]$ defining a Zariski-constructible constraint on the parameters $Y$ . Here $x$ is treated as a distinguished variable: in CTSNF regimes we do not specialize $x$ , and evaluation at $a\in V(Z)\cap D(N)$ refers to the partial evaluation $\mathbf{A}(a)\in{\mathsf{F}}[x]^{m\times n}$ .

For CTSNF problem $(\mathbf{A},N,Z)$ , a triangular Smith regime is of the form $(\mathbf{U},\mathbf{H},\mathbf{R},N^{\prime},Z^{\prime})$ , with $\mathbf{U}$ , $\mathbf{H}$ over ${\mathsf{F}}_{Y}[x]$ , where ${\mathsf{F}}_{Y}$ is a parameterized extension of ${\mathsf{F}}$ and any polynomials defining algebraic extensions in the tower are in $Z^{\prime}$ , and with $N^{\prime},Z^{\prime}\subseteq{\mathsf{F}}[Y]$ , such that all denominator factors occurring in $\mathbf{U}$ and $\mathbf{H}$ are contained in $N^{\prime}$ , and on all $a\in V(Z^{\prime})\cap D(N^{\prime})$ , $\mathbf{H}(a)$ is in triangular Smith form over ${\mathsf{F}}(a)[x]$ , $\mathbf{U}(a)$ is unimodular in $x$ , $\mathbf{R}$ is nonsingular over ${\mathsf{F}}$ , and $\mathbf{U}(a)\mathbf{A}(a)\mathbf{R}=\mathbf{H}(a)$ .

A CTSNF solution is a list $\{(\mathbf{U}_{i},\mathbf{H}_{i},\mathbf{R}_{i},N_{i},Z_{i})|i\in 1,\ldots,k\}$ , of triangular Smith regimes that cover $V(Z)\cap D(N)$ , which is to say $V(Z)\cap D(N)\subseteq\cup\{V(Z_{i})\cap D(N_{i})|i\in 1,\ldots,k\}$ .
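The covering condition can be spot-checked on sample points. The sketch below is only an illustration under stated assumptions: polynomials are modelled as Python callables, the helper names are hypothetical, and checking finitely many points gives evidence, not a proof, that $V(Z)\cap D(N)$ is covered.

```python
def in_regime(point, Z, N):
    """True when every polynomial in Z vanishes at `point` and none in N does."""
    return all(p(*point) == 0 for p in Z) and all(q(*point) != 0 for q in N)

def covers(points, target, regimes):
    """Check, on the sample points only, that V(Z) ∩ D(N) lies in the union of regimes."""
    Z, N = target
    return all(any(in_regime(pt, Zi, Ni) for Zi, Ni in regimes)
               for pt in points if in_regime(pt, Z, N))

# Toy one-parameter instance: the unconstrained target N = Z = {} is split
# into the regime y != 0 and the regime y == 0.
y_poly = lambda y: y
regimes = [([], [y_poly]), ([y_poly], [])]   # each regime stored as (Z_i, N_i)
samples = [(v,) for v in range(-3, 4)]

both_cover = covers(samples, ([], []), regimes)          # every sample is covered
generic_only = covers(samples, ([], []), regimes[:1])    # misses the point y = 0
```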

The goal in this section is to reduce the PLS problem to the CTSNF problem. The first step is to show it suffices to consider PLS with a matrix already in triangular Smith form. The second step is to show each CTSNF solution regime generates a set of PLS solution regimes.

Lemma 5.2.

Given a parameterized field ${\mathsf{F}}_{Y}$ and matrix $\mathbf{A}$ over ${\mathsf{F}}[Y,x]$ , let $\mathbf{H}$ be a triangular Smith form of $\mathbf{A}$ over ${\mathsf{F}}_{Y}[x]$ , with $\mathbf{U}$ unimodular over ${\mathsf{F}}_{Y}[x]$ , and $\mathbf{R}$ nonsingular over ${\mathsf{F}}$ such that $\mathbf{U}\mathbf{A}\mathbf{R}$ = $\mathbf{H}$ . Let $N^{\prime},Z^{\prime}$ be constraints such that all data below are defined and $\mathbf{U}(a)$ is unimodular for every $a\in V(Z^{\prime})\cap D(N^{\prime})$ . Then $(\mathbf{u},\mathbf{B},N^{\prime},Z^{\prime})$ is a solution regime for PLS problem $(\mathbf{A},\mathbf{b},N,Z)$ over ${\mathsf{F}}[Y,x]$ if and only if $(\mathbf{R}^{-1}\mathbf{u},\mathbf{R}^{-1}\mathbf{B},N^{\prime},Z^{\prime})$ is a solution regime for PLS problem $(\mathbf{H}$ , $\mathbf{U}\mathbf{b},N,Z)$ . Moreover, $(N^{\prime},Z^{\prime})$ is an inconsistency regime for one system if and only if it is an inconsistency regime for the other.

Proof.

Let $(a,\xi)\in V(Z^{\prime})\cap D(N^{\prime})$ . By hypothesis, all expressions appearing below are well defined and $\mathbf{U}(a)$ is unimodular over ${\mathsf{F}}(a)[x]$ . Hence $\mathbf{U}(a,\xi)$ is invertible. The matrix $\mathbf{R}$ is constant and nonsingular over ${\mathsf{F}}$ . Thus the following are equivalent.

1. $\mathbf{A}(a,\xi)\mathbf{u}(a,\xi)=\mathbf{b}(a,\xi).$

2. $\mathbf{U}(a,\xi)\mathbf{A}(a,\xi)\mathbf{u}(a,\xi)=\mathbf{U}(a,\xi)\mathbf{b}(a,\xi).$

3. $(\mathbf{U}(a,\xi)\mathbf{A}(a,\xi)\mathbf{R})(\mathbf{R}^{-1}\mathbf{u}(a,\xi))=\mathbf{U}(a,\xi)\mathbf{b}(a,\xi).$

The same equivalence shows that $\mathbf{A}(a,\xi)\mathbf{u}=\mathbf{b}(a,\xi)$ is consistent if and only if $\mathbf{H}(a,\xi)\mathbf{w}=\mathbf{U}(a,\xi)\mathbf{b}(a,\xi)$ is consistent, so inconsistency regimes are preserved as well. ∎
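The equivalence in Lemma 5.2 can be checked on a small specialized instance, that is, after evaluation at a point $(a,\xi)$, when all matrices have entries in a field. The matrices below are made-up illustrative data, not taken from the paper: $\mathbf{U}$ has determinant $1$, $\mathbf{R}$ is unit lower triangular, and $\mathbf{H}=\mathbf{U}\mathbf{A}\mathbf{R}$ is upper triangular.

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[0, 1], [-2, 4]]          # specialized coefficient matrix (illustrative)
U = [[1, 0], [-2, 1]]          # invertible left cofactor, det = 1
R = [[1, 0], [1, 1]]           # constant unit lower triangular right cofactor
R_inv = [[1, 0], [-1, 1]]

H = matmul(matmul(U, A), R)    # equals [[1, 1], [0, 2]]

u = [[Fr(1, 2)], [3]]          # a solution of A u = b ...
b = matmul(A, u)
w = matmul(R_inv, u)           # ... maps to the solution w = R^{-1} u of H w = U b

assert matmul(R, R_inv) == [[1, 0], [0, 1]]
assert matmul(H, w) == matmul(U, b)
```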

Then we have the following algorithm to solve a PLS with the matrix already in triangular Smith form. For simplicity we assume a square matrix, the rectangular case being a straightforward extension.

Lemma 5.3.

Algorithm TriangularSmithPLS is correct: it outputs a list of solution regimes and inconsistency regimes whose constructible sets cover $V(Z)\cap D(N)$ . Let $r_{0}=\max\{i\mid s_{i}\neq 0\}$ where $s_{i}$ are the diagonal entries of $\mathbf{H}$ (so $\mathbf{H}$ has rank $r_{0}$ over ${\mathsf{F}}_{Y}(x)$ ), and let $d=\sum_{i=1}^{r_{0}}\deg_{x}(s_{i})$ . Then the algorithm outputs at most $1+\sqrt{2d}$ solution regimes, and at most $(n+1)(1+\sqrt{2d})$ regimes total.

Proof.

For $r\in\mathcal{I}$ , the algorithm outputs a solution regime $(\mathbf{u},\mathbf{B},N_{r},Z_{r})$ . Using the numerator/denominator convention of Section 3, $s_{r}\in N_{r}$ imposes $s_{r}\neq 0$ and $f_{r}\in Z_{r}$ imposes $f_{r}=0$ wherever the relevant rational functions are defined. Thus for any $(a,\xi)\in V(Z_{r})\cap D(N_{r})$ we have $s_{r}(a,\xi)\neq 0$ . If $r<r_{0}$ then $f_{r}=\mathrm{sqfr}(s_{r+1})/\mathrm{sqfr}(s_{r})$ , so $f_{r}(a,\xi)=0$ implies $s_{r+1}(a,\xi)=0$ while $\mathrm{sqfr}(s_{r})(a,\xi)\neq 0$ , hence $s_{r}(a,\xi)\neq 0$ . If $r=r_{0}$ then $s_{r+1}$ is identically zero, so $s_{r+1}(a,\xi)=0$ holds automatically. In either case, the divisibility properties of triangular Smith form imply that all entries in rows $r+1,\ldots,n$ of $\mathbf{H}(a,\xi)$ are zero.

Since $s_{1}\mid s_{2}\mid\cdots\mid s_{r}$ in a triangular Smith form, the conditions above also imply $s_{1}(a,\xi),\ldots,s_{r}(a,\xi)\neq 0$ . Therefore the leading block $\mathbf{H}_{r}(a,\xi)$ is invertible. The regime adjoins $b_{r+1},\ldots,b_{n}$ to $Z_{r}$ , so for $(a,\xi)\in V(Z_{r})\cap D(N_{r})$ the last $n-r$ equations reduce to $0=b_{r+1}(a,\xi),\ldots,0=b_{n}(a,\xi)$ and are consistent. The vector $\mathbf{u}(a,\xi)$ satisfies the leading $r$ equations by construction, and the additional condition $\mathrm{den}(\mathbf{u})(a,\xi)\neq 0$ (enforced by $\mathrm{den}(\mathbf{u})\in N_{r}$ ) guarantees that $\mathbf{u}$ is well defined.

For the nullspace, write $\mathbf{H}=\begin{bmatrix}\mathbf{H}_{r}&\mathbf{H}_{12}\\ 0&0\end{bmatrix}$ after evaluation at $(a,\xi)$ . Then

[TABLE]

and the bottom block $\mathbf{I}_{n-r}$ shows that the columns of $\mathbf{B}(a,\xi)$ are linearly independent. The additional condition $\mathrm{den}(\mathbf{B})(a,\xi)\neq 0$ guarantees that $\mathbf{B}(a,\xi)$ is well defined. Thus $(\mathbf{u},\mathbf{B},N_{r},Z_{r})$ is a correct solution regime.

The algorithm also outputs inconsistency regimes $(\bot,N_{r,j},Z_{r,j})$ for $j=r+1,\ldots,n$ . For any $(a,\xi)\in V(Z_{r,j})\cap D(N_{r,j})$ we have the same rank conditions as above (since $s_{r}\in N_{r,j}$ and $f_{r}\in Z_{r,j}$ ), so the last $n-r$ rows of $\mathbf{H}(a,\xi)$ are zero, while $b_{j}(a,\xi)\neq 0$ (since $b_{j}\in N_{r,j}$ ). Hence $\mathbf{H}(a,\xi)\mathbf{u}=\mathbf{b}(a,\xi)$ is inconsistent, so $(\bot,N_{r,j},Z_{r,j})$ is a correct inconsistency regime.

To see that these regimes cover $V(Z)\cap D(N)$ , take $(a,\xi)\in V(Z)\cap D(N)$ and let $r=\mathrm{rank}(\mathbf{H}(a,\xi))$ . By the divisibility chain on the diagonal entries, $s_{r}(a,\xi)\neq 0$ and $s_{r+1}(a,\xi)=0$ , which implies either $r=r_{0}$ or $f_{r}(a,\xi)=0$ . Thus $r\in\mathcal{I}$ and the algorithm generates the corresponding regimes. If $\mathbf{H}(a,\xi)\mathbf{u}=\mathbf{b}(a,\xi)$ is consistent, then necessarily $b_{r+1}(a,\xi)=\cdots=b_{n}(a,\xi)=0$ , so $(a,\xi)\in V(Z_{r})\cap D(N_{r})$ and the solution regime applies. If it is inconsistent, then there exists $j>r$ with $b_{j}(a,\xi)\neq 0$ , so $(a,\xi)\in V(Z_{r,j})\cap D(N_{r,j})$ and an inconsistency regime applies.

For the regime count, let $r_{1}<\cdots<r_{t}$ be the indices with $\deg_{x}(f_{r_{j}})>0$ . Each $f_{r_{j}}$ has a square-free irreducible factor of degree at least $1$ that divides $s_{r_{j}+1},\ldots,s_{r_{0}}$ , hence contributes at least $r_{0}-r_{j}$ to $d=\sum_{i=1}^{r_{0}}\deg_{x}(s_{i})$ . Therefore

[TABLE]

so $t\leq\sqrt{2d}$ . Since the algorithm outputs at most one additional solution regime coming from $f_{r_{0}}=0$ , it outputs at most $t+1\leq 1+\sqrt{2d}$ solution regimes. Each such $r$ produces at most $n-r$ inconsistency regimes, so the total number of regimes is at most $(n+1)(t+1)\leq(n+1)(1+\sqrt{2d})$ . ∎
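The counting step can be explored numerically. The sketch below assumes that the argument amounts to the inequality $t(t+1)/2\leq d$ (jumps at indices $r_{1}<\cdots<r_{t}<r_{0}$ contribute at least $t,t-1,\ldots,1$ to $d$); this reading and the function names are assumptions made for illustration.

```python
def max_jump_count(d):
    """Largest t with t(t+1)/2 <= d: the most rank jumps a degree budget d allows."""
    t = 0
    while (t + 1) * (t + 2) // 2 <= d:
        t += 1
    return t

def regime_bounds(n, d):
    """(solution-regime bound, total-regime bound) for an n x n triangular Smith form."""
    t = max_jump_count(d)
    return t + 1, (n + 1) * (t + 1)

# t^2 < t(t+1) <= 2d, so t <= sqrt(2d) for every degree budget d.
assert all(max_jump_count(d) ** 2 <= 2 * d for d in range(200))
```

For instance, a degree budget of $d=6$ allows at most three jumps, hence at most four solution regimes.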

Theorem 5.4.

Algorithm 3 is correct.

Proof.

Let $(a,\xi)\in V(Z)\cap D(N)$ . By construction $(a,\xi)$ satisfies the $Y$ -only constraints $V(Z_{Y})\cap D(N_{Y})$ , so at least one triangular Smith regime of $T$ in step $2$ is valid at $a$ . For that regime, step $3$ applies Algorithm 2 and produces either a solution regime or an inconsistency regime whose constructible set contains $(a,\xi)$ (Lemma 5.3). Lemma 5.2 transfers solution data back through $\mathbf{R}$ without changing the underlying constraint set, so the corresponding regime for the original PLS problem is correct and covers $(a,\xi)$ . ∎

6 Solving Comprehensive Triangular Smith Normal Form

In view of the reductions of the preceding section, to solve a parametric linear system it remains only to solve a comprehensive triangular Smith form problem. This is difficult in general, but we give a method that produces a comprehensive solution with polynomially many regimes in the bivariate and trivariate cases.

Theorem 6.1.

Let $\rho=\max(m,n)$ , and let $\mathbf{A}\in{\mathsf{F}}[y,x]^{m\times n}$ have degree $d$ in $x$ and degree $e$ in $y$ . Let $N,Z$ be polynomial sets defining a Zariski-constructible constraint on $y$ . Then the CTSNF problem $(\mathbf{A},N,Z)$ has a solution of at most ${O\,\tilde{}\,}(\rho^{2}de)$ triangular Smith regimes.

Proof.

We first solve the unconstrained case $N=Z=\emptyset$ . Compute a triangular Smith form $\mathbf{U}_{0},\mathbf{H}_{0},\mathbf{R}_{0}$ over ${\mathsf{F}}(y)[x]$ such that $\mathbf{U}_{0}\mathbf{A}\mathbf{R}_{0}=\mathbf{H}_{0}$ . This regime is valid for evaluations that do not zero the denominators (polynomials in $y$ ) of $\mathbf{H}_{0}$ and $\mathbf{U}_{0}$ . Let $\Delta_{0}=\mathrm{den}(\mathbf{U}_{0},\mathbf{H}_{0})$ and set $N_{0}=\mathrm{Fac}(\Delta_{0})$ , with $Z_{0}=\emptyset$ , to complete the generic regime.

Then, for each $f\in N_{0}$, adjoin the regime $(\mathbf{U}_{f},\mathbf{H}_{f},\mathbf{R}_{f},N_{f}=N_{0}\setminus\{f\},Z_{f}=\{f\})$, obtained by computing the triangular Smith form over $({\mathsf{F}}[y]/\langle f\rangle)[x]$. Together with the generic regime $(\mathbf{U}_{0},\mathbf{H}_{0},\mathbf{R}_{0},N_{0},Z_{0})$, these regimes cover all specializations of $y$: at any point either no factor of $\Delta_{0}$ vanishes, or at least one irreducible factor $f$ of $\Delta_{0}$ vanishes. From the bounds of Theorem 4.6, $\deg_{y}(\Delta_{0})={O\,\tilde{}\,}(\rho^{2}de)$, so the number of irreducible factors, and hence the number of regimes, is ${O\,\tilde{}\,}(\rho^{2}de)$.

For general input constraints $N,Z$ , intersect each produced regime with $V(Z)\cap D(N)$ , i.e., replace each output $(\mathbf{U}_{*},\mathbf{H}_{*},\mathbf{R}_{*},N_{*},Z_{*})$ by $(\mathbf{U}_{*},\mathbf{H}_{*},\mathbf{R}_{*},N\cup N_{*},Z\cup Z_{*})$ . This changes neither the validity of the regimes nor their number. ∎
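The bookkeeping in this proof can be sketched as follows. Only the constraint pairs $(N_{i},Z_{i})$ are assembled; the normal form computations are omitted, the factors are represented by placeholder strings, and the function name is hypothetical.

```python
def bivariate_regime_constraints(factors, N=(), Z=()):
    """Return the (N_i, Z_i) pairs described in the proof of Theorem 6.1.

    `factors` lists the distinct irreducible factors of the generic denominator;
    `N` and `Z` are the input constraints, adjoined to every regime.
    """
    N0 = list(factors)
    regimes = [(list(N) + N0, list(Z))]                   # generic regime: no factor vanishes
    for f in N0:
        rest = [g for g in N0 if g != f]
        regimes.append((list(N) + rest, list(Z) + [f]))   # f vanishes, the others do not
    return regimes

# Illustrative factor list; the regime count is one more than the number of factors.
regs = bivariate_regime_constraints(["y", "y - 1", "y^2 + 1"])
```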

We can proceed in a similar way when there are three parameters, but must address an additional complication that arises.

Before proving the trivariate bound, we use the following degree-bound observation. The denominator degree bounds used in Theorem 4.6 apply, with the same proof, when the single coefficient parameter $y$ is replaced by a pair of coefficient parameters $y,z$ . In particular, for $\mathbf{A}\in{\mathsf{F}}[y,z,x]^{m\times n}$ with degree at most $d$ in $x$ and at most $e$ in each of $y,z$ , the common denominator of the computed generic triangular Smith regime over ${\mathsf{F}}(y,z)[x]$ has degree ${O\,\tilde{}\,}(\rho^{2}de)$ in each of $y$ and $z$ . The same bound applies to the one-parameter denominators arising after restricting to an irreducible curve factor and computing over the corresponding parameterized coefficient field. This follows from the determinant expressions controlling the common denominators in Fact 4.5 and Theorem 4.6, with the remaining coefficient parameters carried through in the coefficient field.

Theorem 6.2.

Let $\rho=\max(m,n)$ , and let $\mathbf{A}\in{\mathsf{F}}[z,y,x]^{m\times n}$ have degree $d$ in $x$ and degree $e$ in $y,z$ . Let $N,Z$ be polynomial sets defining a Zariski-constructible constraint on $y$ and $z$ . Then the CTSNF problem $(\mathbf{A},N,Z)$ has a solution of at most ${O\,\tilde{}\,}(\rho^{4}d^{2}e^{2})$ triangular Smith regimes.

Proof.

As in the bivariate case above, we solve the unconstrained case and just adjoin $N,Z$ , if nontrivial, to the Zariski-constructible condition of each solution regime.

First compute a triangular Smith form $\mathbf{U}_{0},\mathbf{H}_{0},\mathbf{R}_{0}$ over ${\mathsf{F}}(y,z)[x]$ such that $\mathbf{U}_{0}\mathbf{A}\mathbf{R}_{0}=\mathbf{H}_{0}$ . This will be valid for evaluations that don’t zero the denominators (polynomials in $y,z$ ) of $\mathbf{H}_{0}$ and $\mathbf{U}_{0}$ . Let $\Delta_{0}=\mathrm{den}(\mathbf{U}_{0},\mathbf{H}_{0})$ and set $N_{0}=\mathrm{Fac}(\Delta_{0})$ , and set $Z_{0}=\emptyset$ to complete the first regime.

Next, for each $f\in N_{0}$ , if $y$ occurs in $f$ compute a triangular Smith form $(\mathbf{U}_{f},\mathbf{H}_{f},\mathbf{R}_{f})$ over $({\mathsf{F}}(z)[y]/\langle f\rangle)[x]$ . If $y$ does not occur in $f$ , interchange the roles of $y,z$ and compute over $({\mathsf{F}}(y)[z]/\langle f\rangle)[x]$ . In either case the resulting coefficients lie in a parameterized extension in the sense of Definition 5.1.

Let $\delta_{f}=\mathrm{den}(\mathbf{U}_{f},\mathbf{H}_{f})$ . Here $\delta_{f}$ arises from divisions by elements of the coefficient field ${\mathsf{F}}(z)$ (respectively ${\mathsf{F}}(y)$ ), hence after clearing denominators it may be taken in ${\mathsf{F}}[z]$ (respectively ${\mathsf{F}}[y]$ ). We therefore adjoin the regime

[TABLE]

which is valid on $V(f)\cap D(N_{f})$ .

The generic regime and the regimes indexed by single factors $f\in N_{0}$ cover all specializations except for those in which either (i) at least two distinct factors from $N_{0}$ vanish simultaneously, or (ii) for some $f\in N_{0}$ the denominator $\delta_{f}$ vanishes on $V(f)$. Both situations give rise only to zero-dimensional exceptional sets. Indeed, for distinct irreducible $f,g\in N_{0}$ the intersection $V(f)\cap V(g)\subseteq\overline{{\mathsf{F}}}^{2}$ is finite, and by Bézout’s theorem [8] it contains at most $\deg(f)\deg(g)$ points. Similarly, since $f$ is irreducible and depends on $y$ (respectively $z$) while $\delta_{f}\in{\mathsf{F}}[z]$ (respectively ${\mathsf{F}}[y]$), we have $\gcd(f,\delta_{f})=1$ and again Bézout’s theorem bounds $|V(f)\cap V(\delta_{f})|$ by $\deg(f)\deg(\delta_{f})$.

These finitely many points can be enumerated (for example by a resultant computation or a triangular decomposition of the appropriate ideals). For each such point $(y,z)\in\overline{{\mathsf{F}}}^{2}$ we produce a separate regime by evaluating $\mathbf{A}$ at that point and computing a triangular Smith form over the corresponding (possibly algebraic) extension field, as allowed in Definition 5.1.

The degree bounds $\deg(\Delta_{0})={O\,\tilde{}\,}(\rho^{2}de)$ and $\deg(\delta_{f})={O\,\tilde{}\,}(\rho^{2}de)$ follow from the degree-bound observation above. Since $\sum_{f\in N_{0}}\deg(f)\leq\deg(\Delta_{0})$, the total number of point regimes arising from the intersections above is bounded by $\mathrm{O}(\deg(\Delta_{0})^{2})={O\,\tilde{}\,}((\rho^{2}de)^{2})$, yielding the stated ${O\,\tilde{}\,}(\rho^{4}d^{2}e^{2})$ overall regime bound. ∎
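The Bézout count in this proof can be tabulated directly. The sketch below takes hypothetical total degrees for the factors $f\in N_{0}$ and for the corresponding $\delta_{f}$, and adds up the two kinds of exceptional points; it is an upper bound on the number of point regimes, not an enumeration of them.

```python
def point_regime_bound(factor_degrees, delta_degrees):
    """Bezout bound on exceptional points, from per-factor degree data.

    factor_degrees[i] is deg(f_i); delta_degrees[i] is deg(delta_{f_i}).
    """
    k = len(factor_degrees)
    # (i) two distinct factors vanish together: at most deg(f) * deg(g) points per pair
    pairwise = sum(factor_degrees[i] * factor_degrees[j]
                   for i in range(k) for j in range(i + 1, k))
    # (ii) delta_f vanishes on V(f): at most deg(f) * deg(delta_f) points per factor
    on_curve = sum(df * dd for df, dd in zip(factor_degrees, delta_degrees))
    return pairwise + on_curve

# Illustrative degrees: factors of degree 1, 2, 3 and every delta_f of degree 4.
bound = point_regime_bound([1, 2, 3], [4, 4, 4])
```

With these sample degrees the pairwise part is $11\leq 6^{2}$ and the on-curve part is $24=6\cdot 4$, consistent with a bound quadratic in the denominator degrees.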

Corollary 6.3.

For a PLS with $m\times n$ matrix $\mathbf{A}$ , $\mathbf{b}$ an $m$ -vector, let $\rho=\max(m,n)$ , and suppose $\deg_{x}(\mathbf{A},\mathbf{b})\leq d,\deg_{y}(\mathbf{A},\mathbf{b})\leq e,\deg_{z}(\mathbf{A},\mathbf{b})\leq e$ . Counting both solution regimes and inconsistency regimes, we have

1. $\mathrm{O}(\rho^{1.5}d^{0.5})$ regimes in the PLS solution for the univariate case (domain of $\mathbf{A}$, $\mathbf{b}$ is ${\mathsf{F}}[x]$).

2. ${O\,\tilde{}\,}(\rho^{3.5}d^{1.5}e)$ regimes in the PLS solution for the bivariate case (domain of $\mathbf{A}$, $\mathbf{b}$ is ${\mathsf{F}}[x,y]$).

3. ${O\,\tilde{}\,}(\rho^{5.5}d^{2.5}e^{2})$ regimes in the PLS solution for the trivariate case (domain of $\mathbf{A}$, $\mathbf{b}$ is ${\mathsf{F}}[x,y,z]$).

Proof.

Consider one CTSNF regime $(\mathbf{U},\mathbf{H},\mathbf{R},N^{\prime},Z^{\prime})$ for $\mathbf{A}$ , so that $\mathbf{H}$ is in triangular Smith form with diagonal entries $s_{i}$ . Let $r_{0}=\max\{i\mid s_{i}\neq 0\}$ and $d_{H}=\sum_{i=1}^{r_{0}}\deg_{x}(s_{i})$ , as in Lemma 5.3. The product of the nonzero diagonal entries of a triangular Smith form is the rank- $r_{0}$ determinantal divisor. Hence it divides every nonzero $r_{0}\times r_{0}$ minor of the input matrix (after the constant right transformation $\mathbf{R}$ ). Since some such minor has degree at most $r_{0}d$ , we have $d_{H}\leq r_{0}d\leq\rho d$ .

Lemma 5.3 therefore implies that applying Algorithm TriangularSmithPLS to this regime produces at most $(n+1)(1+\sqrt{2d_{H}})=\mathrm{O}(\rho^{1.5}d^{0.5})$ PLS regimes, including inconsistency regimes. In the univariate case there is a single CTSNF regime. In the bivariate case we multiply this per-regime bound by the ${O\,\tilde{}\,}(\rho^{2}de)$ CTSNF-regime bound of Theorem 6.1, and in the trivariate case by the ${O\,\tilde{}\,}(\rho^{4}d^{2}e^{2})$ bound of Theorem 6.2. This gives the three stated estimates. ∎
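The three estimates are products of monomial bounds, so their exponents in $(\rho,d,e)$ simply add. A small sketch of that arithmetic, with hypothetical names and logarithmic factors ignored:

```python
from fractions import Fraction as Fr

# Exponents of (rho, d, e) in the per-regime bound rho^{1.5} d^{0.5} of Lemma 5.3.
PER_REGIME = (Fr(3, 2), Fr(1, 2), 0)

# Exponents of (rho, d, e) in the CTSNF regime counts: one regime in the
# univariate case, Theorem 6.1 in the bivariate case, Theorem 6.2 in the trivariate case.
CTSNF = {"univariate": (0, 0, 0), "bivariate": (2, 1, 1), "trivariate": (4, 2, 2)}

def pls_exponents(case):
    """Exponents of (rho, d, e) in the PLS regime bound for the given case."""
    return tuple(a + b for a, b in zip(PER_REGIME, CTSNF[case]))
```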

7 Normal forms and Eigenproblems

Comprehensive Hermite Normal form and comprehensive Smith Normal form are immediate corollaries of our comprehensive triangular Smith form. For Hermite form, use the same comprehensive construction but take the right hand cofactor to be the identity, $\mathbf{R}=\mathbf{I}$ , and omit the well-tempered/triangular-Smith divisibility requirement in the Hermite-form computation. For Smith form one can convert each regime of CTSNF to a Smith regime. Where $\mathbf{U}\mathbf{A}\mathbf{R}=\mathbf{H}$ with $\mathbf{H}$ a triangular Smith form, perform column operations to obtain $\mathbf{U}\mathbf{A}\mathbf{V}=\mathbf{S}$ with $\mathbf{S}$ the diagonal of $\mathbf{H}$ . In $\mathbf{H}$ the diagonal entries divide the off diagonal entries in the same row. Subtract multiples of the $i$ -th column from the subsequent columns to eliminate the off diagonal entries. Because the diagonal entries are monic, no new denominator factors arise and $\det(\mathbf{V})=\det(\mathbf{R})\in{\mathsf{F}}$ . Thus when $(\mathbf{U},\mathbf{H},\mathbf{R},N,Z)$ is a valid regime in a CTSNF solution for $\mathbf{A}$ , then $(\mathbf{U},\mathbf{S},\mathbf{V},N,Z)$ is a valid regime for Smith normal form.

It is well known that if $\mathbf{A}\in{\mathsf{K}}^{n\times n}$ for field ${\mathsf{K}}$ (that may involve parameters) and $\lambda$ is an additional variable, then the Smith invariants $s_{1},\ldots,s_{n}$ of $\lambda\mathbf{I}-\mathbf{A}$ are the Frobenius invariants of $\mathbf{A}$ and $\mathbf{A}$ is similar to its Frobenius normal form, $\oplus_{i=1}^{n}\mathbf{C}_{s_{i}}$, where $\mathbf{C}_{s}$ denotes the companion matrix of polynomial $s$. Thus we have comprehensive Frobenius normal form as a corollary of CTSNF, although without the similarity transform. It would be interesting to develop a comprehensive Frobenius form in which each regime also includes a similarity transform.

Parametric eigenvalue problems for $\mathbf{A}$ correspond to PLS for $\lambda\mathbf{I}-\mathbf{A}$ with zero right hand side. Often eigenvalue multiplicity is the concern. The geometric multiplicity is available from the Smith invariants, as for example on the diagonal of a triangular Smith form. Common roots of $2$ or more of the invariants expose geometric multiplicity and square-free factorization of the individual invariants exposes algebraic multiplicity. Note that square-free factorization may impose further restrictions on the parameters. Comprehensive treatment of square-free factorization is considered in [19].

7.1 Eigenvalue multiplicity example

The following matrix, due originally to a question on sci.math.num-analysis in 1990 by Kenton K. Yee, is discussed in [4]. We change the notation used there to avoid a clash with other notation used here. The matrix is

[TABLE]

One of the original questions was to compute its eigenvectors. Since it contains a symbolic parameter $z$ , this is a parametric eigenvalue problem which we can turn into a parametric linear system, namely to present the nullspace regimes for $\lambda\mathbf{I}-\mathbf{Y}$ .

Over ${\mathsf{F}}(z)[\lambda]$ , after preconditioning, we get as the triangular Smith form diagonal $(1,1,1,1,1,\lambda^{2}-1,\lambda^{2}-1,(\lambda^{2}-1)f(\lambda))$ , where $f(\lambda)=\lambda^{2}-(z+6+z^{-1})\lambda+7$ .

Remark 7.1.

Without preconditioning, the Hermite form diagonal is instead $(1,1,1,1,\lambda-1,\lambda^{2}-1,\lambda^{2}-1,(\lambda+1)f(\lambda))$.

The denominator of $\mathbf{U}$ , $\mathbf{H}$ is a power of $z$ , so the only constraint is $z\neq 0$ which is already a constraint for the input matrix. We get regimes of rank $5$ for $\lambda=\pm 1$ , rank $7$ for $\lambda$ being a root of $f$ , and rank $8$ for all other $\lambda$ . In terms of the eigenvalue problem, we get eigenspaces of dimension $3$ for each of $1$ , $-1$ and of dimension $1$ for the two roots of $f(\lambda)$ .
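The ranks quoted above can be read off the triangular Smith form diagonal by counting its nonzero entries after evaluation. The sketch below does this with exact rational arithmetic; the function names are hypothetical, and the sample value $z=1$ is used for a root of $f$ only because $f$ then has the rational root $7$.

```python
from fractions import Fraction as Fr

def f(lam, z):
    """f(lambda) = lambda^2 - (z + 6 + 1/z) lambda + 7, for z != 0."""
    return lam**2 - (z + 6 + 1 / Fr(z)) * lam + 7

def smith_diagonal(lam, z):
    """Diagonal (1,1,1,1,1, q, q, q*f) with q = lambda^2 - 1, evaluated at (lam, z)."""
    q = lam**2 - 1
    return [1, 1, 1, 1, 1, q, q, q * f(lam, z)]

def rank_at(lam, z):
    """Rank of the evaluated form: the number of nonzero diagonal entries."""
    return sum(1 for s in smith_diagonal(lam, z) if s != 0)

def eigenspace_dimension(lam, z):
    """Nullity of lambda*I - Y at the given parameter values."""
    return 8 - rank_at(lam, z)
```

For example, `rank_at(1, 2)` and `rank_at(-1, 2)` both give rank 5, `rank_at(7, 1)` gives rank 7, and a value such as `rank_at(3, 2)` gives full rank 8.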

To explore algebraic multiplicity, we can examine when $f$ has $1$ or $-1$ as a root. When $z$ is a root of ${z}^{2}+14\,z+1$ , $f(\lambda)$ factors as $(\lambda+1)(\lambda+7)$ and when $z=1$ we have $f(\lambda)=(\lambda-1)(\lambda-7)$ . These factorizations may be discovered by taking resultants of $f$ with $\lambda-1$ or $\lambda+1$ .
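These factorizations can be confirmed by direct evaluation. In the sketch below, which uses hypothetical helper names, $z\,f(\pm 1)$ is compared with the polynomials in $z$ named above, and $f$ is rewritten in terms of $c=z+z^{-1}$ so that the two special cases correspond to $c=-14$ and $c=2$.

```python
from fractions import Fraction as Fr

def z_times_f(lam, z):
    """z * f(lambda): clears the 1/z in the coefficient of lambda."""
    return z * lam**2 - (z**2 + 6 * z + 1) * lam + 7 * z

def f_from_c(lam, c):
    """f(lambda) written with c = z + 1/z, so f = lambda^2 - (c + 6) lambda + 7."""
    return lam**2 - (c + 6) * lam + 7

sample_zs = [Fr(k, 3) for k in range(-9, 10)]
sample_lams = [Fr(k, 2) for k in range(-10, 11)]

# f(-1) = 0 exactly when z^2 + 14 z + 1 = 0, and f(1) = 0 exactly when z = 1.
assert all(z_times_f(-1, z) == z**2 + 14 * z + 1 for z in sample_zs)
assert all(z_times_f(1, z) == -(z - 1)**2 for z in sample_zs)

# z^2 + 14 z + 1 = 0 means c = -14; z = 1 means c = 2.
assert all(f_from_c(l, -14) == (l + 1) * (l + 7) for l in sample_lams)
assert all(f_from_c(l, 2) == (l - 1) * (l - 7) for l in sample_lams)
```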

7.2 Matrix Logarithm

Theorem 1.28 of [13] states conditions under which the matrix equation $\exp(\mathbf{X})=\mathbf{A}$ has so-called primary matrix logarithm solutions, and under which conditions there are more. If the number of distinct eigenvalues $s$ of $\mathbf{A}$ is strictly less than the number $p$ of distinct Jordan blocks of $\mathbf{A}$ (that is, the matrix $\mathbf{A}$ is derogatory), then the equation also has so-called nonprimary solutions, where the branches of logarithms of an eigenvalue $\lambda$ may be chosen differently in each instance it occurs.

As a simple example of what this means, consider

[TABLE]

When we compute its matrix logarithm (for instance using the MatrixFunction command in Maple), we find

[TABLE]

This is what we expect, and taking the matrix exponential (a single-valued matrix function) gets us back to $\mathbf{A}$ , as expected. However, if instead we consider the derogatory matrix

[TABLE]

then its matrix logarithm as computed by MatrixFunction is also derogatory, namely

[TABLE]

Yet there are other solutions as well: if we add $2\pi i$ to the first entry and $-2\pi i$ to the second logarithm, we unsurprisingly find another matrix $\mathbf{X}_{C}$ which also satisfies $\exp(\mathbf{X})=\mathbf{B}$ . But adding $2\pi i$ to the first entry of $\mathbf{X}_{A}$ while adding $-2\pi i$ to its second logarithm, we get another matrix

[TABLE]

which has the (somewhat surprising) property that $\exp(\mathbf{X}_{D})=\mathbf{B}$ , not $\mathbf{A}$ .

This example demonstrates in a minimal way that the detailed Jordan structure of $\mathbf{A}$ strongly affects the nature of the solutions to the matrix equation $\exp(\mathbf{X})=\mathbf{A}$. This motivates code that can automatically detect the parameter values that make a matrix derogatory. To explicitly connect this example to CTSNF, consider

[TABLE]

so that $\mathbf{A}$ above is $\mathbf{M}_{b=1}$ and $\mathbf{B}=\mathbf{M}_{b=0}$ . The CTSNF applied to $\lambda\mathbf{I}-\mathbf{M}$ produces two regimes, with forms

[TABLE]

exposing when the logarithms will be linked or distinct. Note that in this case the Frobenius structure equals the Jordan structure.

7.3 Model of infectious disease vaccine effect

[23] have made a model of vaccine effect when there are two subpopulations with differing disease susceptibility and vaccination rates. Within this study stability of the model is a function of the eigenvalues of a Jacobian $\mathbf{J}$ . Thus we are interested in cases where the following matrix is singular.

[TABLE]

Here $w,x$ are vaccination rates for the two populations, $y,z$ are death rates, $a,d$ are within population transmission rates, and $b,c$ are the between population transmission rates. We have simplified somewhat: for instance $a,b,c,d$ are transmission rates multiplied by other parameters concerning population counts. Stability depends on the positivity of the largest real part of an eigenvalue. For the sake of reducing expression sizes in this example we will arbitrarily set $y=z=1/10$. For the same reason we will skip right multiplication by a matrix $\mathbf{R}$ to achieve triangular Smith form. The Hermite form $\mathbf{H}$ of $\lambda\mathbf{I}-\mathbf{J}$ will suffice, revealing the eigenvalues that are wanted.

[TABLE]

The discriminant of the last entry gives the desired information for the application subject to the denominator validity: $c\neq 0$ . When $c=0$ the matrix is already in Hermite form, so again the desired information is provided.

This example illustrates that often more than three parameters can be easily handled. In experiments with this model not reported here, we did encounter cases demanding solution beyond the methods of this paper. On a more positive note, we feel that comprehensive normal form tools could help analyze models like this when larger in scope, for instance modeling $3$ or more subpopulations.

7.4 The Kac-Murdock-Szegö example

[7] report times for computation of the comprehensive Jordan form for matrices of the following form, taken from [29], of dimensions $2$ to about $20$ :

[TABLE]

This is, apart from the $(1,1)$ entry and the $(n,n)$ entry, a Toeplitz matrix containing one parameter, $\rho$ . The reported times to compute the Jordan form were plotted in [7] on a log scale, and looked as though they were exponentially growing with the dimension, and were reported in that paper as growing exponentially.

The results of this paper show instead that polynomial time is possible for this family, because there are only two parameters ( $\rho$ and the eigenvalue parameter, say $\lambda$ ). The Hermite forms for these matrices are all (as far as we have computed) trivial, with diagonal all $1$ except the final entry, which contains the determinant. Thus all the action for the Jordan form must happen with the discriminant of the determinant. Experimentally, the discriminant with respect to $\lambda$ has degree $n^{2}+n-4$ for KMS matrices of dimension $n\geq 2$ (this formula was deduced experimentally by giving a sequence of these degrees to the On-Line Encyclopedia of Integer Sequences [27]), and each discriminant has a factor $\rho^{n(n-1)}$ , leaving a nontrivial factor of degree $2n-4$ , which grows only linearly with dimension. The case $\rho=0$ does indeed give a derogatory KMS matrix (the identity matrix). The other factor has at most a linearly growing number of roots, for each of which we expect the Jordan form of the corresponding KMS matrix to have one block of size two and the rest of size one. We therefore see only polynomial cost necessary to compute comprehensive Jordan forms for these matrices, in accord with our theorem.
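The degree counts quoted above can be checked for small dimensions with a short SymPy sketch. The displayed KMS matrix is not reproduced in this text, so the code assumes the tridiagonal form with diagonal $(1, 1+\rho^{2}, \ldots, 1+\rho^{2}, 1)$ and off-diagonal entries $-\rho$, which is Toeplitz apart from the $(1,1)$ and $(n,n)$ entries and reduces to the identity at $\rho=0$. The helper names `kms_matrix` and `kms_discriminant_degrees` are illustrative only and are not taken from the paper or from [7].

```python
import sympy as sp

rho, lam = sp.symbols("rho lambda")


def kms_matrix(n):
    """Assumed KMS-type matrix: tridiagonal, diagonal (1, 1+rho^2, ..., 1+rho^2, 1),
    off-diagonals -rho. This form is an assumption, not copied from the paper."""
    M = sp.zeros(n, n)
    for i in range(n):
        M[i, i] = 1 if i in (0, n - 1) else 1 + rho**2
        if i + 1 < n:
            M[i, i + 1] = -rho
            M[i + 1, i] = -rho
    return M


def kms_discriminant_degrees(n):
    """Return (disc, total_degree, rho_multiplicity) for the discriminant,
    taken with respect to lambda, of det(lambda*I - M)."""
    charpoly = kms_matrix(n).charpoly(lam).as_expr()
    disc = sp.expand(sp.discriminant(charpoly, lam))
    p = sp.Poly(disc, rho)
    total_degree = p.degree()
    # Multiplicity of the factor rho = lowest exponent of rho that occurs.
    rho_multiplicity = min(m[0] for m in p.monoms())
    return disc, total_degree, rho_multiplicity


if __name__ == "__main__":
    for n in range(2, 6):
        disc, total, mult = kms_discriminant_degrees(n)
        # Compare against the experimentally deduced n^2 + n - 4 and n(n-1).
        print(n, total, mult, total - mult, sp.factor(disc))
```

Under this assumed matrix, the residual degree `total - mult` is the degree of the nontrivial factor, whose roots are the candidate values of $\rho$ at which a repeated eigenvalue can occur. The sketch only inspects the discriminant; it does not compute Hermite or Jordan forms.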

8 Conclusions

We have shown that using the CTSNF to solve parametric linear systems is of cost polynomial in the dimension of the linear system and polynomial in parameter degree, for problems containing up to three parameters. This shows that polynomially many regimes suffice for problems of this type. To the best of our knowledge, this is the first method to prove a polynomial regime bound for parametric linear systems with at most three parameters, certainly in the constructible-regime model considered here.

It remains an open question whether, for linear systems with a fixed number of parameters greater than three, a number of regimes suffices that is polynomial in the input matrix dimension and in the degree of the parameters, and exponential only in the number of parameters.

Through experiments with random matrices we have indications that the worst-case bounds we give are sharp, though we have not proven this. As the examples indicate, many problems will have fewer regimes, sometimes substantially fewer. We have not investigated the effects of further restrictions on the type of problem, such as to sparse matrices.

Acknowledgements

This work was supported by the Natural Sciences and Engineering Research Council of Canada and by the Ontario Research Centre for Computer Algebra. The third author, L. Rafiee Sevyeri, would like to thank the Symbolic Computation Group (SCG) at the David R. Cheriton School of Computer Science of the University of Waterloo for their support while she was a visiting researcher there.

Bibliography


[1] Boyer B, Kaltofen EL. Numerical linear system solving with parametric entries by error correction. In: Proceedings of the 2014 Symposium on Symbolic-Numeric Computation; 2014. p. 33–38. doi:10.1145/2631948.2631956
[2] Buchberger B. Applications of Gröbner bases in non-linear computational geometry. In: Janßen R, editor. Trends in Computer Algebra. Berlin, Heidelberg: Springer Berlin Heidelberg; 1988. p. 52–80. doi:10.1007/3-540-18928-9_5
[3] Camargos Couto AC, Moreno Maza M, Linder D, Jeffrey DJ, Corless RM. Comprehensive LU Factors of Polynomial Matrices. In: Slamanig D, Tsigaridas E, Zafeirakopoulos Z, editors. Mathematical Aspects of Computer and Information Sciences. Cham: Springer International Publishing; 2020. p. 80–88. doi:10.1007/978-3-030-43120-4_8
[4] Corless RM. Essential Maple 7: an introduction for scientific programmers. Springer Science & Business Media; 2002. doi:10.1007/b97270
[5] Corless RM, Giesbrecht M, Rafiee Sevyeri L, Saunders BD. On Parametric Linear System Solving. In: Computer Algebra in Scientific Computing. Springer; 2020. p. 188–205. doi:10.1007/978-3-030-60026-6_11
[6] Corless RM, Jeffrey DJ. The Turing Factorization of a Rectangular Matrix. SIGSAM Bull. 1997 Sep;31(3):20–30. doi:10.1145/271130.271135
[7] Corless RM, Moreno Maza M, Thornton SE. Jordan Canonical Form with Parameters from Frobenius Form with Parameters. In: Blömer J, Kotsireas IS, Kutsia T, Simos DE, editors. Mathematical Aspects of Computer and Information Sciences. Springer International Publishing; 2017. p. 179–194. doi:10.1007/978-3-319-72453-9_13
[8] Cox D, Little J, O’Shea D. Ideals, varieties, and algorithms: an introduction to computational algebraic geometry and commutative algebra. Springer Science & Business Media; 2013. doi:10.1007/978-0-387-35651-8

