The norm of the saturation of a binomial ideal, and applications to Markov bases

David Holmes

arXiv:1907.10268 · math.AC · December 30, 2020


TL;DR

This paper introduces a new measure called the norm to quantify the complexity of saturations of binomial ideals, providing bounds and exploring applications in statistics.

Contribution

It defines the norm of the saturation of binomial ideals and establishes bounds based on computable invariants, with applications to statistical models.

Findings

1. Bound on the norm in terms of ideal invariants
2. Application to statistical models and Markov bases
3. New measure for ideal saturation complexity

Abstract

Given a pure binomial ideal I in variables x_i, we define a new measure of the complexity of the saturation of I with respect to the product of the variables x_i, which we call the norm. We give a bound on the norm in terms of easily-computed invariants of the ideal. We discuss statistical applications both practical and theoretical.




1 Introduction
1.1 Background
1.1.1 A random walk in the fibre
1.1.2 Saturated ideals and connected fibres
1.1.3 Connected fibres without saturation
1.2 Results
1.2.1 A bound on the complexity of the saturation
1.2.2 Comparison to other results in the literature
1.2.3 Improving the algorithm of Aoki, Hara and Takemura
1.2.4 An alternative algorithm for connecting the fibres
1.3 Practical consequences
2 Examples
2.1 A very simple example
2.2 Families where the fibres are arbitrarily badly connected
2.3 The no-three-factor-interaction model
3 Proof of the main results
3.1 Proof of theorem 1.3
3.2 Proof of proposition 1.4

1. Introduction

1.1. Background

Let $A$ be a $k\times r$ matrix with integer entries, and let $u\in{\mathbb{N}}^{r}$ be a vector with non-negative entries. The fibre containing $u$ is defined as

$$\mathcal{F}(u) = \{v\in\mathbb{N}^{r} : Au = Av\}. \tag{1.1.1}$$

Understanding the structure of this fibre is important in a number of statistical tests. For example, the vectors in $\mathbb{N}^{r}$ might represent tables of data, and the matrix $A$ might output the row and column sums of these tables, so the fibre consists of all tables with non-negative entries and with the same row and column sums as the starting table $u$. See [DS98] for more details and examples. In particular, one often wants to generate samples from some probability distribution (often uniform or hypergeometric) on the fibre. If the fibre is small it is practical to simply enumerate all the elements of the fibre. However, in practical applications the fibre is often far too large to enumerate, and the standard approach is to perform a random walk in the fibre, generating samples via the Metropolis-Hastings Markov chain Monte Carlo algorithm. In order to perform a random walk, we must turn the fibre into a graph (whose vertices are the elements of the fibre). The requirements for the Metropolis-Hastings algorithm are rather mild; the key condition is that the graph must be connected (since the random walk will always remain within its starting connected component).
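For a small fibre, the enumeration mentioned above can be done by brute force. Below is a minimal Python sketch. The function name `fibre` is ours, and the sketch assumes that every column sum of $A$ is positive (an assumption of this sketch, not of the paper). That assumption bounds each coordinate of $v\in\mathcal{F}(u)$ and so makes the fibre finite.

```python
from itertools import product

def fibre(A, u):
    """Enumerate F(u) = {v in N^r : A v = A u} by brute force.

    Sketch assumption: every column sum of A is positive. Summing the rows of
    A v = A u then gives sum_i colsum_i * v_i = total, which bounds each v_i.
    """
    r = len(u)
    target = [sum(a * x for a, x in zip(row, u)) for row in A]
    col_sums = [sum(row[i] for row in A) for i in range(r)]
    if any(c <= 0 for c in col_sums):
        raise ValueError("this sketch needs positive column sums")
    total = sum(target)
    ranges = [range(total // c + 1) for c in col_sums]
    return [v for v in product(*ranges)
            if [sum(a * x for a, x in zip(row, v)) for row in A] == target]
```

For the matrix of section 2.1 and $u=\begin{bmatrix}2&2&2&2\end{bmatrix}^{T}$ this returns 13 vectors. The search space grows exponentially in $r$, which is why random walks are used for realistic tables.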

1.1.1. A random walk in the fibre

The most naive way to convert the fibre into a graph is to choose a generating set $B$ for the kernel $K\subseteq{\mathbb{Z}}^{r}$ of $A$ , and then form a (simple, undirected) graph by putting an edge between distinct vertices $v_{1}$ and $v_{2}$ whenever $v_{1}-v_{2}\in B$ or $v_{2}-v_{1}\in B$ . We say ${\mathcal{F}}(u)$ is connected by $B$ if the resulting graph is connected. In section 2 we will see several examples of $B$ that fail to connect ${\mathcal{F}}(u)$ . The major innovation of Diaconis and Sturmfels [DS98] was to give an algorithm to construct a generating set $B$ which connects every fibre of a given matrix $A$ .
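For a small fibre given as an explicit list of vectors, the graph just described is easy to build. The sketch below finds its connected components by breadth-first search; the helper name `components` is ours. The usage line takes the family of section 2.2 with $n=3$.

```python
from collections import deque

def components(vertices, B):
    """Connected components of the graph on `vertices` (a fibre) in which
    v1 and v2 are joined whenever v1 - v2 lies in B or in -B."""
    moves = [tuple(b) for b in B] + [tuple(-x for x in b) for b in B]
    vertex_set = set(vertices)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [start], deque([start])
        while queue:  # breadth-first search from `start`
            v = queue.popleft()
            for m in moves:
                w = tuple(x + y for x, y in zip(v, m))
                if w in vertex_set and w not in seen:
                    seen.add(w)
                    comp.append(w)
                    queue.append(w)
        comps.append(comp)
    return comps

# Section 2.2 with n = 3: F(e_2) = {e_1, e_2, e_3} splits into {e_2, e_3} and {e_1}.
fib = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(sorted(len(c) for c in components(fib, [(0, 1, -1), (-1, 3, -2)])))  # [1, 2]
```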

1.1.2. Saturated ideals and connected fibres

To describe their result, we need a little more notation. Given $b\in B$, we write $b=b^{+}-b^{-}$, where $b^{+}$ and $b^{-}$ have non-negative entries and disjoint supports (so $b_{i}^{+}=\max(b_{i},0)$ and $b_{i}^{-}=\max(-b_{i},0)$). In the ring $R=\mathbb{Z}[x_{1},\dots,x_{r}]$ we form the elements

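$$x^{b^{+}} := \prod_{i=1}^{r} x_{i}^{b_{i}^{+}}, \qquad x^{b^{-}} := \prod_{i=1}^{r} x_{i}^{b_{i}^{-}}, \tag{1.1.2}$$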

and define an ideal ${\mathcal{I}}_{B}=(x^{b^{+}}-x^{b^{-}}:b\in B)\subseteq R$ . Then the key theorem is:

Theorem 1.1 (Diaconis-Sturmfels, [DS98]).

Fix a $k\times r$ matrix $A$ , and let $B$ be a generating set for the integral kernel of $A$ . Suppose the ideal ${\mathcal{I}}_{B}$ is saturated with respect to the element $x_{1}\cdots x_{r}\in R$ . Then for every $u\in{\mathbb{N}}^{r}$ , the fibre ${\mathcal{F}}(u)$ is connected by $B$ .

If ${\mathcal{I}}_{B}$ is saturated, $B$ is often called a Markov basis (this should not be interpreted as implying linear independence of the elements of $B$ ). The theorem then tells us that we can generate samples according to our preferred distribution by following the naive random walk algorithm above using the basis $B$ .
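Passing from a kernel vector $b$ to the generator $x^{b^{+}}-x^{b^{-}}$ of $\mathcal{I}_{B}$ is purely mechanical. Here is a minimal Python sketch; it produces plain strings rather than elements of a polynomial ring, and all function names are ours.

```python
def split(b):
    """Return (b_plus, b_minus): non-negative, disjoint supports, b = b_plus - b_minus."""
    return tuple(max(x, 0) for x in b), tuple(max(-x, 0) for x in b)

def monomial(e):
    """Render the exponent vector e as a monomial in x1, ..., xr ('1' if e = 0)."""
    factors = [f"x{i}" if k == 1 else f"x{i}^{k}"
               for i, k in enumerate(e, start=1) if k > 0]
    return "*".join(factors) or "1"

def generator(b):
    """The generator x^{b+} - x^{b-} of I_B attached to the kernel vector b."""
    b_plus, b_minus = split(b)
    return f"{monomial(b_plus)} - {monomial(b_minus)}"

for b in [(1, -2, 1, 0), (0, 1, -2, 1)]:  # the basis of section 2.1
    print(generator(b))  # x1*x3 - x2^2, then x2*x4 - x3^2
```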

On the other hand, suppose that we have a generating set $B$ such that $\mathcal{I}_{B}$ is not saturated. We can (at least in principle) apply a standard saturation algorithm to $\mathcal{I}_{B}$ to produce a saturated ideal, and moreover the generating set produced will in fact consist of pure difference binomials (i.e. differences of monomials; see definition 3.1). Reversing the construction of eq. 1.1.2, we can recover a new generating set $B^{\prime}$ for the kernel $K$ of $A$, and by the above theorem of Diaconis-Sturmfels this generating set will connect all fibres, enabling efficient sampling.

Thus, when it is possible to compute this saturation, the problem is essentially solved. However, the standard algorithm for saturation involves the computation of $r$ Gröbner bases, and is at present only practical for relatively small examples (software to carry out such computations can be found at 4ti2.de).

1.1.3. Connected fibres without saturation

The difficulty of computing the saturation motivated Aoki, Hara and Takemura [HAT12] to suggest an algorithm for generating samples without needing to compute the saturation. They begin in the same way, with a generating set $B=\{b_{1},\dots,b_{n}\}$ for the integral kernel. However, instead of making moves consisting of addition or subtraction of a single element of $B$, they generate $n$ non-negative integers $a_{i}$ from a Poisson distribution with some chosen mean $\lambda$, and $n$ signs $\epsilon_{i}\in\{+1,-1\}$. Their move consists of adding $\sum_{i}\epsilon_{i}a_{i}b_{i}$ if the result lies in the fibre, and staying put otherwise. Since the Poisson distribution generates every non-negative integer with non-zero probability, it is immediate that the resulting graph on the fibre is connected. In fact, it is a complete graph, but the probabilities of selecting the various edges as moves are highly non-uniform.
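Below is a minimal sketch of one move of this chain, as we read the description above. The helper names are ours. The Poisson sampler is Knuth's textbook multiplication method, chosen only to keep the example dependency-free. We assume a uniform target distribution. The proposal is symmetric, so under that assumption no Metropolis-Hastings acceptance ratio is needed beyond rejecting moves that leave $\mathbb{N}^{r}$.

```python
import math
import random

def poisson(lam, rng):
    """Poisson(lam) sample by Knuth's multiplication method (fine for moderate lam)."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def aht_step(v, B, lam, rng):
    """One move: add sum_i eps_i a_i b_i if the result is non-negative, else stay.

    Assumes a uniform target distribution, so no Metropolis-Hastings acceptance
    ratio is applied beyond rejecting moves that leave N^r.
    """
    w = list(v)
    for b in B:
        a = poisson(lam, rng)
        eps = rng.choice((1, -1))
        for j, x in enumerate(b):
            w[j] += eps * a * x
    # Each b lies in ker A, so A w = A v automatically; only signs need checking.
    return tuple(w) if min(w) >= 0 else tuple(v)
```

Repeatedly calling `aht_step` from any starting table gives the chain. On the example of section 2.1 with $\lambda=1$ it does eventually visit the isolated vertex $\begin{bmatrix}4&0&0&4\end{bmatrix}^{T}$, because moves such as $b_{1}+b_{2}$ have positive probability.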

They then perform a number of numerical experiments with various values of $\lambda$. In cases where it was possible to compute the saturation, they show that, with a careful choice of $\lambda$, their algorithm performs comparably to one coming from a Markov basis. They also illustrate that their algorithm can be applied in cases where the saturation is too hard to compute. Of course, they can provide no guarantee that their algorithm is converging in reasonable time. It appears to do so, but this might be deceptive if the fibre has some connected components that are very hard to hit (see section 1.3).

There is some tension in the use of this algorithm when it comes to choosing the value of $\lambda$. If one chooses $\lambda$ very large then the algorithm takes a long time before it (appears to) converge. On the other hand, a small value of $\lambda$ will produce more rapid apparent convergence, but there is a greater risk that one is simply failing to see one or more connected components of the fibre in the time for which the algorithm is run.

1.2. Results

1.2.1. A bound on the complexity of the saturation

In the light of the above discussion it is natural to try to bound how large and complex the saturation of the ideal ${\mathcal{I}}_{B}$ can get. To make this more precise, we define the norm of the saturation:

Definition 1.2.

Let $B$ be a set of $n\geq 1$ vectors in $\mathbb{Z}^{r}$. We write $\mathcal{I}_{B}$ for the ideal in $R=\mathbb{Z}[x_{1},\dots,x_{r}]$ as defined in section 1.1.2. The norm of $B$ is the smallest integer $N\geq 1$ such that there exists a finite generating set $G$ for the saturation of $\mathcal{I}_{B}$ with respect to $x_{1}\cdots x_{r}$ with the following properties:

(1) Every element of $G$ is a pure difference binomial;

(2) Every $g\in G$ can be written in the form

$$g = \sum_{i=1}^{N} \epsilon_{i}\, m_{i}\left(x^{b_{i}^{+}} - x^{b_{i}^{-}}\right),$$

where the $\epsilon_{i}\in\{-1,0,1\}$, the $m_{i}$ are monomials, and the $b_{i}$ are elements of $B$.

The main result of this paper is the following explicit bound on the norm. In sections 1.2.3 and 1.2.4 we will show how this can be applied to give new algorithms for sampling from fibres without needing to compute the saturation.

Theorem 1.3.

Let $B$ be a set of $n\geq 1$ vectors in $\mathbb{Z}^{r}$. Write $\beta$ for the maximum of the absolute values of the coefficients of elements of $B$. Then the norm of $B$ is at most

$$n\,(2n\beta)^{n-1}. \tag{1.2.2}$$

Our proof (see section 3.1) is constructive; it gives an algorithm to determine a generating set $G$ as in the definition of the norm. We do not know whether this algorithm could be practical. It is a priori less efficient than procedures using Gröbner bases, but it is highly parallelisable.
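The bound of theorem 1.3 is easy to evaluate exactly with integer arithmetic. A Python sketch follows; the name `norm_bound` is ours. The two calls use the parameters quoted in sections 2.1 and 2.3.

```python
def norm_bound(n, beta):
    """Upper bound n (2 n beta)^(n-1) on the norm from theorem 1.3, as an exact integer."""
    return n * (2 * n * beta) ** (n - 1)

print(norm_bound(2, 2))            # 16: n = 2, beta = 2 (section 2.1)
print(f"{norm_bound(64, 1):.1e}")  # 3.6e+134: n = 64, beta = 1 (section 2.3)
```

Because Python integers are unbounded, the second value is exact ($2^{447}$); only the printed form is rounded.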

The connection of the norm to fibre connectivity and Markov chains runs via the following result (proven in section 3.2):

Proposition 1.4.

Let $A$ be a $k\times r$ integer matrix, and $B=\{b_{1},\dots,b_{n}\}$ a basis of the kernel, with $B$ having norm $N$ . Let $u\in{\mathbb{N}}^{r}$ , and construct a graph with vertex set the fibre ${\mathcal{F}}(u)$ , and where we draw an edge from $v_{1}$ to $v_{2}$ if and only if $v_{1}-v_{2}$ can be written as an integer linear combination

$$v_{1} - v_{2} = \sum_{i=1}^{n} a_{i} b_{i}$$

with $\sum_{i=1}^{n}\lvert a_{i}\rvert\leq N$ . Then this graph is connected.
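For small examples proposition 1.4 can be checked directly. The sketch below collects all nonzero combinations $\sum a_{i}b_{i}$ with $\sum\lvert a_{i}\rvert\leq N$ and runs a breadth-first search on the resulting graph. The helper names are ours. The fibre is passed as an explicit list of vectors, for instance from brute-force enumeration. The search is exponential in $n$, so it is only a sanity check.

```python
from collections import deque
from itertools import product

def combination_moves(B, N):
    """All nonzero vectors sum_i a_i b_i with integers a_i and sum_i |a_i| <= N."""
    n, r = len(B), len(B[0])
    moves = set()
    for a in product(range(-N, N + 1), repeat=n):
        if sum(abs(x) for x in a) <= N:
            d = tuple(sum(a[i] * B[i][j] for i in range(n)) for j in range(r))
            if any(d):
                moves.add(d)
    return moves

def is_connected(vertices, B, N):
    """Breadth-first search on the graph of proposition 1.4 (vertices = the fibre)."""
    vertex_set = set(vertices)
    moves = combination_moves(B, N)
    start = next(iter(vertex_set))
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for d in moves:
            w = tuple(x + y for x, y in zip(v, d))
            if w in vertex_set and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(vertex_set)
```

On the 13-element fibre of section 2.1, this reports a disconnected graph for $N=1$ and a connected one for $N=2$. That is consistent with the norm $2$ stated there.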

Remark 1.5.

Given a $k\times r$ integral matrix $A$ , note that it is easy to compute a basis $B$ of the integral kernel of $A$ from the Smith normal form of $A$ . Indeed, if $SAT=D$ is the Smith normal form (so $S$ and $T$ are invertible, and $D$ diagonal with $D_{i,i}\mid D_{i+1,i+1}$ ), then let $1\leq j\leq r$ be maximal such that $D_{j,j}\neq 0$ . Then an integral basis of the kernel of $A$ is given by $Te_{j+1},\dots,Te_{r}$ , where $e_{i}$ is the $i$ th standard basis vector in ${\mathbb{Z}}^{r}$ .

Conversely, while $B$ does not determine $A$, it does determine the fibres $\mathcal{F}(u)$. The matrix $A$ is therefore not really essential to the mathematics, though it is very relevant to the statistical applications.
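The remark can be made concrete, though the sketch below swaps in a simpler technique than the full Smith normal form. It uses unimodular integer column operations (extended-gcd steps, as in Hermite normal form computations). These maintain $AT=M$ with $T$ invertible over $\mathbb{Z}$ while bringing $M$ to column echelon form. As with $Te_{j+1},\dots,Te_{r}$ in the remark, the columns of $T$ matching the zero columns of $M$ then form an integral basis of the kernel. Function names are ours.

```python
def ext_gcd(a, b):
    """Return (g, s, t) with s*a + t*b == g == gcd(a, b) >= 0."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    if old_r < 0:
        old_r, old_s, old_t = -old_r, -old_s, -old_t
    return old_r, old_s, old_t

def integer_kernel_basis(A):
    """Z-basis of {x in Z^r : A x = 0} via unimodular column operations.

    Keeps A T = M with T unimodular and reduces M to column echelon form;
    the columns of T matching the zero columns of M span the integral kernel.
    """
    k, r = len(A), len(A[0])
    M = [list(row) for row in A]
    T = [[int(i == j) for j in range(r)] for i in range(r)]

    def col_op(p, q, a, b, c, d):
        # (col_p, col_q) <- (a col_p + b col_q, c col_p + d col_q); here a d - b c = 1
        for X in (M, T):
            for row in X:
                x, y = row[p], row[q]
                row[p], row[q] = a * x + b * y, c * x + d * y

    pivot = 0
    for i in range(k):
        if pivot == r:
            break
        for j in range(pivot + 1, r):
            x, y = M[i][pivot], M[i][j]
            if y != 0:
                g, s, t = ext_gcd(x, y)
                col_op(pivot, j, s, t, -y // g, x // g)  # makes M[i][j] zero
        if M[i][pivot] != 0:
            pivot += 1
    return [tuple(T[i][j] for i in range(r)) for j in range(pivot, r)]

print(integer_kernel_basis([[0, 1, 2, 3], [3, 2, 1, 0]]))
# [(-1, 2, -1, 0), (0, 1, -2, 1)], i.e. -b_1 and b_2 from section 2.1
```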

1.2.2. Comparison to other results in the literature

Needless to say, we are not the first to try to control the complexity of the saturation of an ideal in a polynomial ring. Indeed, the standard method of computing the saturation reduces to a Gröbner basis computation, whose efficient implementation has been the focus of too much research to begin to list here. Specialising to the case of binomial ideals, the literature is still much too large to give more than a quick glimpse of. There are general theoretical results on the structure of fibre graphs ([Win16], [HW15], [Win19], [GP13], …). There are also many results bounding the degree of the binomials appearing in the saturation, see ([Stu96, chapter 13], [HMdCTY14], [KOT15], …), and bounding the Markov complexity; this is defined in [SS03], and studied in [CTV14] and elsewhere.

However, we are not aware of bounds on the ‘norm’ (definition 1.2) in the literature. Indeed, from an algebraic point of view it appears a rather unnatural invariant. The reason for studying it comes purely from the application (via proposition 1.4) to fibre connectivity and Markov bases. In the remainder of section 1 we hope to justify it from this point of view, and perhaps motivate further research in this direction. An unusual feature of our results is that we do not use Gröbner bases or related techniques. This is not from dislike, but simply because we could not see how to bound the norm from that perspective; we hope that others may have more success.

1.2.3. Improving the algorithm of Aoki, Hara and Takemura

Aoki, Hara and Takemura connect the fibre by allowing arbitrarily large integer linear combinations of elements of the basis $B$. However, proposition 1.4 shows that it suffices to take combinations with coefficients bounded by the norm $N$ of $B$. This allows us to improve the efficiency of their algorithm by truncating the Poisson distribution at $N$. A second algorithm they present (where the coefficients of the $b_{i}$ are chosen from a multinomial distribution) can be enhanced in a similar way. The bound on the norm coming from theorem 1.3 is likely to be large in comparison with the chosen $\lambda$. Truncating at that bound will therefore not have a large impact on the runtime, but we hope that better bounds on the norm can be found in future.
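Truncation itself is simple to implement. The sketch below draws from a Poisson distribution conditioned to be at most $N$. The helper name is ours; `random.Random.choices` is the standard-library weighted sampler.

```python
import math
import random

def truncated_poisson(lam, N, rng):
    """Draw from Poisson(lam) conditioned on the value being at most N (lam > 0).

    Inverse-CDF sampling over {0, ..., N} with weights computed in log space;
    practical only for modest N, since it builds all N + 1 weights.
    """
    logw = [k * math.log(lam) - math.lgamma(k + 1) for k in range(N + 1)]
    top = max(logw)
    weights = [math.exp(x - top) for x in logw]  # the common factor e^{-lam} cancels
    return rng.choices(range(N + 1), weights=weights)[0]
```

With the bound of theorem 1.3 in place of $N$, building $N+1$ weights is infeasible. There, simply redrawing whenever a sample exceeds $N$ gives the same conditional distribution, and the redraw essentially never triggers.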

A more useful application might be to predict good values of the constant $\lambda$ in their algorithm, or to give heuristic bounds on the convergence time for a given value of $\lambda$. The norm $N$ can be seen as the maximum distance between connected components of the fibre. So, to have a reasonable chance of hitting all components, we should take a number of steps that is very large compared to $1/\mathbb{P}(\operatorname{Poisson}_{\lambda}\geq N)$.
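The waiting-time heuristic above needs $\mathbb{P}(\operatorname{Poisson}_{\lambda}\geq N)$, which is tiny when $N\gg\lambda$. Computing it as one minus the CDF loses all precision. The sketch below instead sums the upper tail in log space. The helper name and the cut-off `terms` are our choices.

```python
import math

def log10_poisson_tail(lam, N, terms=10_000):
    """log10 P(Poisson(lam) >= N), summing the pmf from k = N upwards in log space.

    Summing the upper tail directly avoids 1 - CDF, which rounds to 0 once the
    tail drops below about 1e-16. Assumes lam > 0 and `terms` much larger than
    lam, so the neglected mass beyond N + terms is negligible.
    """
    if N <= 0:
        return 0.0
    logs = [k * math.log(lam) - lam - math.lgamma(k + 1) for k in range(N, N + terms)]
    top = max(logs)
    return (top + math.log(sum(math.exp(x - top) for x in logs))) / math.log(10)

# Heuristic number of steps before a single coefficient reaches N:
lam, N = 5.0, 30
print(f"about 10^{-log10_poisson_tail(lam, N):.1f} steps")
```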

1.2.4. An alternative algorithm for connecting the fibres

In the naive algorithm of section 1.1.1, one starts at a vector $v\in\mathcal{F}(u)$, chooses at random an element $b\in\pm B$, and considers the step $v+b$. If $v+b\in\mathcal{F}(u)$ then this is returned as the next element of the Markov chain. If $v+b\notin\mathcal{F}(u)$, then the algorithm simply returns $v$. However, if we have a bound on the norm then we can modify the algorithm so that the fibre will always be connected. If $v+b\notin\mathcal{F}(u)$ then, rather than returning $v$, we choose another element $b_{1}$ from $\pm B$ and consider the vector $v+b+b_{1}$. If $v+b+b_{1}$ lies in $\mathcal{F}(u)$ we return it as the next step in the Markov chain. Otherwise we repeat until either we hit $\mathcal{F}(u)$ again, or we have taken $N$ consecutive steps outside the fibre, in which case we return $v$ again.

Alternatively, this can be viewed as a weighted random walk in a certain graph with vertex set $\mathcal{F}(u)$. To define this graph, we first define a graph $\mathcal{F}_{\mathbb{Z}}(u)$ with vertex set $\{v\in\mathbb{Z}^{r}:Au=Av\}$ and with an edge between $v_{1}$ and $v_{2}$ whenever $v_{1}-v_{2}\in\pm B$. We then define a graph with vertex set $\mathcal{F}(u)$ by putting an edge between two vertices whenever they can be connected by a path in $\mathcal{F}_{\mathbb{Z}}(u)$ of length at most $N$ that does not intersect $\mathcal{F}(u)$ except at its endpoints. Again, by proposition 1.4 this new graph is guaranteed to be connected.
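Below is a minimal sketch of one step of this modified walk. The names are ours. Choosing uniformly among $\pm B$ is an assumption, since the text says only "at random". With $N=1$ the sketch reduces to the naive walk of section 1.1.1.

```python
import random

def excursion_step(v, B, N, rng):
    """One step of the modified walk (a sketch).

    Walks in F_Z(u) by uniformly chosen elements of +-B until it re-enters N^r,
    giving up and staying at v after N steps. N = 1 recovers the naive walk.
    """
    moves = [tuple(b) for b in B] + [tuple(-x for x in b) for b in B]
    w = tuple(v)
    for _ in range(N):
        m = rng.choice(moves)
        w = tuple(x + y for x, y in zip(w, m))
        if min(w) >= 0:  # back in F(u), since A w = A v holds automatically
            return w
    return tuple(v)
```

On the example of section 2.1, with $N=1$ a walk started in the large component never reaches $\begin{bmatrix}4&0&0&4\end{bmatrix}^{T}$. With $N=2$ it does, via $\begin{bmatrix}4&-1&2&3\end{bmatrix}^{T}$ or $\begin{bmatrix}3&2&-1&4\end{bmatrix}^{T}$.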

More generally, with theorem 1.3 and proposition 1.4 in hand it is easy to propose new sampling algorithms which are guaranteed to connect the fibre. The challenge is to design algorithms with reasonable runtime, at least heuristically (rigorous runtime analysis seems hard but very interesting).

If the fibre ${\mathcal{F}}(u)$ is large with respect to the norm $N$ then designing reasonably efficient algorithms is not hard, since the runtime will be dominated by time spent in the ‘interior’ of the fibre. On the other hand, if the fibre is small compared to $N$ then the runtime will be dominated by time spent around the edge of the fibre looking for new connected components, and will depend sensitively on the norm (or more precisely, on our bound on the norm).

1.3. Practical consequences

The algorithm of section 1.1.3 is proven to converge, and in practice the Markov chain is often observed to settle down quite fast. Indeed, in practice it is the latter which will generally be relied upon: people run algorithms until the chain appears to converge. However, there is a critical problem here. In section 2.2 we see examples where the chain will appear to converge very rapidly, but this ‘apparent’ limit will not be the true limit. The runtime required to achieve true convergence may easily be arranged to exceed the lifespan of the solar system. We hope that this kind of pathological behaviour will be very rare in practice, but at present this seems hard to verify. Our aim in this paper is to get an idea of how long the algorithm should be run in order to be reasonably confident that the ‘apparent limit’ of the chain is in fact the true limit. We are not completely successful in this, for two reasons. First, our bound on the norm is rather large for practical use (and probably not sharp). Second, passing from the bound in theorem 1.3 to an estimate on the convergence time needs substantial further work. We think it is interesting and useful to investigate this further. In the meantime, we would encourage people using this type of algorithm to let it run for as long as possible, even after the chain appears to have settled down, to maximise the chance of hitting new connected components.

Acknowledgements

This work owes its existence to a seminar on algebraic statistics organised in Leiden in the Autumn of 2018 by Garnet Akeyr, Rianne de Heide, and Rosa Winter. I am very grateful to them for organising it, for the many expert speakers who took the time to patiently explain basic ideas of probability and statistics to us, and especially to the determined participants who survived to the end, and offered very useful comments on a presentation of the results contained here.

2. Examples

2.1. A very simple example

Consider the matrix

$$A = \begin{bmatrix} 0 & 1 & 2 & 3 \\ 3 & 2 & 1 & 0 \end{bmatrix}.$$

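An integral basis for the kernel of $A$ is then given by $B=\{b_{1},b_{2}\}$, where

$$b_{1} = \begin{bmatrix}1&-2&1&0\end{bmatrix}^{T}, \qquad b_{2} = \begin{bmatrix}0&1&-2&1\end{bmatrix}^{T}.$$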

The fibre containing the vector $\begin{bmatrix}2&2&2&2\end{bmatrix}^{T}$ is illustrated in fig. 1, where red arrows correspond to addition of $b_{1}$, and blue arrows to addition of $b_{2}$. Evidently, this fibre is not connected, since the element $\begin{bmatrix}4&0&0&4\end{bmatrix}^{T}$ is isolated. Thus if our chain begins anywhere in the large component it will never hit the isolated vertex, and if it begins at the isolated vertex it will remain there. This has practical consequences, since it is common to simply run such a Markov chain until it appears (by eye) to have converged. In this example convergence will be rapid, but the resulting distribution will not be the expected one (cf. section 1.3).

The approach of Diaconis-Sturmfels is to replace the basis $B$ by a larger generating set which makes the fibre connected. The ideal ${\mathcal{I}}_{B}$ is generated by $x_{1}x_{3}-x_{2}^{2}$ and $x_{2}x_{4}-x_{3}^{2}$ , and its saturation can be generated by these two polynomials together with the polynomial $x_{1}x_{4}-x_{2}x_{3}$ , the latter corresponding to the vector $\begin{bmatrix}1&-1&-1&1\end{bmatrix}^{T}$ . Clearly one can step from $\begin{bmatrix}3&1&1&3\end{bmatrix}^{T}$ to $\begin{bmatrix}4&0&0&4\end{bmatrix}^{T}$ by addition of this new vector, so the fibre is indeed connected by this new generating set for the integral kernel of $A$ .

Our approach is to allow the chain to step briefly outside the fibre while it hunts for vectors with non-negative entries. As long as we allow two negative steps the fibre will become connected, as we can step from $\begin{bmatrix}3&1&1&3\end{bmatrix}^{T}$ to $\begin{bmatrix}4&0&0&4\end{bmatrix}^{T}$ via $\begin{bmatrix}4&-1&2&3\end{bmatrix}^{T}$ or $\begin{bmatrix}3&2&-1&4\end{bmatrix}^{T}$ ; one sees easily that the norm is $2$ . Let us compute the bound resulting from theorem 1.3: we have $\beta=2$ and $n=2$ , so our bound is $16$ . Thus if we use the bound from the theorem we should allow 16 negative steps; it is clear that this will be sufficient to connect the fibre, but also that this bound is not optimal.
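The claim that the norm is at most $2$ can be checked by hand. The saturation is generated by $f_{1}=x_{1}x_{3}-x_{2}^{2}$ and $f_{2}=x_{2}x_{4}-x_{3}^{2}$ (one term each) together with $x_{1}x_{4}-x_{2}x_{3}$. For the last generator, we read the $m_{i}$ in definition 1.2 as monomials in which negative exponents are allowed. This is our interpretation: with genuine monomials of $R$, every such $g$ would already lie in $\mathcal{I}_{B}$. Under this reading, the decomposition $b_{1}+b_{2}=\begin{bmatrix}1&-1&-1&1\end{bmatrix}^{T}$ used in the path above corresponds to the identity

$$x_{1}x_{4}-x_{2}x_{3} = \frac{x_{3}}{x_{2}}\left(x_{1}x_{3}-x_{2}^{2}\right) + \frac{x_{1}}{x_{2}}\left(x_{2}x_{4}-x_{3}^{2}\right).$$

Clearing the denominator gives

$$x_{2}\left(x_{1}x_{4}-x_{2}x_{3}\right) = x_{3}\left(x_{1}x_{3}-x_{2}^{2}\right) + x_{1}\left(x_{2}x_{4}-x_{3}^{2}\right) \in \mathcal{I}_{B},$$

which exhibits $x_{1}x_{4}-x_{2}x_{3}$ as an element of the saturation written with $N=2$ terms.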

2.2. Families where the fibres are arbitrarily badly connected

Consider the $1\times 3$ matrix $A=\begin{bmatrix}1&1&1\end{bmatrix}$ , and write $e_{i}$ for the $i$ th standard basis vector in ${\mathbb{Z}}^{3}$ . Let $u=e_{2}$ . Then the fibre ${\mathcal{F}}(u)=\{e_{1},e_{2},e_{3}\}$ . For a positive integer $n$ , choose the basis

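$$B_{n} = \left\{ \begin{bmatrix}0&1&-1\end{bmatrix}^{T},\ \begin{bmatrix}-1&n&1-n\end{bmatrix}^{T} \right\}$$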

of the kernel of $A$. Then the fibre consists of two connected components, namely $\{e_{2},e_{3}\}$ and $\{e_{1}\}$. Moreover, stepping between the connected components requires $(n-1)$ consecutive negative steps. Thus for every positive integer $M$ and every $\lambda>0$ there exists an integer $n$ with the following property. The algorithm of Aoki, Hara and Takemura presented in section 1.1.3, applied to the above basis $B_{n}$, will appear to converge immediately, but will take at least $M$ steps before the probability of hitting the other connected component rises above any given positive threshold.
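The number of steps needed to leave a component can be computed by breadth-first search in $\mathcal{F}_{\mathbb{Z}}(u)$. Every lattice point reached by $\pm B$ satisfies $Aw=Av$, so it lies in the fibre exactly when it is non-negative. The sketch below does this search; the function name is ours. For the basis $B_{n}$ above, starting from $e_{1}$, it should return $n$, i.e. a path whose $n-1$ intermediate vertices lie outside $\mathbb{N}^{3}$.

```python
def min_steps_to_new_point(v, B, max_depth):
    """Fewest steps by +-B (allowed to leave N^r) from v to another point of F(v).

    Every lattice point reached satisfies A w = A v, so 'in the fibre' just
    means non-negative. Returns None if nothing is found within max_depth steps.
    """
    moves = [tuple(b) for b in B] + [tuple(-x for x in b) for b in B]
    v = tuple(v)
    seen, frontier = {v}, [v]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for w in frontier:
            for m in moves:
                x = tuple(a + b for a, b in zip(w, m))
                if x in seen:
                    continue
                if min(x) >= 0:  # a fibre point other than v (v itself is in `seen`)
                    return depth
                seen.add(x)
                next_frontier.append(x)
        frontier = next_frontier
    return None

for n in range(1, 6):
    print(n, min_steps_to_new_point((1, 0, 0), [(0, 1, -1), (-1, n, 1 - n)], n + 1))
```

The same function applies unchanged to the second family in this section, starting from $v=\begin{bmatrix}n&0&\cdots&0&n\end{bmatrix}^{T}$.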

This example is quite artificial, as the fibre is essentially simple, but we have made a poor choice of generating set $B_{n}$ . We can also construct a slightly less artificial example of the same phenomenon, by generalising the example in section 2.1. For an integer $n\geq 2$ , let

$$A_{n} = \begin{bmatrix} 1 & 2 & \cdots & n-1 & n \\ n & n-1 & \cdots & 2 & 1 \end{bmatrix},$$

and consider the basis of the integral kernel given by

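$$B_{n} = \left\{ \begin{bmatrix}1\\-2\\1\\0\\0\\\vdots\\0\end{bmatrix}, \begin{bmatrix}0\\1\\-2\\1\\0\\\vdots\\0\end{bmatrix}, \dots, \begin{bmatrix}0\\\vdots\\0\\1\\-2\\1\end{bmatrix} \right\},$$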

where we denote the elements of $B_{n}$ by $b_{2},\dots,b_{n-1}$ in the given order. Then the fibre of $\begin{bmatrix}2&\cdots&2\end{bmatrix}^{T}$ contains the vector $v=\begin{bmatrix}n&0&\cdots&0&n\end{bmatrix}^{T}$ . This vector $v$ is at least $n-2$ steps distant from any other point in the fibre; more precisely, if $c_{1},\dots,c_{r}\in\pm B_{n}$ are such that

$$v + \sum_{i=1}^{r} c_{i} \in \mathcal{F}(v),$$

then either $r\geq n-2$ or $v+\sum_{i=1}^{r}c_{i}=v$ (the bound $n-2$ is in fact sharp). We leave the elementary verification to the interested reader. Again we see that, though the algorithm of section 1.1.3 (and variants) may appear to converge rapidly, there are connected components which take an arbitrarily long time to hit.
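One way to carry out this verification is sketched here; the argument is ours, not taken from the paper. Since $B_{n}$ is linearly independent, a sequence $c_{1},\dots,c_{r}$ has net displacement $d=\sum_{i=2}^{n-1}a_{i}b_{i}$ for unique integers $a_{i}$, and $r\geq\sum_{i}\lvert a_{i}\rvert$. Setting $a_{1}=a_{n}=0$, the coordinates of $d$ are

$$d_{1}=a_{2},\qquad d_{n}=a_{n-1},\qquad d_{j}=a_{j-1}-2a_{j}+a_{j+1}\quad(2\leq j\leq n-1).$$

Because $v_{j}=0$ for $2\leq j\leq n-1$, the condition $v+d\in\mathbb{N}^{n}$ forces $d_{j}\geq 0$ there. So $(a_{1},\dots,a_{n})$ is a discretely convex sequence vanishing at both ends. Such a sequence is $\leq 0$, and if it vanishes at some interior index it vanishes identically. Hence $d\neq 0$ forces $a_{j}\leq -1$ for every $2\leq j\leq n-1$, giving

$$r \;\geq\; \sum_{j=2}^{n-1}\lvert a_{j}\rvert \;\geq\; n-2 .$$

For sharpness (with $n\geq 4$), taking all $a_{j}=-1$ gives $d=\begin{bmatrix}-1&1&0&\cdots&0&1&-1\end{bmatrix}^{T}$ and $v+d=\begin{bmatrix}n-1&1&0&\cdots&0&1&n-1\end{bmatrix}^{T}\in\mathcal{F}(v)$.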

2.3. The no-three-factor-interaction model

This model is described in detail (in particular, its statistical interpretation) in [AHT12]. It depends on a choice of three positive integers $I$ , $J$ and $K$ ; we will often take $I=J=K$ for simplicity. The matrix $A$ is then an ${(IJ+JK+KI)\times IJK}$ matrix, described in a slightly complicated way. Define $Id_{I}$ to be the $I\times I$ identity matrix, and $1_{I}$ to be a row vector of length $I$ with all entries equal to $1$ . Then

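$$A = \begin{bmatrix} Id_{I}\otimes Id_{J}\otimes 1_{K} \\ Id_{I}\otimes 1_{J}\otimes Id_{K} \\ 1_{I}\otimes Id_{J}\otimes Id_{K} \end{bmatrix},$$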

where $\otimes$ represents the Kronecker product of matrices.
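The displayed matrix can be assembled with NumPy's `np.kron` and `np.vstack`. The sketch below does so (function name ours) and also checks the dimensions used in the following paragraphs. The rank is computed numerically with `np.linalg.matrix_rank`, which is reliable for a small 0/1 matrix like this one.

```python
import numpy as np

def no_three_factor_matrix(I, J, K):
    """Stack the three Kronecker-product blocks of A (entries 0/1)."""
    def eye(m):
        return np.eye(m, dtype=int)

    def ones(m):
        return np.ones((1, m), dtype=int)  # the row vector 1_m

    return np.vstack([
        np.kron(np.kron(eye(I), eye(J)), ones(K)),
        np.kron(np.kron(eye(I), ones(J)), eye(K)),
        np.kron(np.kron(ones(I), eye(J)), eye(K)),
    ])

A = no_three_factor_matrix(5, 5, 5)
print(A.shape)                                # (75, 125)
print(A.shape[1] - np.linalg.matrix_rank(A))  # 64, the kernel dimension
```

The kernel dimension $64$ agrees with the basis size $n=64$ used below, and with the value $(I-1)(J-1)(K-1)$ for $I=J=K=5$.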

In [HAT12], the authors numerically test their algorithm (section 1.1.3) on the no-three-factor-interaction model in the cases $I=J=K=3$, $5$, and $10$. In the case $I=3$ the saturation can be computed by Gröbner basis techniques, but this seems presently out of reach for $I=5$, and worse for $I=10$. In each case they compute a basis for the integral kernel, then run numerical tests of their algorithm for several values of the Poisson parameter $\lambda$. They also occasionally replace the Poisson with a different distribution (we are not completely clear on how they chose these parameters and distributions). In the case $I=3$ they compare their results to those obtained using a saturated basis. They observe that the Markov chains coming from their algorithm converge similarly to those coming from a saturated basis (though for $\lambda=50$ the convergence is rather slow).

For $I=10$ their algorithm does not converge well, but for $I=5$ it appears to converge fairly rapidly. As throughout this paper, the question we are interested in is whether this apparent convergence can be trusted, or is it possible that there is some connected component of the fibre which their chain has never hit? Of course, their algorithm will find every component with probability 1 if allowed to run for unlimited time, but there is no a-priori reason to assume that the time required for this will be in any way comparable to the time required for the chain to appear to settle down.

To try to get a handle on this, let us compute our upper bound on the number of negative steps required to walk between components (the ‘distance between’ connected components of the fibre). In the case $I=J=K=5$, using SAGE we compute the Smith normal form of the $75\times 125$ matrix $A$, obtaining an integral basis $B$ with $n=64$ elements. The largest absolute value of an entry in $B$ is $\beta=1$. This leads to an upper bound on the norm of

$$N^{\prime} = n(2n\beta)^{n-1} = 64\cdot 128^{63} \approx 3.6\times 10^{134}.$$

Now, in this example Aoki, Hara and Takemura replace the Poisson distribution with a geometric distribution (for reasons which are unclear to us), and try parameters $p=0.1$ and $0.5$. The proportion of steps in their algorithm which exceed $N^{\prime}$ in length is then so small that such a step is likely never to occur before the sun runs cold. This means that if the bound $N^{\prime}$ were close to the true norm, then this algorithm would in practice never converge to the correct solution. In reality, our bound on the norm is surely very far from sharp. We give this example to illustrate the difficulty in guaranteeing convergence, even when the algorithm might appear to the human eye to have converged.
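Logarithms make it possible to see just how small this proportion is. The sketch below assumes that the geometric distribution counts failures before the first success, so that $\mathbb{P}(X\geq k)=(1-p)^{k}$. The paper does not specify the convention; counting trials instead changes the exponent by one, which is irrelevant at this scale. The names are ours.

```python
import math

N_PRIME = 64 * 128 ** 63  # n (2 n beta)^(n-1) with n = 64, beta = 1

def log10_geometric_tail(p, k):
    """log10 P(X >= k) = k log10(1 - p), assuming X ~ Geometric(p) counts the
    failures before the first success (one common convention)."""
    return k * math.log10(1 - p)

for p in (0.1, 0.5):
    # Exponents of order -1e133 and -1e134: far beyond any feasible run length.
    print(p, f"{log10_geometric_tail(p, N_PRIME):.2e}")
```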

3. Proof of the main results

3.1. Proof of theorem 1.3

Let $B=\{b_{1},\dots,b_{n}\}$ be a set of vectors in ${\mathbb{Z}}^{r}$ . Following the notation of eq. 1.1.2, we write

$$f_{i}^{+} = x^{b_{i}^{+}}, \qquad f_{i}^{-} = x^{b_{i}^{-}}, \qquad f_{i} = f_{i}^{+} - f_{i}^{-}$$

in the ring $R={\mathbb{Z}}[x_{1},\dots,x_{r}]$ . Then ${\mathcal{I}}_{B}=(f_{1},\dots,f_{n})\subseteq R$ , and our goal is to bound the norm of the saturation

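$$\operatorname{Sat}_{x_{1}\cdots x_{r}} \mathcal{I}_{B} = \{a\in R : \exists\, m>0 : a\,(x_{1}\cdots x_{r})^{m} \in \mathcal{I}_{B}\}.$$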

Definition 3.1.

A pure binomial in $R$ is an element of the form $m_{1}-m_{2}$ where the $m_{i}$ are monomials. An ideal $I\subseteq R$ is called pure binomial if it admits a generating set consisting of pure binomials; evidently, ${\mathcal{I}}_{B}$ is a pure binomial ideal.

Lemma 3.2 ([HHO18], proposition 3.18).

The saturation of ${\mathcal{I}}_{B}$ with respect to $x_{1}\cdots x_{r}$ is also a pure binomial ideal.

Definition 3.3.

Given pure binomials $f=f^{+}-f^{-}$ and $g=g^{+}-g^{-}$ , we define the subtraction polynomial (again a pure binomial)

$$S(f,g) = g^{+}f + f^{-}g = f^{+}g^{+} - f^{-}g^{-}.$$

If $f$ , $g\in{\mathcal{I}}_{B}$ then clearly $S(f,g)$ lies in ${\mathcal{I}}_{B}$ .
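For instance, take the generators $f_{1}=x_{1}x_{3}-x_{2}^{2}$ and $f_{2}=x_{2}x_{4}-x_{3}^{2}$ of section 2.1. A short computation gives

$$S(f_{1},f_{2}) = f_{1}^{+}f_{2}^{+}-f_{1}^{-}f_{2}^{-} = x_{1}x_{3}\cdot x_{2}x_{4}-x_{2}^{2}\cdot x_{3}^{2} = x_{2}x_{3}\left(x_{1}x_{4}-x_{2}x_{3}\right).$$

Dividing out the common monomial factor $x_{2}x_{3}$ therefore recovers the extra generator of the saturation found in section 2.1. This is an instance of the quotients $S(\epsilon_{j},t_{j})/m_{j}$ that appear in theorem 3.6.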

We make the unsurprising notational conventions that $--=+$ , $+-=-+=-$ and $++=+$ ; thus we interpret $f^{--}=f^{+}$ , which is less usual, but makes for efficient and hopefully comprehensible notation in what follows.

Definition 3.4.

Let $\epsilon\colon\{1,\dots,n\}\to\{+,-\}$ , and let $t\colon\{1,\dots,n\}\to{\mathbb{N}}$ . Define

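$$S(\epsilon,t) = \prod_{i=1}^{n}\left(f_{i}^{\epsilon(i)}\right)^{t(i)} - \prod_{i=1}^{n}\left(f_{i}^{-\epsilon(i)}\right)^{t(i)} \in \mathcal{I}_{B},$$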

(here we use our ‘ $--=+$ ’ convention when we write $f_{i}^{-\epsilon(i)}$ ).
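Below is a small sketch of definition 3.4 on exponent vectors; the names are ours. Signs are encoded as $\pm 1$, so that $-\epsilon(i)$ implements the ‘$--=+$’ convention. Each monomial is stored as its exponent vector, so multiplying monomials adds exponents.

```python
def split(b):
    """Exponent vectors (f^+, f^-) of f = x^{b+} - x^{b-}."""
    return tuple(max(x, 0) for x in b), tuple(max(-x, 0) for x in b)

def S(B, eps, t):
    """Exponents (plus, minus) of
    S(eps, t) = prod_i (f_i^{eps(i)})^{t(i)} - prod_i (f_i^{-eps(i)})^{t(i)}.

    Signs are +1/-1, so -eps[i] implements the '-- = +' convention.
    """
    r = len(B[0])
    plus, minus = [0] * r, [0] * r
    for b, e, k in zip(B, eps, t):
        f_plus, f_minus = split(b)
        first, second = (f_plus, f_minus) if e == 1 else (f_minus, f_plus)
        for j in range(r):
            plus[j] += k * first[j]    # exponent of x_j in (f_i^{eps(i)})^{t(i)}
            minus[j] += k * second[j]  # exponent of x_j in (f_i^{-eps(i)})^{t(i)}
    return tuple(plus), tuple(minus)

B = [(1, -2, 1, 0), (0, 1, -2, 1)]  # the basis of section 2.1
print(S(B, (1, 1), (1, 1)))  # ((1, 1, 1, 1), (0, 2, 2, 0)): x1x2x3x4 - x2^2 x3^2
```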

Lemma 3.5.

Let $P$ be a pure binomial in ${\mathcal{I}}_{B}$ . Then there exist $\epsilon$ , $t$ , and monomials $m$ and $n$ such that

$$nP = m\,S(\epsilon,t).$$

Proof.

For the purposes of the proof, we will simplify notation by assuming that for every $b_{i}\in B$ , the element $-b_{i}$ also lies in $B$ .

Let $P\in{\mathcal{I}}_{B}$ be a pure binomial. Write $P=\sum_{j=1}^{k}m_{j}f_{i_{j}}$ , where the $m_{j}$ are monomials. We can and do assume that $k$ is chosen minimal, and we proceed by induction on $k$ . The case $k=1$ is trivial.

Up to harmless sign changes, there exists a $j_{0}$ such that $m_{j_{0}}f_{i_{j_{0}}}^{+}=P^{+}$ . Reordering, we may assume that $j_{0}=1$ , so

[TABLE]

is again a pure difference binomial. By the induction hypothesis there exist monomials $m$ and $n$ and vectors $\epsilon$ , $t$ with

[TABLE]

Write $S(\epsilon,t)=S^{+}-S^{-}$ . Then

[TABLE]

Since this is a binomial, up to signs we may assume without loss of generality that $nS^{-}=m_{1}f_{i_{1}}^{+}$. We can then write

[TABLE]

where $S^{\prime}$ is an iterated subtraction binomial of the $f_{i}$ . ∎

Theorem 3.6.

There exist a positive integer $M$ , functions $\epsilon_{1},\dots,\epsilon_{M}$ and $t_{1},\dots,t_{M}$ as in definition 3.4, and monomials $m_{1},\dots,m_{M}\in R$ , such that

(1) for all $1\leq j\leq M$ we have $m_{j}\mid S(\epsilon_{j},t_{j})$;

(2) [TABLE]

Proof.

Combine lemma 3.2 and lemma 3.5. ∎

Given $t\colon\{1,\dots,n\}\to{\mathbb{N}}$ we define the $L^{1}$-length of $t$ to be the sum of its values. To prove theorem 1.3 it suffices to show that we can choose each of the vectors $t_{j}$ in theorem 3.6 to have $L^{1}$-length bounded by the constant $N$ of (1.2.2). Given vectors $\epsilon$ of signs and $t$ of natural numbers as in definition 3.4, observe that the power of $x_{j}$ dividing $S(\epsilon,t)$ is given by

[TABLE]

We say the minimum in eq. 3.1.3 is achieved on the $+$ side if

[TABLE]

and we say the minimum in eq. 3.1.3 is achieved on the $-$ side if

[TABLE]

Definition 3.7.

Given $\epsilon\colon\{1,\dots,n\}\to\{+,-\}$ and $\delta\colon\{1,\dots,r\}\to\{+,-\}$ , we define

[TABLE]

This set $T_{\epsilon,\delta}$ is a rational polyhedral cone in ${\mathbb{N}}^{n}$ , and for fixed $\epsilon$ we have

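${\mathbb{N}}^{n}=\bigcup_{\delta}T_{\epsilon,\delta},$ where $\delta$ ranges over all functions $\{1,\dots,r\}\to\{+,-\}$.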

Given $t\in T_{\epsilon,\delta}$, we define

[TABLE]

which we write as a difference of monomials $\varphi_{t}=\varphi_{t}^{+}-\varphi_{t}^{-}$ in the usual way. From the definition of $T_{\epsilon,\delta}$ we see that $\varphi_{t}\in R$ , i.e. all exponents of the $x_{i}$ are non-negative.

Lemma 3.8.

Fix $\epsilon$ and $\delta$ as above, and let $t,t_{1},\dots,t_{a}\in T_{\epsilon,\delta}$ such that $t=t_{1}+\cdots+t_{a}$ . Then

[TABLE]

Proof.

Elementary manipulations yield

[TABLE]

Theorem 3.9.

For each $\epsilon$ and each $\delta$ , choose a generating set $\tau_{\epsilon,\delta}$ for the cone $T_{\epsilon,\delta}$ . Then

[TABLE]

is a generating set for $\operatorname{Sat}_{x_{1}\cdots x_{r}}{\mathcal{I}}_{B}$ .

Proof.

Let $t\in{\mathbb{N}}^{n}$. Then $S(\epsilon,t)\in{\mathcal{I}}_{B}$ and $\varphi_{t}\in R$, hence by definition of the saturation we see that $\varphi_{t}\in\operatorname{Sat}_{x_{1}\cdots x_{r}}{\mathcal{I}}_{B}$. Conversely, theorem 3.6 tells us that the $\varphi_{t}$ generate $\operatorname{Sat}_{x_{1}\cdots x_{r}}{\mathcal{I}}_{B}$ as $t$ ranges over ${\mathbb{N}}^{n}$. We must justify why it suffices to consider only $t$ ranging over the set in (3.1.7). Fixing $\epsilon$, we note that every $t\in{\mathbb{N}}^{n}$ lies in some $T_{\epsilon,\delta}$ by (3.1.4), and then by lemma 3.8 it suffices to range over elements of a generating set for $T_{\epsilon,\delta}$. ∎

Fixing $\epsilon$ and $\delta$, it remains to show that $T_{\epsilon,\delta}$ can be generated by vectors of $L^{1}$-length bounded by the constant $N$ from (1.2.2). First, we record the following elementary lemma.

Lemma 3.10.

Let $v_{1},\dots,v_{a}\in{\mathbb{N}}^{n}$ , and let $C$ be the intersection of ${\mathbb{N}}^{n}$ with the rational cone spanned by the $v_{i}$ . Then $C$ is generated by

[TABLE]
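The display in lemma 3.10 is not reproduced here. For orientation, the following sketch implements the standard argument from the proof of Gordan's lemma in the simplicial case, which is the case needed after cutting $T_{\epsilon,\delta}$ into simplicial cones: if $v_{1},\dots,v_{n}\in\mathbb{N}^{n}$ are linearly independent, every lattice point $u=\sum_{i}\lambda_{i}v_{i}$ of their cone equals $\sum_{i}\lfloor\lambda_{i}\rfloor v_{i}$ plus a lattice point of the half-open parallelepiped $\{\sum_{i}\lambda_{i}v_{i}:0\leq\lambda_{i}<1\}$, so the $v_{i}$ together with those finitely many points generate (not necessarily minimally). We assume, but do not claim, that lemma 3.10 records a statement of this kind; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def solve(V, p):
    """Exactly solve sum_i lam[i] * V[i] = p for n linearly independent V[i] in Z^n."""
    n = len(p)
    # Augmented matrix: column i holds V[i], the last column holds p.
    A = [[Fraction(V[i][row]) for i in range(n)] + [Fraction(p[row])] for row in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the vectors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        # Gauss-Jordan: clear this column in every other row.
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def simplicial_generators(V):
    """Generators of the lattice points in cone(V), for n linearly independent V in N^n."""
    n = len(V)
    # The parallelepiped lies in the box 0 <= p_j <= sum_i V[i][j] because every V[i] >= 0.
    box = [sum(v[j] for v in V) for j in range(n)]
    gens = {tuple(v) for v in V}
    for p in product(*(range(b + 1) for b in box)):
        if any(p) and all(0 <= lam < 1 for lam in solve(V, p)):
            gens.add(p)
    return gens


print(sorted(simplicial_generators([(1, 0), (1, 2)])))
```

Every generator found this way has $L^{1}$-length at most $\sum_{i}\lVert v_{i}\rVert_{1}$, which is the kind of estimate that converts a bound on extremal rays into a bound on generators. The brute-force box search is only sensible for tiny examples.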

Observe that the faces of $T_{\epsilon,\delta}$ are defined by the equations

[TABLE]

thus the extremal rays of $T_{\epsilon,\delta}$ are obtained by solving $n-1$ equations of the form eq. 3.1.8. Let $\beta$ be the maximum of the absolute values $\lvert b_{i,j}\rvert$ of the coordinates of the vectors $b_{i}$, as $i$ and $j$ vary. By Siegel's lemma, the $L^{1}$-length of such a (non-zero) solution is then bounded above by

[TABLE]
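The equations 3.1.8 themselves are not reproduced here, but the mechanism of obtaining an extremal ray from $n-1$ independent linear equations in $n$ unknowns can be sketched. The code below uses Cramer's rule (signed maximal minors) rather than Siegel's lemma: for an integer $(n-1)\times n$ matrix $E$ of rank $n-1$, the vector of signed $(n-1)\times(n-1)$ minors spans the kernel. Expanding each minor as a sum of $(n-1)!$ products shows that if every entry of $E$ has absolute value at most $h$, each coordinate of this vector is at most $(n-1)!\,h^{n-1}$ in absolute value. The matrix `E` and the function names are illustrative assumptions.

```python
from fractions import Fraction
from functools import reduce
from math import factorial, gcd


def det(M):
    """Exact determinant of a square integer matrix, by fraction-based elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            result = -result  # a row swap flips the sign
        result *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return int(result)


def extremal_ray(E):
    """Primitive integer vector spanning the kernel of an (n-1) x n integer matrix of rank n-1."""
    n = len(E[0])
    if len(E) != n - 1:
        raise ValueError("expected n - 1 equations in n unknowns")
    # Cramer's rule: signed maximal minors, deleting one column at a time.
    v = [(-1) ** k * det([list(row[:k]) + list(row[k + 1:]) for row in E]) for k in range(n)]
    g = reduce(gcd, (abs(x) for x in v))
    if g == 0:
        raise ValueError("the equations do not have rank n - 1")
    v = [x // g for x in v]
    # Only the line is determined; make the first nonzero coordinate positive.
    if next(x for x in v if x != 0) < 0:
        v = [-x for x in v]
    return v


E = [[2, -1, 0], [0, 3, -1]]
ray = extremal_ray(E)
h = max(abs(x) for row in E for x in row)
print(ray, max(abs(x) for x in ray) <= factorial(len(ray) - 1) * h ** (len(ray) - 1))
```

Which of $\pm v$ actually lies in the cone has to be decided separately; the sign normalisation above is only a convention.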

From lemma 3.10, and cutting into simplicial cones, we see that $T_{\epsilon,\delta}$ can be generated by vectors of $L^{1}$-length at most $N=n(2n\beta)^{n-1}$, concluding the proof.
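For concreteness, here is a minimal sketch that evaluates the constant $N=n(2n\beta)^{n-1}$ for a given set $B$, where $n=\lvert B\rvert$ is the number of vectors (not the number $r$ of variables) and $\beta$ is the largest absolute value of a coordinate of the $b_{i}$, together with a check that a given $t$ has $L^{1}$-length at most $N$. The example vectors are chosen for illustration and the function names are ours.

```python
def norm_bound(B):
    """N = n * (2 * n * beta)**(n - 1), with n = |B| and beta = max |b_{i,j}|."""
    n = len(B)
    beta = max(abs(x) for b in B for x in b)
    return n * (2 * n * beta) ** (n - 1)


def l1_length(t):
    """L^1-length of t: {1, ..., n} -> N, i.e. the sum of its values."""
    return sum(t)


B = [(1, -2, 1, 0), (0, 1, -2, 1)]
N = norm_bound(B)  # n = 2 and beta = 2, so N = 2 * 8 = 16
print(N, l1_length((3, 5)) <= N)
```

Python's arbitrary-precision integers matter here, since $N$ grows like $n^{n}$ and overflows fixed-width types quickly.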

3.2. Proof of proposition 1.4

Let $G$ be a generating set for the saturation as in definition 1.2. Each $g\in G$ is a pure difference binomial, say $g=x^{c^{+}}-x^{c^{-}}$ with $c^{+}$ , $c^{-}\in{\mathbb{N}}^{r}$ , and can be written in the form

[TABLE]

with $\epsilon_{i}\in\{1,0,-1\}$ , $m_{i}$ monomials, and $f_{j}$ as in section 3.1. Writing $c=c^{+}-c^{-}$ , it suffices (by theorem 1.1) to show that $c$ can be written as $c=\sum_{i=1}^{n}a_{i}b_{i}$ with $\sum_{i=1}^{n}\lvert a_{i}\rvert\leq N$ .

We would like to prove this by induction on $N$, but this makes no sense as stated, since $N$ is the norm and hence fixed. Instead we rephrase things slightly so that induction makes sense:

Lemma 3.11.

Let $M$ be a positive integer, and suppose that the expression

$\sum_{i=1}^{M}\epsilon_{i}m_{i}f_{j_{i}}$

is a pure binomial $x^{c^{+}}-x^{c^{-}}$, where $\epsilon_{i}\in\{1,-1\}$, $j_{1},\dots,j_{M}\in\{1,\dots,n\}$, and the $m_{i}$ are monomials. Then there exist integers $a_{1},\dots,a_{n}$ with $\sum_{i=1}^{n}\lvert a_{i}\rvert\leq M$ and $c^{+}-c^{-}=\sum_{i=1}^{n}a_{i}b_{i}$.

It is clear that the lemma (applied with $M=N$ ) implies proposition 1.4, so it only remains to verify the lemma.

Proof.

As a warm-up, we first treat the case $M=1$. Then

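$x^{c^{+}}-x^{c^{-}}=\epsilon_{1}m_{1}f_{j_{1}}=\epsilon_{1}x^{d}\,(x^{b_{j_{1}}^{+}}-x^{b_{j_{1}}^{-}}),$

where we write $m_{1}=x^{d}$ for some $d\in{\mathbb{N}}^{r}$. Hence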

$c^{+}-c^{-}=\epsilon_{1}(b_{j_{1}}^{+}-b_{j_{1}}^{-})=\epsilon_{1}b_{j_{1}},$

as required.

We prove the general case by induction on $M$ . First, up to changing some signs, observe that we can re-order the terms in the expression eq. 3.2.1 so that $m_{1}f_{j_{1}}^{+}=x^{c^{+}}$ , hence we can assume that $\sum_{i=1}^{M-1}\epsilon_{i}m_{i}f_{j_{i}}$ is also a pure binomial, say

$\sum_{i=1}^{M-1}\epsilon_{i}m_{i}f_{j_{i}}=x^{c^{\prime+}}-x^{c^{\prime-}}.$

Then by our induction hypothesis we can write $c^{\prime+}-c^{\prime-}=\sum_{i=1}^{n}a^{\prime}_{i}b_{i}$ with $\sum_{i=1}^{n}\lvert a^{\prime}_{i}\rvert\leq M-1$ . Then

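$x^{c^{+}}-x^{c^{-}}=x^{c^{\prime+}}-x^{c^{\prime-}}+\epsilon_{M}m_{M}\,(x^{b_{j_{M}}^{+}}-x^{b_{j_{M}}^{-}}),$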

and we can (again changing some signs, without loss of generality) assume that $\epsilon_{M}=-1$ and that $x^{c^{-}}=m_{M}x^{b_{j_{M}}^{+}}$. Writing $m_{M}=x^{d}$, we see

• $x^{c^{+}}=x^{c^{\prime+}}$, so $c^{+}=c^{\prime+}$;
• $x^{c^{-}}=x^{d+b_{j_{M}}^{+}}$, so $c^{-}=d+b_{j_{M}}^{+}$;
• $x^{c^{\prime-}}=m_{M}x^{b_{j_{M}}^{-}}=x^{d+b_{j_{M}}^{-}}$, so $c^{\prime-}=d+b_{j_{M}}^{-}$.

Putting these together we see

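$c^{+}-c^{-}=c^{\prime+}-(d+b_{j_{M}}^{+})=(c^{\prime+}-c^{\prime-})-(b_{j_{M}}^{+}-b_{j_{M}}^{-})=\sum_{i=1}^{n}a^{\prime}_{i}b_{i}-b_{j_{M}},$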

from which the result is immediate. ∎
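The bookkeeping in lemma 3.11 can be checked mechanically. The sketch below (names ours; monomials as exponent tuples; indices $j$ are 0-based in the code) expands $\sum_{i=1}^{M}\epsilon_{i}m_{i}f_{j_{i}}$, confirms it is a pure binomial $x^{c^{+}}-x^{c^{-}}$, and takes $a_{i}=\sum_{k\colon j_{k}=i}\epsilon_{k}$, so that $\sum_{i}\lvert a_{i}\rvert\leq M$ automatically. That these $a_{i}$ satisfy $c^{+}-c^{-}=\sum_{i}a_{i}b_{i}$ is our own shortcut rather than the inductive argument above: the $\mathbb{Z}$-linear map sending $\sum_{u}\lambda_{u}x^{u}$ to $\sum_{u}\lambda_{u}u\in\mathbb{Z}^{r}$ sends $x^{c^{+}}-x^{c^{-}}$ to $c^{+}-c^{-}$ and each $\epsilon x^{d}f_{j}$ to $\epsilon b_{j}$.

```python
from collections import defaultdict


def split(b):
    """Return (b_plus, b_minus) in N^r with b = b_plus - b_minus."""
    return tuple(max(x, 0) for x in b), tuple(max(-x, 0) for x in b)


def expand(terms, B):
    """Expand sum_k eps_k * x^{d_k} * f_{j_k}, with f_j = x^{b_j^+} - x^{b_j^-}, as {exponent: coeff}."""
    poly = defaultdict(int)
    for eps, d, j in terms:
        plus, minus = split(B[j])
        for exp, sign in ((plus, eps), (minus, -eps)):
            poly[tuple(x + y for x, y in zip(d, exp))] += sign
    return {e: c for e, c in poly.items() if c != 0}


def coefficients(terms, B):
    """Return (c_plus, c_minus, a) with c_plus - c_minus = sum_i a_i b_i and sum |a_i| <= M."""
    poly = expand(terms, B)
    if sorted(poly.values()) != [-1, 1]:
        raise ValueError("the expression is not a pure binomial x^{c+} - x^{c-}")
    c_plus = next(e for e, c in poly.items() if c == 1)
    c_minus = next(e for e, c in poly.items() if c == -1)
    a = [0] * len(B)
    for eps, _d, j in terms:
        a[j] += eps  # each term eps * m * f_j contributes eps to a_j
    # Sanity checks; both hold by the linearity argument above.
    diff = [p - m for p, m in zip(c_plus, c_minus)]
    assert diff == [sum(a[i] * B[i][k] for i in range(len(B))) for k in range(len(diff))]
    assert sum(abs(x) for x in a) <= len(terms)
    return c_plus, c_minus, a


B = [(1, -1, 0), (0, -1, 1)]  # f_1 = x1 - x2, f_2 = x3 - x2 (illustrative)
terms = [(1, (0, 0, 0), 0), (-1, (0, 0, 0), 1)]  # f_1 - f_2 = x1 - x3
print(coefficients(terms, B))
```

Here the output is `((1, 0, 0), (0, 0, 1), [1, -1])`, that is $c^{+}-c^{-}=b_{1}-b_{2}$ with $\sum_{i}\lvert a_{i}\rvert=2=M$.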

References

[AHT12] Satoshi Aoki, Hisayuki Hara, and Akimichi Takemura. Markov bases in algebraic statistics. Springer Series in Statistics. Springer, New York, 2012.
[CTV14] Hara Charalambous, Apostolos Thoma, and Marius Vladoiu. Markov complexity of monomial curves. J. Algebra, 417:391–411, 2014.
[DS98] Persi Diaconis and Bernd Sturmfels. Algebraic algorithms for sampling from conditional distributions. Ann. Statist., 26(1):363–397, 1998.
[GP13] Elizabeth Gross and Sonja Petrović. Combinatorial degree bound for toric ideals of hypergraphs. Internat. J. Algebra Comput., 23(6):1503–1520, 2013.
[HAT12] Hisayuki Hara, Satoshi Aoki, and Akimichi Takemura. Running Markov chain without Markov basis. In Harmony of Gröbner bases and the modern industrial society, pages 45–62. World Sci. Publ., Hackensack, NJ, 2012.
[HHO18] Jürgen Herzog, Takayuki Hibi, and Hidefumi Ohsugi. Binomial ideals, volume 279 of Graduate Texts in Mathematics. Springer, Cham, 2018.
[HMdCTY14] David Haws, Abraham Martín del Campo, Akimichi Takemura, and Ruriko Yoshida. Markov degree of the three-state toric homogeneous Markov chain model. Beitr. Algebra Geom., 55(1):161–188, 2014.
[HW15] Raymond Hemmecke and Tobias Windisch. On the connectivity of fiber graphs. J. Algebr. Stat., 6(1):24–45, 2015.

