The norm of the saturation of a binomial ideal, and applications to Markov bases
David Holmes

TL;DR
This paper introduces a new measure called the norm to quantify the complexity of saturations of binomial ideals, providing bounds and exploring applications in statistics.
Contribution
It defines the norm of the saturation of binomial ideals and establishes bounds based on computable invariants, with applications to statistical models.
Findings
Bound on the norm in terms of ideal invariants
Application to statistical models and Markov bases
New measure for ideal saturation complexity
Abstract
Given a pure binomial ideal I in variables x_i, we define a new measure of the complexity of the saturation of I with respect to the product of the variables x_i, which we call the norm. We give a bound on the norm in terms of easily-computed invariants of the ideal. We discuss statistical applications both practical and theoretical.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
The norm of the saturation of a binomial ideal, and applications to Markov bases
David Holmes
Abstract.
Given a pure binomial ideal in variables , we define a new measure of the complexity of the saturation of with respect to the product of the , which we call the norm. We give a bound on the norm in terms of easily-computed invariants of the ideal. We discuss statistical applications both practical and theoretical.
Contents
1. Introduction
1.1. Background
Let be a matrix with integer entries, and let be a vector with non-negative entries. The fibre containing is defined as
[TABLE]
Understanding the structure of this fibre is important in a number of statistical tests. For example, the vectors in might represent tables of data, and the matrix might output the row and column sums of these tables, so the fibre consists of all tables with non-negative entries and with the same row and column sums as the starting table . See [DS98] for more details and examples. In particular, one often wants to generate samples from some probability distribution (often uniform or hypergeometric) on the fibre. If the fibre is small it is practical to simply enumerate all the elements of the fibre. However, in practical applications the fibre is often far too large to enumerate, and the standard approach is to perform a random walk in the fibre, generating samples via the Metropolis-Hastings Markov-Chain Monte-Carlo algorithm. In order to perform a random work, we must upgrade the fibre into a graph (whose vertices are the elements of the fibre). The requirements for the Metropolis-Hastings algorithm are rather mild, the key condition is that the graph must be connected (since the random walk will always remain within its starting connected component).
1.1.1. A random walk in the fibre
The most naive way to convert the fibre into a graph is to choose a generating set for the kernel of , and then form a (simple, undirected) graph by putting an edge between distinct vertices and whenever or . We say is connected by if the resulting graph is connected. In section 2 we will see several examples of that fail to connect . The major innovation of Diaconis and Sturmfels [DS98] was to give an algorithm to construct a generating set which connects every fibre of a given matrix .
1.1.2. Saturated ideals and connected fibres
To describe their result, we need a little more notation. Given , we write , both summands having non-negative entries. In the ring we form the elements
[TABLE]
and define an ideal . Then the key theorem is:
Theorem 1.1** (Diaconis-Sturmfels, [DS98]).**
Fix a matrix , and let be a generating set for the integral kernel of . Suppose the ideal is saturated with respect to the element . Then for every , the fibre is connected by .
If is saturated, is often called a Markov basis (this should not be interpreted as implying linear independence of the elements of ). The theorem then tells us that we can generate samples according to our preferred distribution by following the naive random walk algorithm above using the basis .
On the other hand, suppose that we have a generating set such that is not saturated. We can (at least in principal) apply a standard saturation algorithm to to produce a saturated ideal, and moreover the generating set produced will in fact consist of pure difference binomials (i.e. differences of monomials; see definition 3.1). Reversing the procedure eq. 1.1.2 we can recover a new generating set for the kernel of , and following the above theorem of Diaconis-Sturmfels, this generating set will connect all fibres, enabling efficient sampling.
Thus, when it is possible to compute this saturation, the problem is essentially solved. However, the standard algorithm for saturation involves the computation of Gröbner bases, and is at present only practical for relatively small examples (software to carry out such computations can be found at 4ti2.de).
1.1.3. Connected fibres without saturation
The difficulty of computing the saturation motivated Aoki, Hara and Takemura [HAT12] to suggest an algorithm for generating samples without needing to compute the saturation. They begin in the same say, with a generating set for the integral kernel, but instead of making moves consisting of addition or subtraction of a single element of , they instead generate non-negative integers from a Poisson distribution with some chosen mean , and elements , and their move consists of addition of if the result lies in the fibre, and staying put otherwise. Since the Poisson distribution generates every non-negative integer with non-zero probability it is immediate that the resulting fibre is connected; in fact, the graph on the fibre is a complete graph, but with highly non-uniform probability of selecting moves from among edges.
They then perform a number of numerical experiments with various values of . In cases where it was possible to compute the saturation, they show that for careful choice of their algorithm performs comparably to that coming from a Markov basis, and they also illustrate that their algorithm can be applied in cases where the saturation is too hard to compute (though they can of course provide no guarantee that their algorithm is converging in reasonable time; it appears to do so, but this might be deceptive if the fibre has some connected components that are very hard to hit — see section 1.3).
There is some tension in the use of this algorithm when it comes to choosing the value of . If one chooses very large then the algorithm takes a long time before it (appears to) converge. On the other hand, a small value of will product more rapid apparent convergence, but there is a greater risk that one is simply failing to see one or more connected components of the fibre in the time for which the algorithm is run.
1.2. Results
1.2.1. A bound on the complexity of the saturation
In the light of the above discussion it is natural to try to bound how large and complex the saturation of the ideal can get. To make this more precise, we define the norm of the saturation:
Definition 1.2**.**
Let be set of vectors in . We write for the ideal in as defined in section 1.1.2. The norm of is the smallest integer such that there exists a finite generating set for the saturation of with respect to , with the properties that
- (1)
Every element of is a pure difference binomial; 2. (2)
Every can be written in the form
[TABLE]
where the , the are monomials, and the are elements of .
The main result of this paper is the following explicit bound on the norm. In sections 1.2.3 and 1.2.4 we will show how this can be applied to give new algorithms for sampling from fibres without needing to compute the saturation.
Theorem 1.3**.**
Let be set of vectors in . Write for the maximum of the absolute values of the coefficients of elements of . Then the norm of is at most
[TABLE]
Our proof (see section 3.1)is constructive; it gives an algorithm to determine a generating set as in the definition of the norm. We do not know whether this algorithm could be practical; it is a-priori less efficient than procedures using Gröbner bases, but is highly parallelisable.
The connection of the norm to fibre connectivity and Markov chains runs via the following result (proven in section 3.2):
Proposition 1.4**.**
Let be a integer matrix, and a basis of the kernel, with having norm . Let , and construct a graph with vertex set the fibre , and where we draw an edge from to if and only if can be written as an integer linear combination
[TABLE]
with . Then this graph is connected.
Remark 1.5*.*
Given a integral matrix , note that it is easy to compute a basis of the integral kernel of from the Smith normal form of . Indeed, if is the Smith normal form (so and are invertible, and diagonal with ), then let be maximal such that . Then an integral basis of the kernel of is given by , where is the th standard basis vector in .
Conversely, while does not determine , it does determine the fibres , so the matrix is not really essential, but is very relevant to the statistical applications.
1.2.2. Comparison to other results in the literature
Needless to say, we are not the first to try to control the complexity of the saturation of an ideal in a polynomial ring. Indeed, the standard method of computing the saturation reduces to a Gröbner basis computation, whose efficient implementation has been the focus of too much research to begin to list here. Specialising to the case of binomial ideals, the literature is still much too large to give more than a quick glimpse of. There are general theoretical results on the structure of fibre graphs ([Win16], [HW15], [Win19], [GP13], …). There are also many results bounding the degree of the binomials appearing in the saturation, see ([Stu96, chapter 13], [HMdCTY14], [KOT15], …), and bounding the Markov complexity; this is defined in [SS03], and studied in [CTV14] and elsewhere.
However, we are not aware on bounds on the ‘norm’ (definition 1.2) in the literature. Indeed, from an algebraic point of view it appears a rather unnatural invariant. The reason for studying it comes purely from the application (via proposition 1.4) to fibre connectivity and Markov bases. In the remainder of section 1 we hope to justify it from this point of view, and perhaps motivate further research in this direction. An unusual feature of our results is that we do not utilise Gröbner bases of related techniques; this is not from dislike, but simply because we could not see how to bound the norm from that perspective; we hope that others may have more success.
1.2.3. Improving the algorithm of Aoki, Hara and Takemura
Aoki, Hara and Takemura connect the fibre by allowing arbitrarily large integer linear combinations of elements of the basis . However, proposition 1.4 shows that it suffices to take combinations with coefficients bounded by the norm of . This allows us to improve the efficiency of their algorithm, by truncating the Poisson distribution at . A second algorithm they present (where the coefficients of the are chosen from a multinomial distribution) can be enhanced in a similar way. The bound on the norm coming from theorem 1.3 is likely to be large in comparison with the chosen , so will not have a large impact on the runtime, but we hope that better bounds on the norm can be found in future.
A more useful application might be to predicting good values of the constant in their algorithm, or giving heuristic bounds on the convergence time for a given value of . The norm can be seen as the maximum distance between connected components of the fibre, thus to be have a reasonable chance of hitting all components we should take a number of steps that is very large compared to .
1.2.4. An alternative algorithm for connecting the fibres
In the naive algorithm of section 1.1.1, one starts at a vector , and chooses at random an element , and considers the step . If in then this is returned as the next element of the Markov chain. If , then the algorithm simply returns . However, if we have a bound on the norm then we can modify the algorithm so that the fibre will always be connected; if then, rather than returning , we choose another element from , and consider the vector . If lies in we return is as the next step in the Markov chain, otherwise we repeat, until we either hit again, or we have taken consecutive steps outside the fibre, in which case we return again. Alternatively, this can be viewed as a weighted random walk in a certain graph with vertex set . To define this graph, we first define a graph with vertex set and with an edge between and whenever . Then we define a graph with vertex set by putting an edge between two vertices whenever they can be connected by a path in of length at most , and which does not intersect except at its endpoints. Again, by proposition 1.4 this new graphs is guaranteed to be connected.
More generally, with theorems 1.3 and 1.4 in hand it is easy to propose new sampling algorithms which guarantee to connect the fibre. The challenge is to design algorithms with reasonable runtime, at least heuristically (rigorous runtime analysis seems hard but very interesting).
If the fibre is large with respect to the norm then designing reasonably efficient algorithms is not hard, since the runtime will be dominated by time spent in the ‘interior’ of the fibre. On the other hand, if the fibre is small compared to then the runtime will be dominated by time spent around the edge of the fibre looking for new connected components, and will depend sensitively on the norm (or more precisely, on our bound on the norm).
1.3. Practical consequences
The algorithm of section 1.1.3 is proven to converge. And in practise the Markov chain is often observed to settle down quite fast. Indeed, in practise it is the latter which will generally be relied upon; people run algorithms until the chain appears to converge. However, there is a critical problem here. Namely, we see in section 2.2 examples where the chain will appear to converge very rapidly, but this ‘apparent’ limit will not be the true limit (the runtime required to achieve true convergence may easily be arranged to exceed the lifespan of the solar system). We hope that this kind of pathological behaviour will be very rare in practise, but at present this seems hard to verify. Our aim in this paper is to get an idea of how long the algorithm should be run in order to be reasonably confident that the ‘apparent limit’ of the chain is in fact the true limit. We are not completely successful in this, partly because our bound on the norm is rather large for practical use (and probably not sharp), and also because passing from the bound in theorem 1.3 to an estimate on the convergence time needs substantial further work. We think it is interesting and useful to investigate this further. In the meantime, we would encourage people this type of algorithm to let it run for as long as possible, even after the chain appears to have settled down, to maximise the change of hitting new connected components.
Acknowledgements
This work owes its existence to a seminar on algebraic statistics organised in Leiden in the Autumn of 2018 by Garnet Akeyr, Rianne de Heide, and Rosa Winter. I am very grateful to them for organising it, for the many expert speakers who took the time to patiently explain basic ideas of probability and statistics to us, and especially to the determined participants who survived to the end, and offered very useful comments on a presentation of the results contained here.
2. Examples
2.1. A very simple example
Consider the matrix
[TABLE]
An integral basis for the kernel of is then given by where
[TABLE]
The fibre containing the vector is illustrated in fig. 1, where red arrows correspond to addition of , and blue arrows to addition of . Evidently, this fibre is not connected, since the element is isolated. Thus is our chain begins anywhere in the large component it will never hit the isolated vertex, and if it begins at the isolated vertex it will remain there. This has practical consequences, since it is common to simply run such a Markov chain until it appears (by eye) to have converged; in this example, convergence will be rapid, but the resulting distribution will not be the expected one (c.f. section 1.3).
The approach of Diaconis-Sturmfels is to replace the basis by a larger generating set which makes the fibre connected. The ideal is generated by and , and its saturation can be generated by these two polynomials together with the polynomial , the latter corresponding to the vector . Clearly one can step from to by addition of this new vector, so the fibre is indeed connected by this new generating set for the integral kernel of .
Our approach is to allow the chain to step briefly outside the fibre while it hunts for vectors with non-negative entries. As long as we allow two negative steps the fibre will become connected, as we can step from to via or ; one sees easily that the norm is . Let us compute the bound resulting from theorem 1.3: we have and , so our bound is . Thus if we use the bound from the theorem we should allow 16 negative steps; it is clear that this will be sufficient to connect the fibre, but also that this bound is not optimal.
2.2. Families where the fibres are arbitrarily badly connected
Consider the matrix , and write for the th standard basis vector in . Let . Then the fibre . For a positive integer , choose the basis
[TABLE]
of the kernel of . Then the fibre consists of two connected components, namely and . Moreover, to step between the connected components requires consecutive negative steps. Thus for every positive integer and every real number there exists an integer such that the algorithm of Aoki, Hara and Takemura presented in section 1.1.3 applied to the above basis will appear to converge immediately, but will take steps before the probability of hitting the other connected component rises above any given positive threshold.
This example is quite artificial, as the fibre is essentially simple, but we have made a poor choice of generating set . We can also construct a slightly less artificial example of the same phenomenon, by generalising the example in section 2.1. For an integer , let
[TABLE]
and consider the basis of the integral kernel given by
[TABLE]
where we denote the elements of by in the given order. Then the fibre of contains the vector . This vector is at least steps distant from any other point in the fibre; more precisely, if are such that
[TABLE]
then either or (the bound is in fact sharp). We leave the elementary verification to the interested reader. Again we see that, though the algorithm of section 1.1.3 (and variants) may appear to converge rapidly, there are connected components which take an arbitrarily long time to hit.
2.3. The no-three-factor-interaction model
This model is described in detail (in particular, its statistical interpretation) in [AHT12]. It depends on a choice of three positive integers , and ; we will often take for simplicity. The matrix is then an matrix, described in a slightly complicated way. Define to be the identity matrix, and to be a row vector of length with all entries equal to . Then
[TABLE]
where represents the Kroneker product of matrices.
In [HAT12], the authors numerically test their algorithm (section 1.1.3) on the no-three-factor-interaction model in the cases and . In the case the saturation can be computed by Gröbner basis techniques, but seems presently out of reach , and worse for . In each case they compute a basis for the integral kernel, then run numerical tests of their algorithm for several values of the Poisson parameter , and also occasionally replacing the Poisson with a different distribution (we are not completely clear on how they chose these parameters and distributions). In the case they compare their results to those obtained using a saturated basis, and observe that the Markov chains coming from their algorithm converge similarly to those coming from a saturated basis (though for the convergence is rather slow).
For their algorithm does not converge well, but for it appears to converge fairly rapidly. As throughout this paper, the question we are interested in is whether this apparent convergence can be trusted, or is it possible that there is some connected component of the fibre which their chain has never hit? Of course, their algorithm will find every component with probability 1 if allowed to run for unlimited time, but there is no a-priori reason to assume that the time required for this will be in any way comparable to the time required for the chain to appear to settle down.
To try to get a handle on this, let us compute our upper bound on the number of negative steps required to walk between components (the ‘distance between’ connected components of the fibre). Using SAGE we compute the smith normal form of the matrix , obtaining an integral basis with elements. The largest absolute value of an entry in is . This leads to an upper bound on the norm by
[TABLE]
Now, in this example Aoki, Hara and Takemura replace the Poisson distribution with a geometric distribution (for reasons which are unclear to us), and try parameters . The proportion of steps in their algorithm which will exceed in length is then so small that it is likely never to occur before the sun runs cold. This means that if the bound were to be close to the true norm, then this algorithm will in practise never converge to the correct solution. In practise, our bound on the norm is surely very far from sharp, but we gave this example to illustrate the difficulty in guaranteeing convergence (despite the fact that the algorithm might appear to the human eye to have converged).
3. Proof of the main results
3.1. Proof of theorem 1.3
Let be a set of vectors in . Following the notation of eq. 1.1.2, we write
[TABLE]
in the ring . Then , and our goal is to bound the norm of the saturation
[TABLE]
Definition 3.1**.**
A pure binomial in is an element of the form where the are monomials. An ideal is called pure binomial if it admits a generating set consisting of pure binomials; evidently, is a pure binomial ideal.
Lemma 3.2** ([HHO18], proposition 3.18).**
The saturation of with respect to is also a pure binomial ideal.
Definition 3.3**.**
Given pure binomials and , we define the subtraction polynomial (again a pure binomial)
[TABLE]
If , then clearly lies in .
We make the unsurprising notational conventions that , and ; thus we interpret , which is less usual, but makes for efficient and hopefully comprehensible notation in what follows.
Definition 3.4**.**
Let , and let . Define
[TABLE]
(here we use our ‘’ convention when we write ).
Lemma 3.5**.**
Let be a pure binomial in . Then there exist , , and monomials and such that
[TABLE]
Proof.
For the purposes of the proof, we will simplify notation by assuming that for every , the element also lies in .
Let be a pure binomial. Write , where the are monomials. We can and do assume that is chosen minimal, and we proceed by induction on . The case is trivial.
Up to harmless sign changes, there exists a such that . Reordering, we may assume that , so
[TABLE]
is again a pure difference binomial. By the induction hypothesis there exist monomials and and vectors , with
[TABLE]
Write . Then
[TABLE]
This this is a binomial, up to signs we may assume without loss of generality that . We can then write
[TABLE]
where is an iterated subtraction binomial of the . ∎
Theorem 3.6**.**
There exist a positive integer , functions and as in definition 3.4, and monomials , such that
- (1)
for all we have ; 2. (2)
[TABLE]
Proof.
Combine lemma 3.2 and lemma 3.5. ∎
Given we define the -length of to be the sum of its values. To prove theorem 1.3 it suffices to show that we can choose each of the vectors in theorem 3.6 to have -length bounded by the constant of (1.2.2). Given vectors of signs and of natural numbers as in definition 3.4, observe that the power of dividing is given by
[TABLE]
We say the minimum in eq. 3.1.3 is achieved on the side if
[TABLE]
and we say the minimum in eq. 3.1.3 is achieved on the side if
[TABLE]
Definition 3.7**.**
Given and , we define
[TABLE]
This set is a rational polyhedral cone in , and for fixed we have
[TABLE]
Given , we write
[TABLE]
which we write as a difference of monomials in the usual way. From the definition of we see that , i.e. all exponents of the are non-negative.
Lemma 3.8**.**
Fix and as above, and let such that . Then
[TABLE]
Proof.
Elementary manipulations yield
[TABLE]
Theorem 3.9**.**
For each and each , choose a generating set for the cone . Then
[TABLE]
is a generating set for .
Proof.
Let , then , and , hence by definition of the saturation we see that . Conversely, theorem 3.6 tells us that the generate as ranges over . We must justify why it suffices to consider only ranging over the set in (3.1.7). Fixing , we note that every lies in some by (3.1.4), and then by lemma 3.8 it suffices to range over elements of a generating set for . ∎
Fixing and , it remains to show that can be generated by vectors of length bounded by the constant from (1.2.2). First, we have the elementary
Lemma 3.10**.**
Let , and let be the intersection of with the rational cone spanned by the . Then is generated by
[TABLE]
Observe that the faces of are defined by the equations
[TABLE]
thus the extremal rays of are obtained by solving equations of the form eq. 3.1.8. Let be the maximum of the absolute values of the as and vary. By Siegel’s lemma, the -length of such a (non-zero) solution is then bounded above by
[TABLE]
From lemma 3.10, and cutting into simplicial cones, we see that can be generated by vectors of length at most , concluding the proof.
3.2. Proof of proposition 1.4
Let be a generating set for the saturation as in definition 1.2. Each is a pure difference binomial, say with , , and can be written in the form
[TABLE]
with , monomials, and as in section 3.1. Writing , it suffices (by theorem 1.1) to show that can be written as with .
We wish to prove this by induction on , but this makes no sense as is the norm. Instead we rephrase things slightly so that induction makes sense:
Lemma 3.11**.**
Let be a positive integer, and suppose that the expression
[TABLE]
is a pure binomial , where , and the are monomials. Then there exist integers with and .
It is clear that the lemma (applied with ) implies proposition 1.4, so it only remains to verify the lemma.
Proof.
For a warmup we treat first the case . Then
[TABLE]
where we write for some . Hence
[TABLE]
as required.
We prove the general case by induction on . First, up to changing some signs, observe that we can re-order the terms in the expression eq. 3.2.1 so that , hence we can assume that is also a pure binomial, say
[TABLE]
Then by our induction hypothesis we can write with . Then
[TABLE]
and we can (again changing some signs, without loss of generality) assume that and that . Writing , we see
- •
, so ;
- •
, so ;
- •
, so .
Putting these together we see
[TABLE]
from which the result is immediate. ∎
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[AHT 12] Satoshi Aoki, Hisayuki Hara, and Akimichi Takemura. Markov bases in algebraic statistics . Springer Series in Statistics. Springer, New York, 2012.
- 2[CTV 14] Hara Charalambous, Apostolos Thoma, and Marius Vladoiu. Markov complexity of monomial curves. J. Algebra , 417:391–411, 2014.
- 3[DS 98] Persi Diaconis and Bernd Sturmfels. Algebraic algorithms for sampling from conditional distributions. Ann. Statist. , 26(1):363–397, 1998.
- 4[GP 13] Elizabeth Gross and Sonja Petrović. Combinatorial degree bound for toric ideals of hypergraphs. Internat. J. Algebra Comput. , 23(6):1503–1520, 2013.
- 5[HAT 12] Hisayuki Hara, Satoshi Aoki, and Akimichi Takemura. Running Markov chain without Markov basis. In Harmony of Gröbner bases and the modern industrial society , pages 45–62. World Sci. Publ., Hackensack, NJ, 2012.
- 6[HHO 18] Jürgen Herzog, Takayuki Hibi, and Hidefumi Ohsugi. Binomial ideals , volume 279 of Graduate Texts in Mathematics . Springer, Cham, 2018.
- 7[H Md CTY 14] David Haws, Abraham Martín del Campo, Akimichi Takemura, and Ruriko Yoshida. Markov degree of the three-state toric homogeneous Markov chain model. Beitr. Algebra Geom. , 55(1):161–188, 2014.
- 8[HW 15] Raymond Hemmecke and Tobias Windisch. On the connectivity of fiber graphs. J. Algebr. Stat. , 6(1):24–45, 2015.
