Homogeneous length functions on Groups: Intertwined computer & human proofs
Siddhartha Gadgil

TL;DR
This paper discusses a unique collaboration between human mathematicians and computer-generated proofs, leading to the discovery of a significant mathematical result through an iterative process of understanding and abstraction.
Contribution
It introduces a novel proof methodology combining computer-generated and human-readable proofs to facilitate mathematical discovery.
Findings
Computer-assisted proofs can be effectively understood and generalized by humans.
The interplay between human insight and computer proofs can lead to new mathematical results.
A key lemma was derived through this collaborative proof process.
Abstract
We describe a case of an interplay between human and computer proving which played a role in the discovery of an interesting mathematical result. The unusual feature of the use of computers here was that a computer generated but human readable proof was read, understood, generalized and abstracted by mathematicians to obtain the key lemma in an interesting mathematical result.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMathematics and Applications
\setlistdepth
7
Homogeneous length functions on Groups:
Intertwined computer & human proofs
Siddhartha Gadgil
Department of Mathematics,
Indian Institute of Science,
Bangalore 560012, India
Abstract.
We describe a case of an interplay between human and computer proving which played a role in the discovery of an interesting mathematical result [4]. The unusual feature of the use of computers here was that a computer generated but human readable proof was read, understood, generalized and abstracted by mathematicians to obtain the key lemma in an interesting mathematical result.
Key words and phrases:
type theory; homotopy type theory; geometric group theory
2010 Mathematics Subject Classification:
03B15 (primary), 20F12, 20F65 (secondary)
1. Introduction
Computers have come to play many roles in mathematical proofs. Computer experimentation is commonly used to make conjectures and computer algebra systems are used for sophisticated calculations. Components of proofs of important results have also been provided by computers. Such rigorous computer proofs often generate independently verifiable proof certificates. Proof assistants have been used to formalize proofs, including some very complex ones.
Here we describe a case different from these – where a computer generated but human readable proof was read, understood, generalized and abstracted by mathematicians to obtain the key lemma in a significant mathematical result. So far as we know this is the only such instance so far.
The result we discuss concerned a question about the existence of so called homogeneous length functions, which was asked by Terrence Tao on his blog (Apoorva Khare had asked Tao this question). The question was answered in six days in a collaboration that became PolyMath 14111Participants: T. Fritz, S. Gadgil, A. Khare, P. Nielsen, L. Silberman, T. Tao., and the answer (and stronger results) have been published in [4].
To state the main question, we need some definitions. We emphasise that in general the groups we consider are not abelian (commutative), i.e., if , we may have . Thus the notation we use is multiplicative, similar to that for matrix multiplication (except with the identity denoted as rather than ). Recall that, for fixed , invertible matrices form a group.
We sometimes denote the product of and as instead of for readability.
Definition 1.1**.**
A pseudo-length function on a group is a function such that
- •
, where is the identity.
- •
for all (symmetry).
- •
for all (the triangle inequality).
Definition 1.2** (Conjugacy invariance).**
A pseudo-length function on a group is said to be conjugacy invariant if for all .
Recall that elements are conjugate if there exists such that . Conjugacy invariance is thus the property that conjugate elements have equal lengths. Note that in an abelian group, conjugate elements are equal, so this property is automatically satisfied.
Definition 1.3** (Homogeneity).**
A pseudo-length function on a group is said to be homogeneous if for all , .
Definition 1.4** (Positivity).**
A pseudo-length function on a group is said to be a length function if for all .
If is the additive group of a vector space over , then a norm on gives a homogeneous, conjugacy invariant length function. For example on both and are homogeneous, conjugacy invariant length functions. To see this, note that the properties of a pseudo-length follow from the definition of norms. As mentioned above, conjugacy invariance is automatic as additive groups of vector spaces are abelian.
It was generalizing norms on Vector Spaces that motivated the main question (we elaborate on this after stating Question 1.5). The question was formulated in terms of free groups as these are the prototypical non-abelian groups.
Recall that the free group on generators and is the group whose elements are equivalence classes of words in , , , , where we think of and as simply formal symbols (we will see that in their equivalence classes are inverses of the equivalence classes of and ). Namely, we define an equivalence relation so that two words equivalent if and only if they are related by a sequence of moves given by cancellation of pairs of adjacent letters that are inverses of each other and its inverse move, namely inserting a cancelling pair of letters. For example, in , as cancelling the second and third letters of gives . Conversely, inserting between the first and second letters of gives .
Formally, we consider the equivalence relation on words in generated by
[TABLE]
where , and . The case corresponds to prepending a cancelling pair, and to appending a cancelling pair. The case (which forces ) corresponds to the empty word. The elements of are equivalence classes under this equivalence relation.
Multiplication in is given by concatenation, i.e.
[TABLE]
where denotes the group multiplication. More formally, concatenation of words induces a well-defined multiplication on equivalence classes under of words. The identity is the empty word (more formally the equivalence class of ), and the inverse of an element is obtained by inverting letters and reversing the order, i.e., .
We can now state the main question that was studied.
Question 1.5**.**
Is there a homogeneous, conjugacy-invariant length function on the free group on generators?
Khare asked this question motivated by wanting to generalize results of Khare-Rajaratnam [2][3] from vector spaces with norms to a more general context where commutativity was no longer assumed. However, it was not clear whether any (additional) examples would satisfy this more general hypothesis. The free group was taken as a prototypical group which is not abelian. In fact the results of [4] show that, in a strong sense, the only groups having homogeneous, conjugacy invariant length functions are abelian groups, and all such functions are restrictions of norms to subgroups of vector spaces.
The question is also natural from the point of view of Geometric group theory, where length functions are a central concept and conjugacy invariance of lengths (which corresponds to bi-invariance of metrics) is also commonly studied. Lengths satisfying the additional condition of homogeneity were not much studied previously – which we now know is because there are no interesting examples (except restrictions of norms on vector spaces, which are well understood).
2. Homogeneous length functions and the Internal repetition trick
We now describe the history and some ingredients of the solution Question 1.5.
It is natural to view Question 1.5 as asking whether a homogeneous, conjugacy-invariant pseudo-length function on can also be positive, hence a length function. Further, we can normalize to assume that for (hence, by symmetry, for ). We shall say is normalized if for .
After the failure of various constructions (by day ), the following conjecture seemed likely.
Conjecture 2.1**.**
For any homogeneous, conjugacy-invariant pseudo-length function on , we have .
In particular, this conjecture implies that cannot be a length function. Note that it is natural to focus on the element as a group is abelian if and only if , and we were trying to understand whether there are non-abelian examples of groups with length functions with the desired properties.
Several bounds on were obtained from the hypothesis, giving bounds that even went below . However, these methods appeared to stagnate with the best bound obtained a little above .
Using computer-assistance, we obtained and posted a human readable proof showing . An extract of this proof is below222https://github.com/siddhartha-gadgil/Superficial/wiki/A-commutator-bound for the full proof as originally posted.. Note that we have used somewhat different notation here – the generators are and and their inverses are denoted and . We remark that a fully expanded proof actually had over 2000 lines, but avoiding duplication gave the posted lines.
- •
- •
using
- •
- •
using
- •
using and
- •
… (119 lines)
- •
using and
- •
using
by
taking 17th power.
This proof was studied, understood and generalized by Pace Nielsen, who called the method the internal repetition trick. After several improvements due to Nielsen and Tobias Fritz, this was abstracted by Terrence Tao as the following lemma. Note that this holds for all conjugacy-invariant, homogeneous pseudo-lengths on all groups .
Lemma 2.2**.**
Let , , , in be such that is conjugate to both and , i.e., there exist elements such that . Then one has
[TABLE]
Fritz used this to obtain the key lemma.
Lemma 2.3**.**
Let . Then
[TABLE]
We apply this lemma to , . An argument based on probability theory, due to Tao, showed that . This in particular answered Question 1.5 (the main result proved in [4] is actually stronger than the Theorem 2.4).
Theorem 2.4** (see [4]).**
For every homogeneous, conjugacy-invariant pseudo-length function , we have . In particular is not a length function.
We mention some of the ingredients in the proof of Theorem 2.4 using Lemma 2.3 with and . Consider a random walk on points where we move to (i.e., one step to the left) with probability and to (i.e, diagonally down and right) with probability . Then Lemma 2.3 says that is at most the average value of after one step, and hence inductively after steps for . Also observe that we move on an average a step downwards (the left and right movements cancel on an average). Hence if we start at a point , (where ) the distribution after steps is centered around the origin, and is bounded by the average value of on this distribution. This average in turn can be bounded using the Chebyshev inequality (as in the proof of the law of large numbers) together with the observation that if is normalized (the latter follows by a straightforward inductive use of the triangle inequality and conjugacy invariance). The bound thus obtained is of the form for some constant . Finally, as homogeneity gives , taking a limit as gives .
3. The Algorithms
Our proof was generated by a mixture of algorithms and expert guidance (with some arbitrary choices). More precisely, given certain auxiliary choices, a deterministic algorithm gave upper bounds such that for all normalized, homogeneous, conjugacy-invariant pseudo-length functions and for all .
The auxiliary choices were a finite sequence of pairs , with and . We used homogeneity only for these pairs. Thus, our algorithm actually gives an upper bound for all functions such that
- •
is a normalized, conjugacy-invariant length function on .
- •
for all pairs .
We shall call such pairs homogeneity pairs and a sequence of homogeneity pairs a homogeneity pair sequence. Explicit choices for such pairs that give a proof similar to the posted one are given in 3.7, along with links to a script to replicate this (which runs in under 10 seconds on a moderately powerful laptop/desktop). We discuss in 3.8 how plausible it is to have arrived at such choices through general principles, without expert guidance.
We used a deterministic algorithm (depending on a homogeneity pair sequence) to obtain upper bounds with for all lengths as above and for all . Using this, we computed the bound
[TABLE]
This (after keeping track of inequalities and rendering in human readable form) was the posted proof.
All pseudo-lengths we consider henceforth will be assumed to be normalized and conjugacy-invariant (but not necessarily homogeneous).
3.1. Maximal homogeneous pseudo-lengths
It is convenient to reformulate our main problem using a standard construction. Namely, we define a function by defining, for , to be
[TABLE]
It is well known that this is well-defined and gives the maximal normalized, homogeneous, conjugacy-invariant pseudo-length on . Thus, our main problem is equivalent to finding upper bounds for , in particular for .
3.2. Bounding conjugacy-invariant pseudo-lengths
We now make analogous constructions dropping the homogeneity condition. Let be the set of all normalized, conjugacy-invariant pseudo-length functions on . We define, for ,
[TABLE]
This is well-defined and gives the maximal normalized, conjugacy-invariant pseudo-length on (i.e., the maximal element of ). Further, clearly , so upper bounds on give ones on .
We describe in Section 3.5 an algorithm to obtain an upper bound for . Indeed this bound is sharp, i.e., we have for all (we do not prove or use this, but this fact motivated our approach). Here and henceforth we follow the convention that we use with subscripts to denote pseudo-lengths we wish to bound (whose definition may be non-constructive) and with the same subscript to denote algorithmic upper bounds for these lengths.
3.3. Conjugacy-invariant lengths with elementary bounds
Next, suppose we are given a finite set of pairs , , with and , (we call this a set of elementary bounds). We consider a refinement of and a corresponding modified algorithm (our definitions and algorithms do not depend on the order of the pairs ).
Namely, for , we define
[TABLE]
The function is a normalized, conjugacy-invariant pseudo-length on which is maximal among such lengths that satisfy the additional bounds for all .
Note that for . A straightforward modification of the algorithm describing gives an algorithm giving bounds such that for all . We remark that the bound given by this algorithm is not optimal.333Indeed, an optimal algorithm for for general finite gives a solution to the word problem for groups, which is known to be algorithmically undecidable. Namely, given relations , , …, let be the set . Then if and only if is trivial in the group , .
We shall say that the set of elementary bounds is admissible if for all , . Observe that we can algorithmically obtain an admissible set of elementary bounds from a homogeneity pair sequence , , by setting as .
Note that if a set of elementary bounds is admissible, then for all . Hence gives an upper bound for . We use such a bound, but with the process of obtaining elementary bounds from a homogeneity pair sequences a refinement of setting (and depending on the order of the pairs).
3.4. Bounds with homogeneity pair sequences
In this section we describe algorithms depending on a homogeneity pair sequence in terms of algorithms depending on elementary bounds, essentially by deducing elementary bounds using homogeneity. The algorithms depending on elementary bounds are described in 3.5, which the reader may prefer to read first. In 3.6 we describe how to modify the algorithms of 3.5 along the lines described below.
Assume that we are given a homogeneity pair sequence, i.e., a finite sequence of pairs , . We define inductively in (simultaneously)
- •
an elementary bound (with the element from the given homogeneity pair sequences),
- •
a length function , such that for all , and
- •
An algorithmically defined length function , , such that for all .
First, let . Let , which we recall can be computed algorithmically (as described in 3.5).
Next, let and define . By homogeneity, , so by maximality of , for all .
Recall that we have an algorithm (described in 3.5) giving (for ) an upper bound for . Define for .
Continuing in this fashion define
- •
(which can be algorithmically computed),
- •
, and
- •
(which is algorithmic).
As before, we have the bounds for all .
Inductively, given , for , a function , and an algorithmically defined function , define
- •
(which can be algorithmically computed),
- •
, and
- •
The function is the desired upper bound for .
3.5. Algorithm for conjugacy-invariant pseudo-lengths
We now describe the algorithms giving and , i.e. giving upper bounds for and . Recall that .
Let be a normalized, conjugacy-invariant pseudo-length , which may also be assumed to satisfy a finite number of elementary bounds. We describe an upper bound for for a word recursively in the length of the word. The key ingredient is the following lemma bounding in terms of bounds on shorter words.
Lemma 3.1**.**
Let with .
- (a)
** 2. (b)
If , then
Proof.
To see (a), observe that
[TABLE]
as claimed.
Next, suppose . By the triangle inequality,
[TABLE]
Further, by conjugacy invariance of ,
[TABLE]
Substituting (3) in (2), we get
[TABLE]
showing (b). ∎
The algorithms are based on Lemma 3.1. Elementary bounds , are specified by a map with and . If we have no such bounds, i.e. we are computing , we initially take and as the empty map (however the map is updated to avoid repeating computations).
For , the recursive algorithm shown in Figure 1 gives a bound so that for any normalized, conjugacy-invariant pseudo-length on . We describe this using sets in mathematical language, but this can be readily translated to code using, for instance, list comprehensions.
Furthermore, we can keep track of a labelled rooted tree of inequalities used to compute , and hence bound . We give a schematic condensed example of such a tree in Figure 2. Essentially such a tree (actually a corresponding Algebraic Data Type) was used to generate the human readable proof.
Observe that the function is integer valued, and can hence be computed exactly. On using homogeneity we obtain rational bounds. These were stored as double precision real numbers – we switched to arbitrary precision rational numbers at one stage, but switched back for performance reasons during the recursive computations (as these were involved in a large search). Note however that it is easy (and efficient) to map a proof tree using doubles to one using arbitrary precision rational numbers to ensure that there is no error in rounding off. Indeed, as we discuss in 3.7, we have subsequently implemented the mapping of proofs to exact rational bounds.
3.6. Bounding with a homogeneity pair sequence
We now describe how to modify the above algorithm given a homogeneity pair sequence. This is following the approach of 3.4, but with one minor difference as mentioned in Remark 3.6.1.
Suppose now that we are given a homogeneity pair sequence, i.e., a finite sequence of pairs ), . We initialize to be the empty map, and compute using the main algorithm. We let and update the map by setting .
Next we use the main algorithm to compute , but with the map obtained at the end of the previous computation and after setting . Again let and update the map by setting . We proceed in this fashion to obtain the numbers , , …, , and an updated map .
Finally, for an elements , we define the function as the result of using the algorithm with the map obtained at the end of the above sequence of computations and updates.
Remark 3.6.1*.*
While the above algorithm is very similar to that described in 3.4, it gives in general slightly worse bounds. This is because, for a fixed , if is computed for a word while computing, for example (which happens if is a subword of ), we do not recompute when computing, for example . However we may get a smaller value (i.e., better bound) if we recomputed ) as we have the additional elementary bound (we get an improved bound if is a subword of and ). It is easy to avoid this by setting the map when computing to be just the earlier elementary bounds, i.e. set and . However, this comes at a cost in efficiency due to computations being repeated.
3.7. Choices and results
In generating the proofs, we used the family of words of the form , with this family chosen based on expert knowledge. From these, we constructed a homogeneity sequence, depending on certain choices.
Namely, our homogeneity pair sequence was of the following form (with explicit choices stated, which we have used in a script as mentioned below):
- •
We choose and fix (we take ).
- •
Choose and fix a few values of (chosen with some experimentation), say , , …, (we take with , and ).
- •
We get a homogeneity pair sequence taking each element with each exponent between and , namely
[TABLE]
- •
The homogeneity pair sequence we use is the above sequence followed by the sequence
[TABLE]
With the explicit choices , , , and , we get the bound
[TABLE]
and a corresponding human readable proof.
Furthermore, we can map proofs to arbitrary precision rational bounds to avoid rounding errors. Mapping the above proof gives the bound (with no rounding-off error)
[TABLE]
We have created an executable jar file to replicate generating this proof (as well as mapping to a proof with arbitrary precision rational bounds). This is available online at http://math.iisc.ac.in/~gadgil/PolyProof.html (with instructions on running it), along with sample output (slightly reformatted). On the systems we used (a desktop and a laptop with Core i7 processors) this runs in under 10 seconds.444The full code is in the repository https://github.com/siddhartha-gadgil/Superficial. The script is generated from this source. The script uses the same algorithms we originally used, but with modifications to be more robust in memory usage and to avoid concurrency (as the concurrency we implemented leads to non-determinacy, and occasionally to race conditions). The proof generated by this script is a little longer than the one originally posted ( lines instead of ) but gives a slightly better bound.
Values for and the indices were obtained by experimentation, and the bounds are fairly robust when we vary choices.
Remark 3.7.1*.*
In generating the script we only used our knowledge that was useful (though not crucial) while generating the original proof, and the only other choices we tried were taking , which also gives a bound below , and also taking ’s to be , , and , which only marginally improved the bound.
The choice of the family was based on mathematical considerations related to [1], and this was the only expert guidance. We next discuss whether finding the proof was plausible using general principles in place of expert knowledge.
3.8. Auxiliary choices without expert knowledge?
As we have seen, the only expert guidance was the “natural family of group elements ”. We sketch a series of general considerations (some using basic group theory) that could plausibly have led to the same family.
- •
A natural measure of usefulness of a homogeneity pair is the ratio , as this is an upper bound on the ratio for , i.e., the maximum possible improvement in bounds (a value of means no gain from using homogeneity).
- •
Rather than looking for individual useful elements, we look for families of useful elements, choosing between families by small scale sampling.
- •
We look for natural families in the sense of having a simple description in terms of the group operations. The simplest families in a group are those of the form for fixed and the next simplest are those of the form for fixed . The family we considered is of the second form. (As is conjugate to , families of the form are equivalent to those we considered.)
- •
We first consider simple families, i.e., with simple words for and , while using symmetries of the problem to reduce choices.
Here the symmetries are: transposing the generators and , transposing one or both generators with their inverses, and cyclic permutations of words. Further, one needs to only consider reduced words, i.e., those without a cancelling pair. Up to all these symmetries, there are only words with length , each with lengths and and a single word with length . Even allowing for not all symmetries being exploited (as they do depend on some expert knowledge) and allowing for different choices for the word , the number of families of complexity comparable to the one we considered is modest.
Furthermore, if is a word of length at most which is not equivalent to , then , which in particular implies that for large and . Thus the families with other values of can be ruled out as not useful with limited experimentation.
Thus, if one searches through natural families, with simpler ones considered first and symmetries exploited to avoid duplication, and assesses each family rapidly by measuring improvements in relevant bounds after small scale sampling, one is likely to arrive at the family we considered (or an equivalent one) in a reasonable amount of time.
4. Concluding remarks
If we view applications of the axioms as moves, the computer proof helped in identifying composite moves that could be applied recursively for words in appropriate families. These were abstracted and generalized to give the core lemma. One can hope that in other situations as well computer generated proofs targeting key examples give hints about useful composite moves, and especially those that can be used iteratively.
The principal difficulty in finding computer proofs often lies in choosing the useful moves among those that increase complexity, which in our case are applications of homogeneity. In this work, we primarily based ourselves on mathematical considerations, partly because the auxiliary choices were identified even before we started programming, and partly because of the fast pace555the proof was posted less than two days after we began writing code, and the main question answered less than a day after that.. However, we have attempted to justify in Section 3.8 that it is plausible that experimentation and general considerations could have led to similar results, with the use of one heuristic – one should search for natural families of useful moves.
We used modest computing resources (including time) and did not use search heuristics. It is thus all the more likely that domain specific expertise could have been replaced by (or combined with) the vast arsenal of well-known heuristics for tree searches, such as Alpha-beta pruning, Markov Chain Monte Carlo and various Machine learning techniques.
Fully automating the search in this and similar problems involves identifying useful candidate families (beyond simply enumerating those with simple descriptions). This is an interesting challenge, and it would be interesting to explore various techniques for this. Computer scientists have developed various techniques to find inputs trigger a bug (such as fuzzing) and techniques to minimize such inputs (such as delta-debugging). Perhaps these techniques are also useful for finding a human-understandable proof like the one presented in the paper.
Acknowledgements*.*
I thank the referees and the editors for many valuable comments, which have led to the paper being completely rewritten twice and much improved in the process. It is also a pleasure to thank the rest of the PolyMath 14 team for the collaboration of which the work described here is a part.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] S. Gadgil, Watson–Crick pairing, the Heisenberg group and Milnor invariants , Journal of Mathematical Biology 59 (2009), 123–142.
- 2[2] A. Khare and B. Rajaratnam, The Hoffmann-Jørgensen inequality in metric semigroups , Annals of Probability, 45 (2017), 4101–4111.
- 3[3] A. Khare and B. Rajaratnam, The Khinchin–Kahane inequality and Banach space embeddings for metric groups , preprint, 2016.
- 4[4] D.H.J. Poly Math, Homogeneous length functions on Groups , Algebra and Number Theory 12 (2018), 1773-1786.
