Homogeneous length functions on Groups: Intertwined computer & human proofs

Siddhartha Gadgil

arXiv:1904.05214·cs.LO·April 11, 2019

TL;DR

This paper discusses a unique collaboration between human mathematicians and computer-generated proofs, leading to the discovery of a significant mathematical result through an iterative process of understanding and abstraction.

Contribution

It introduces a novel proof methodology combining computer-generated and human-readable proofs to facilitate mathematical discovery.

Findings

01

Computer-assisted proofs can be effectively understood and generalized by humans.

02

The interplay between human insight and computer proofs can lead to new mathematical results.

03

A key lemma was derived through this collaborative proof process.

Abstract

We describe a case of an interplay between human and computer proving which played a role in the discovery of an interesting mathematical result. The unusual feature of the use of computers here was that a computer generated but human readable proof was read, understood, generalized and abstracted by mathematicians to obtain the key lemma in an interesting mathematical result.

Equations

$$\xi_{1}\xi_{2}\dots\xi_{m}\lambda\lambda^{-1}\xi_{m+1}\dots\xi_{n}\sim\xi_{1}\xi_{2}\dots\xi_{m}\xi_{m+1}\dots\xi_{n}$$

$$(\xi_{1}\xi_{2}\dots\xi_{n})\cdot(l'_{1}l'_{2}\dots l'_{m})=\xi_{1}\xi_{2}\dots\xi_{n}l'_{1}l'_{2}\dots l'_{m}$$

$$l(x)\leq\frac{l(y)+l(z)}{2}.$$

$$f(m,k)\leq\frac{f(m-1,k)+f(m+1,k-1)}{2}.$$

$$l(\alpha\beta\alpha^{-1}\beta^{-1})\leq\min\left\{\frac{L((\alpha\beta\alpha^{-1}\beta^{-1})^{n})}{n}:1\leq n\leq 20\right\}.$$

$$\max\{l(g):l\text{ normalized, homogeneous, conjugacy-invariant pseudo-length}\}.$$

$$l_{c}(g)=\max\{l(g):l\in\mathcal{L}_{c}\}.$$

$$l_{b}(g;B)=\max\{l(g):l\in\mathcal{L}_{c},\ l(g_{i})\leq x_{i}\ \forall(g_{i},x_{i})\in B\}.$$

$$l(g)\leq l(\xi_{1})+l(\xi_{2}\xi_{3}\dots\xi_{n})\leq 1+l(\xi_{2}\xi_{3}\dots\xi_{n}),$$

$$l(g)\leq l(\xi_{1}(\xi_{2}\xi_{3}\dots\xi_{k-1})\xi_{1}^{-1})+l(\xi_{k+1}\xi_{k+2}\dots\xi_{n}).$$

$$l(\xi_{1}(\xi_{2}\xi_{3}\dots\xi_{k-1})\xi_{1}^{-1})=l(\xi_{2}\xi_{3}\dots\xi_{k-1}).$$

$$l(g)\leq l(\xi_{2}\xi_{3}\dots\xi_{k-1})+l(\xi_{k+1}\xi_{k+2}\dots\xi_{n}),$$

$$\lambda_{k}=L(\xi_{2}\xi_{3}\dots\xi_{k-1})+L(\xi_{k+1}\xi_{k+2}\dots\xi_{n}).$$

$$\Lambda=\{\lambda_{k}:2\leq k\leq n,\ \xi_{k}=\xi_{1}^{-1}\}.$$

$$(\gamma_{k_{1}},1),\dots,(\gamma_{k_{1}},N),(\gamma_{k_{2}},1),\dots,(\gamma_{k_{2}},N),\dots,(\gamma_{k_{m}},1),\dots,(\gamma_{k_{m}},N).$$

$$(\alpha\beta\alpha^{-1}\beta^{-1},1),(\alpha\beta\alpha^{-1}\beta^{-1},2),\dots,(\alpha\beta\alpha^{-1}\beta^{-1},N)$$

$$l_{h}(\alpha\beta\alpha^{-1}\beta^{-1})\leq 0.8098765432098762$$

$$l_{h}(\alpha\beta\alpha^{-1}\beta^{-1})\leq 328/405.$$

Taxonomy

Topics: Mathematics and Applications

Full text

Department of Mathematics,

Indian Institute of Science,

Bangalore 560012, India

[email protected]

Abstract.

We describe a case of an interplay between human and computer proving which played a role in the discovery of an interesting mathematical result [4]. The unusual feature of the use of computers here was that a computer generated but human readable proof was read, understood, generalized and abstracted by mathematicians to obtain the key lemma in an interesting mathematical result.

Key words and phrases:

type theory; homotopy type theory; geometric group theory

2010 Mathematics Subject Classification:

03B15 (primary), 20F12, 20F65 (secondary)

1. Introduction

Computers have come to play many roles in mathematical proofs. Computer experimentation is commonly used to make conjectures and computer algebra systems are used for sophisticated calculations. Components of proofs of important results have also been provided by computers. Such rigorous computer proofs often generate independently verifiable proof certificates. Proof assistants have been used to formalize proofs, including some very complex ones.

Here we describe a case different from these, where a computer generated but human readable proof was read, understood, generalized and abstracted by mathematicians to obtain the key lemma in a significant mathematical result. As far as we know, this is the only such instance to date.

The result we discuss concerned a question about the existence of so-called homogeneous length functions, which was asked by Terence Tao on his blog (Apoorva Khare had asked Tao this question). The question was answered in six days in a collaboration that became PolyMath 14 (participants: T. Fritz, S. Gadgil, A. Khare, P. Nielsen, L. Silberman, T. Tao), and the answer (and stronger results) have been published in [4].

To state the main question, we need some definitions. We emphasise that in general the groups $G$ we consider are not abelian (commutative), i.e., if $x,y\in G$ , we may have $xy\neq yx$ . Thus the notation we use is multiplicative, similar to that for matrix multiplication (except with the identity denoted as $e$ rather than $I$ ). Recall that, for fixed $n\in\mathbb{N}$ , invertible $n\times n$ matrices form a group.

We sometimes denote the product of $x$ and $y$ as $x\cdot y$ instead of $xy$ for readability.

Definition 1.1.

A pseudo-length function on a group $G$ is a function $l:G\to[0,\infty)$ such that

•

$l(e)=0$ , where $e\in G$ is the identity.

•

$l(g^{-1})=l(g)$ for all $g\in G$ (symmetry).

•

$l(gh)\leq l(g)+l(h)$ for all $g,h\in G$ (the triangle inequality).

Definition 1.2 (Conjugacy invariance).

A pseudo-length function $l$ on a group $G$ is said to be conjugacy invariant if $l(ghg^{-1})=l(h)$ for all $g,h\in G$ .

Recall that elements $x,y\in G$ are conjugate if there exists $g\in G$ such that $y=gxg^{-1}$ . Conjugacy invariance is thus the property that conjugate elements have equal lengths. Note that in an abelian group, conjugate elements are equal, so this property is automatically satisfied.

Definition 1.3 (Homogeneity).

A pseudo-length function $l$ on a group $G$ is said to be homogeneous if $l(g^{n})=|n|\cdot l(g)$ for all $g\in G$ , $n\in\mathbb{Z}$ .

Definition 1.4 (Positivity).

A pseudo-length function $l$ on a group $G$ is said to be a length function if $l(g)>0$ for all $g\in G\setminus\{e\}$ .

If $G=(V,+)$ is the additive group of a vector space $V$ over $\mathbb{R}$ , then a norm on $V$ gives a homogeneous, conjugacy invariant length function. For example on $\mathbb{R}^{2}$ both $l_{1}(x,y)=|x|+|y|$ and $l_{2}(x,y)=\sqrt{x^{2}+y^{2}}$ are homogeneous, conjugacy invariant length functions. To see this, note that the properties of a pseudo-length follow from the definition of norms. As mentioned above, conjugacy invariance is automatic as additive groups of vector spaces are abelian.

The main question was motivated by generalizing norms on vector spaces (we elaborate on this after stating Question 1.5). The question was formulated in terms of free groups as these are the prototypical non-abelian groups.

Recall that the free group $\langle\alpha,\beta\rangle$ on $2$ generators $\alpha$ and $\beta$ is the group whose elements are equivalence classes of words in $S=\{\alpha$ , $\beta$ , $\alpha^{-1}$ , $\beta^{-1}\}$ , where we think of $\alpha^{-1}$ and $\beta^{-1}$ as simply formal symbols (we will see that in $\langle\alpha,\beta\rangle$ their equivalence classes are inverses of the equivalence classes of $\alpha$ and $\beta$ ). Namely, we define an equivalence relation $\sim$ so that two words are equivalent if and only if they are related by a sequence of moves given by cancellation of pairs of adjacent letters that are inverses of each other and its inverse move, namely inserting a cancelling pair of letters. For example, in $\langle\alpha,\beta\rangle$ , $\alpha\beta\beta^{-1}\alpha\beta\alpha^{-1}=\alpha\alpha\beta\alpha^{-1}$ as cancelling the second and third letters of $\alpha\beta\beta^{-1}\alpha\beta\alpha^{-1}$ gives $\alpha\alpha\beta\alpha^{-1}$ . Conversely, inserting $\beta\beta^{-1}$ between the first and second letters of $\alpha\alpha\beta\alpha^{-1}$ gives $\alpha\beta\beta^{-1}\alpha\beta\alpha^{-1}$ .

Formally, we consider the equivalence relation $\sim$ on words in $S$ generated by

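$$\xi_{1}\xi_{2}\dots\xi_{m}\lambda\lambda^{-1}\xi_{m+1}\dots\xi_{n}\sim\xi_{1}\xi_{2}\dots\xi_{m}\xi_{m+1}\dots\xi_{n}$$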

where $\lambda\in S$ , $\xi_{i}\in S\ \forall i,1\leq i\leq n$ and $0\leq m\leq n$ . The case $m=0$ corresponds to prepending a cancelling pair, and $m=n$ to appending a cancelling pair. The case $n=0$ (which forces $m=0$ ) corresponds to the empty word. The elements of $\langle\alpha,\beta\rangle$ are equivalence classes under this equivalence relation.

Multiplication in $\langle\alpha,\beta\rangle$ is given by concatenation, i.e.

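$$(\xi_{1}\xi_{2}\dots\xi_{n})\cdot(l'_{1}l'_{2}\dots l'_{m})=\xi_{1}\xi_{2}\dots\xi_{n}l'_{1}l'_{2}\dots l'_{m}$$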

where $\cdot$ denotes the group multiplication. More formally, concatenation of words induces a well-defined multiplication on equivalence classes under $\sim$ of words. The identity is the empty word $e$ (more formally the equivalence class of $e$ ), and the inverse of an element is obtained by inverting letters and reversing the order, i.e., $(\xi_{1}\xi_{2}\dots\xi_{n})^{-1}=\xi_{n}^{-1}\dots\xi_{2}^{-1}\xi_{1}^{-1}$ .

We can now state the main question that was studied.

Question 1.5.

Is there a homogeneous, conjugacy-invariant length function $l$ on the free group $\langle\alpha,\beta\rangle$ on $2$ generators?

Khare asked this question motivated by wanting to generalize results of Khare-Rajaratnam [2][3] from vector spaces with norms to a more general context where commutativity was no longer assumed. However, it was not clear whether any (additional) examples would satisfy this more general hypothesis. The free group was taken as a prototypical group which is not abelian. In fact the results of [4] show that, in a strong sense, the only groups having homogeneous, conjugacy invariant length functions are abelian groups, and all such functions are restrictions of norms to subgroups of vector spaces.

The question is also natural from the point of view of Geometric group theory, where length functions are a central concept and conjugacy invariance of lengths (which corresponds to bi-invariance of metrics) is also commonly studied. Lengths satisfying the additional condition of homogeneity were not much studied previously – which we now know is because there are no interesting examples (except restrictions of norms on vector spaces, which are well understood).

2. Homogeneous length functions and the Internal repetition trick

We now describe the history and some ingredients of the solution to Question 1.5.

It is natural to view Question 1.5 as asking whether a homogeneous, conjugacy-invariant pseudo-length function $l$ on $\langle\alpha,\beta\rangle$ can also be positive, hence a length function. Further, we can normalize to assume that $l(s)\leq 1$ for $s=\alpha,\beta$ (hence, by symmetry, $l(s)\leq 1$ for $s=\alpha^{-1},\beta^{-1}$ ). We shall say $l$ is normalized if $l(s)\leq 1$ for $s=\alpha,\beta,\alpha^{-1},\beta^{-1}$ .

After the failure of various constructions (by day $3$ ), the following conjecture seemed likely.

Conjecture 2.1.

For any homogeneous, conjugacy-invariant pseudo-length function $l$ on $\langle\alpha,\beta\rangle$ , we have $l(\alpha\beta\alpha^{-1}\beta^{-1})=0$ .

In particular, this conjecture implies that $l$ cannot be a length function. Note that it is natural to focus on the element $\alpha\beta\alpha^{-1}\beta^{-1}$ as a group $G$ is abelian if and only if $xyx^{-1}y^{-1}=e\ \forall x,y\in G$ , and we were trying to understand whether there are non-abelian examples of groups with length functions with the desired properties.

Several bounds on $l(\alpha\beta\alpha^{-1}\beta^{-1})$ were obtained from the hypothesis, giving bounds that even went below $1$ . However, these methods appeared to stagnate with the best bound obtained a little above $0.9$ .

Using computer assistance, we obtained and posted a human readable proof showing $l(\alpha\beta\alpha^{-1}\beta^{-1})\leq 0.82$ . An extract of this proof is below (see https://github.com/siddhartha-gadgil/Superficial/wiki/A-commutator-bound for the full proof as originally posted). Note that we have used somewhat different notation here: the generators are $a$ and $b$ and their inverses are denoted $\bar{a}$ and $\bar{b}$ . We remark that a fully expanded proof actually had over 2000 lines, but avoiding duplication gave the posted $126$ lines.

•

$|\bar{a}|\leq 1.0$

•

$|\bar{b}\bar{a}b|\leq 1.0$ using $|\bar{a}|\leq 1.0$

•

$|\bar{b}|\leq 1.0$

•

$|a\bar{b}\bar{a}|\leq 1.0$ using $|\bar{b}|\leq 1.0$

•

$|\bar{a}\bar{b}ab\bar{a}\bar{b}|\leq 2.0$ using $|\bar{a}\bar{b}a|\leq 1.0$ and $|b\bar{a}\bar{b}|\leq 1.0$

•

… (119 lines)

•

$|ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}|\leq 13.859649122807017$

using $|ab\bar{a}|\leq 1.0$ and

$|\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}|\leq 12.859649122807017$

•

$|ab\bar{a}\bar{b}|\leq 0.8152734778121775$ using

$|ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}ab\bar{a}\bar{b}|\leq 13.859649122807017$ by

taking 17th power.
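The final step of the extract is a plain use of homogeneity: a bound on the length of $g^{17}$ is divided by $17$ to bound the length of $g$. A minimal arithmetic check of the two printed numbers (the variable names are chosen here for illustration):

```python
# Bound on |(ab a^-1 b^-1)^17| as printed in the extract.
power_bound = 13.859649122807017
exponent = 17

# Homogeneity: |g| = |g^17| / 17, so a bound on the power gives one on g.
commutator_bound = power_bound / exponent
print(commutator_bound)
```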

This proof was studied, understood and generalized by Pace Nielsen, who called the method the internal repetition trick. After several improvements due to Nielsen and Tobias Fritz, this was abstracted by Terence Tao as the following lemma. Note that this holds for all conjugacy-invariant, homogeneous pseudo-lengths $l$ on all groups $G$ .

Lemma 2.2.

Let $x$ , $y$ , $z$ , $w$ in $G$ be such that $x$ is conjugate to both $wy$ and $zw^{-1}$ , i.e., there exist elements $s,t\in G$ such that $x=swys^{-1}=tzw^{-1}t^{-1}$ . Then one has

$$l(x)\leq\frac{l(y)+l(z)}{2}.$$
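The text states Lemma 2.2 without proof. One possible argument for $l(x)\leq\frac{l(y)+l(z)}{2}$ is sketched below; it is a reconstruction offered for orientation, not necessarily the argument of [4], and the abbreviations $A$, $B$, $u$, $P$, $Q$ are introduced here only for this sketch.

```latex
% Sketch only. Abbreviations (not from the paper):
%   A = (wy)^n,  B = (zw^{-1})^n,  u = s^{-1}t,
%   P = (wy)^{n-1},  Q = (zw^{-1})^{n-1}.
\begin{align*}
2n\,l(x) &= l(x^{n}x^{n}) = l\bigl(sAs^{-1}\,tBt^{-1}\bigr)
          = l\bigl(A\,u\,B\,u^{-1}\bigr)
          && \text{(homogeneity, conjugation by } s^{-1}\text{)}\\
         &\leq l(AB) + l(B^{-1}uB) + l(u^{-1}) = l(AB) + 2\,l(u)
          && \text{(triangle inequality, conjugacy invariance)}\\[4pt]
l(AB) &= l\bigl(P\,w\,y\,z\,w^{-1}\,Q\bigr)
       = l\bigl(w^{-1}QPw\cdot yz\bigr)
          && \text{(cyclic rotation is a conjugation)}\\
      &\leq l(QP) + l(y) + l(z) = l(PQ) + l(y) + l(z).
\end{align*}
% Induction on n gives l(AB) <= n (l(y) + l(z)), hence
%   2n l(x) <= n (l(y) + l(z)) + 2 l(u).
% Dividing by 2n and letting n -> infinity yields the lemma.
```

The elements $s$ and $t$ only contribute the constant $2\,l(s^{-1}t)$, which disappears in the limit; this is where homogeneity is essential.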

Fritz used this to obtain the key lemma.

Lemma 2.3.

Let $f(m,k)=l(x^{m}(xyx^{-1}y^{-1})^{k})$ . Then

$$f(m,k)\leq\frac{f(m-1,k)+f(m+1,k-1)}{2}.$$

We apply this lemma to $x=\alpha$ , $y=\beta$ . An argument based on probability theory, due to Tao, showed that $l(\alpha\beta\alpha^{-1}\beta^{-1})=0$ . This in particular answered Question 1.5 (the main result proved in [4] is actually stronger than Theorem 2.4).

Theorem 2.4 (see [4]).

For every homogeneous, conjugacy-invariant pseudo-length function $l:\langle\alpha,\beta\rangle\to[0,\infty)$ , we have $l(\alpha\beta\alpha^{-1}\beta^{-1})=0$ . In particular $l$ is not a length function.

We mention some of the ingredients in the proof of Theorem 2.4 using Lemma 2.3 with $x=\alpha$ and $y=\beta$ . Consider a random walk on points $(m,k)\in\mathbb{Z}^{2}$ where we move to $(m-1,k)$ (i.e., one step to the left) with probability $1/2$ and to $(m+1,k-1)$ (i.e., diagonally down and right) with probability $1/2$ . Then Lemma 2.3 says that $f(m,k)$ is at most the average value of $f$ after one step, and hence inductively after $n$ steps for $n\in\mathbb{N}$ . Also observe that each step moves us on average $1/2$ a step downwards (the left and right movements cancel on average). Hence if we start at a point $(0,n)$ (where $n\in\mathbb{N}$ ), the distribution after $2n$ steps is centered around the origin, and $f(0,n)$ is bounded by the average value of $f$ on this distribution. This average in turn can be bounded using the Chebyshev inequality (as in the proof of the law of large numbers) together with the observation that $f(m,k)\leq|m|+2|k|$ if $l$ is normalized (the latter follows by a straightforward inductive use of the triangle inequality and conjugacy invariance). The bound thus obtained is of the form $f(0,n)\leq C\sqrt{n}$ for some constant $C\in\mathbb{R}$ . Finally, as homogeneity gives $l(\alpha\beta\alpha^{-1}\beta^{-1})\leq f(0,n)/n$ , taking a limit as $n\to\infty$ gives $l(\alpha\beta\alpha^{-1}\beta^{-1})=0$ .
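The random walk argument can be made concrete numerically. The sketch below computes the exact average of the crude estimate $|m|+2|k|$ over the distribution after $2n$ steps started at $(0,n)$, and divides by $n$. It replaces the Chebyshev estimate of the text by an exact binomial sum, which is a simplification chosen here; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def commutator_bound(n: int) -> Fraction:
    """Upper bound for l(commutator) from 2n steps of the walk started at (0, n).

    If j of the 2n steps are diagonal (down-right) and 2n - j are to the left,
    the walk ends at m = j - (2n - j), k = n - j. We average the estimate
    |m| + 2|k| (valid for normalized l) over the binomial distribution of j.
    """
    steps = 2 * n
    total = Fraction(0)
    for j in range(steps + 1):
        m = j - (steps - j)
        k = n - j
        weight = Fraction(comb(steps, j), 2 ** steps)
        total += weight * (abs(m) + 2 * abs(k))
    return total / n   # homogeneity: l(c) <= f(0, n) / n

for n in (1, 10, 100, 1000):
    print(n, float(commutator_bound(n)))
```

The printed values decrease with $n$, in line with a bound of order $C\sqrt{n}/n$.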

3. The Algorithms

Our proof was generated by a mixture of algorithms and expert guidance (with some arbitrary choices). More precisely, given certain auxiliary choices, a deterministic algorithm gave upper bounds $L(g)$ such that $l(g)\leq L(g)$ for all normalized, homogeneous, conjugacy-invariant pseudo-length functions $l:\langle\alpha,\beta\rangle\to\mathbb{R}$ and for all $g\in\langle\alpha,\beta\rangle$ .

The auxiliary choices were a finite sequence of pairs $(g_{i},n_{i})$ , with $g_{i}\in\langle\alpha,\beta\rangle$ and $n_{i}\in\mathbb{N}$ . We used homogeneity only for these pairs. Thus, our algorithm actually gives an upper bound for all functions $l:\langle\alpha,\beta\rangle\to\mathbb{R}$ such that

•

$l$ is a normalized, conjugacy-invariant pseudo-length function on $\langle\alpha,\beta\rangle$ .

•

$l(g_{i})\leq l(g_{i}^{n_{i}})/n_{i}$ for all pairs $(g_{i},n_{i})$ .

We shall call such pairs $(g_{i},n_{i})$ homogeneity pairs and a sequence of homogeneity pairs a homogeneity pair sequence. Explicit choices for such pairs that give a proof similar to the posted one are given in 3.7, along with links to a script to replicate this (which runs in under 10 seconds on a moderately powerful laptop/desktop). We discuss in 3.8 how plausible it is to have arrived at such choices through general principles, without expert guidance.

We used a deterministic algorithm (depending on a homogeneity pair sequence) to obtain upper bounds $L(g)$ with $l(g)\leq L(g)$ for all lengths as above and for all $g\in\langle\alpha,\beta\rangle$ . Using this, we computed the bound

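$$l(\alpha\beta\alpha^{-1}\beta^{-1})\leq\min\left\{\frac{L((\alpha\beta\alpha^{-1}\beta^{-1})^{n})}{n}:1\leq n\leq 20\right\}.$$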

This (after keeping track of inequalities and rendering in human readable form) was the posted proof.

All pseudo-lengths we consider henceforth will be assumed to be normalized and conjugacy-invariant (but not necessarily homogeneous).

3.1. Maximal homogeneous pseudo-lengths

It is convenient to reformulate our main problem using a standard construction. Namely, we define a function $l_{h}:\langle\alpha,\beta\rangle\to\mathbb{R}$ by defining, for $g\in\langle\alpha,\beta\rangle$ , $l_{h}(g)$ to be

$$\max\{l(g):l\text{ normalized, homogeneous, conjugacy-invariant pseudo-length}\}.$$

It is well known that this is well-defined and gives the maximal normalized, homogeneous, conjugacy-invariant pseudo-length on $\langle\alpha,\beta\rangle$ . Thus, our main problem is equivalent to finding upper bounds for $l_{h}(g)$ , in particular for $l_{h}(\alpha\beta\alpha^{-1}\beta^{-1})$ .

3.2. Bounding conjugacy-invariant pseudo-lengths

We now make analogous constructions dropping the homogeneity condition. Let $\mathcal{L}_{c}$ be the set of all normalized, conjugacy-invariant pseudo-length functions on $\langle\alpha,\beta\rangle$ . We define, for $g\in\langle\alpha,\beta\rangle$ ,

$$l_{c}(g)=\max\{l(g):l\in\mathcal{L}_{c}\}.$$

This is well-defined and gives the maximal normalized, conjugacy-invariant pseudo-length on $\langle\alpha,\beta\rangle$ (i.e., the maximal element of $\mathcal{L}_{c}$ ). Further, clearly $l_{h}(g)\leq l_{c}(g)$ , so upper bounds on $l_{c}(g)$ give ones on $l_{h}(g)$ .

We describe in Section 3.5 an algorithm to obtain an upper bound $L_{c}(g)$ for $l_{c}(g)$ . Indeed this bound is sharp, i.e., we have $l_{c}(g)=L_{c}(g)$ for all $g\in\langle\alpha,\beta\rangle$ (we do not prove or use this, but this fact motivated our approach). Here and henceforth we follow the convention that we use $l$ with subscripts to denote pseudo-lengths we wish to bound (whose definition may be non-constructive) and $L$ with the same subscript to denote algorithmic upper bounds for these lengths.

3.3. Conjugacy-invariant lengths with elementary bounds

Next, suppose we are given a finite set $B$ of pairs $(g_{i},x_{i})$ , $1\leq i\leq m$ , with $g_{i}\in\langle\alpha,\beta\rangle$ and $x_{i}\geq 0$ , $x_{i}\in\mathbb{R}$ (we call this a set of elementary bounds). We consider a refinement of $l_{c}$ and a corresponding modified algorithm (our definitions and algorithms do not depend on the order of the pairs $(g_{i},x_{i})$ ).

Namely, for $g\in\langle\alpha,\beta\rangle$ , we define

$$l_{b}(g;B)=\max\{l(g):l\in\mathcal{L}_{c},\ l(g_{i})\leq x_{i}\ \forall(g_{i},x_{i})\in B\}.$$

The function $l_{b}(g;B)$ is a normalized, conjugacy-invariant pseudo-length on $\langle\alpha,\beta\rangle$ which is maximal among such lengths that satisfy the additional bounds $l(g_{i})\leq x_{i}$ for all $(g_{i},x_{i})\in B$ .

Note that $l_{c}(g)=l_{b}(g;\emptyset)$ for $g\in\langle\alpha,\beta\rangle$ . A straightforward modification of the algorithm describing $L_{c}(g)=l_{c}(g)$ gives an algorithm giving bounds $L_{b}(g;B)$ such that $l_{b}(g;B)\leq L_{b}(g;B)$ for all $g\in\langle\alpha,\beta\rangle$ . We remark that the bound given by this algorithm is not optimal. (Indeed, an optimal algorithm for $l_{b}(g;B)$ for general finite $B$ gives a solution to the word problem for groups, which is known to be algorithmically undecidable. Namely, given relations $r_{1}\in\langle\alpha,\beta\rangle$ , $r_{2}\in\langle\alpha,\beta\rangle$ , … $r_{m}\in\langle\alpha,\beta\rangle$ , let $B$ be the set $\{(r_{1},0),(r_{2},0),\dots,(r_{m},0)\}$ . Then $l_{b}(g;B)=0$ if and only if $g$ is trivial in the group $\langle\alpha,\beta;r_{1}=e$ , $r_{2}=e,\dots r_{m}=e\rangle$ .)

We shall say that the set of elementary bounds is admissible if $l_{h}(g_{i})\leq x_{i}$ for all $i$ , $1\leq i\leq m$ . Observe that we can algorithmically obtain an admissible set of elementary bounds from a homogeneity pair sequence $(g_{i},n_{i})$ , $1\leq i\leq m$ , by setting $x_{i}=\frac{l_{c}(g_{i}^{n_{i}})}{n_{i}}=\frac{L_{c}(g_{i}^{n_{i}})}{n_{i}}$ , as $l_{h}(g_{i})=\frac{l_{h}(g_{i}^{n_{i}})}{n_{i}}\leq\frac{l_{c}(g_{i}^{n_{i}})}{n_{i}}=x_{i}$ .

Note that if a set of elementary bounds $B$ is admissible, then $l_{h}(g)\leq l_{b}(g;B)$ for all $g\in\langle\alpha,\beta\rangle$ . Hence $L_{b}(g;B)$ gives an upper bound for $l_{h}$ . We use such a bound, but the process of obtaining elementary bounds from a homogeneity pair sequence is a refinement of setting $x_{i}=\frac{l_{c}(g_{i}^{n_{i}})}{n_{i}}$ (and depends on the order of the pairs).

3.4. Bounds with homogeneity pair sequences

In this section we describe algorithms depending on a homogeneity pair sequence in terms of algorithms depending on elementary bounds, essentially by deducing elementary bounds using homogeneity. The algorithms depending on elementary bounds are described in 3.5, which the reader may prefer to read first. In 3.6 we describe how to modify the algorithms of 3.5 along the lines described below.

Assume that we are given a homogeneity pair sequence, i.e., a finite sequence of pairs $(g_{i},n_{i})$ , $1\leq i\leq m$ . We define inductively in $j$ (simultaneously)

•

an elementary bound $(g_{j},x_{j})$ (with the element $g_{j}$ from the given homogeneity pair sequence),

•

a length function $l_{j}:\langle\alpha,\beta\rangle\to\mathbb{R}$ , such that $l_{h}(g)\leq l_{j}(g)$ for all $g\in\langle\alpha,\beta\rangle$ , and

•

an algorithmically defined length function $L_{j}:\langle\alpha,\beta\rangle\to\mathbb{R}$ , $0\leq j\leq m$ , such that $l_{j}(g)\leq L_{j}(g)$ for all $g\in\langle\alpha,\beta\rangle$ .

First, let $l_{0}(g)=l_{c}(g)$ . Let $L_{0}(g)=L_{c}(g)$ , which we recall can be computed algorithmically (as described in 3.5).

Next, let $x_{1}=\frac{l_{0}(g_{1}^{n_{1}})}{n_{1}}=\frac{L_{c}(g_{1}^{n_{1}})}{n_{1}}$ and define $l_{1}(g)=l_{b}(g;\{(g_{1},x_{1})\})$ . By homogeneity, $l_{h}(g_{1})\leq x_{1}$ , so by maximality of $l_{b}(g;\{(g_{1},x_{1})\})$ , $l_{h}(g)\leq l_{1}(g)$ for all $g\in\langle\alpha,\beta\rangle$ .

Recall that we have an algorithm (described in 3.5) giving (for $g\in\langle\alpha,\beta\rangle$ ) an upper bound $L_{b}(g;\{(g_{1},x_{1})\})$ for $l_{b}(g;\{(g_{1},x_{1})\})$ . Define $L_{1}(g)=L_{b}(g;\{(g_{1},x_{1})\})$ for $g\in\langle\alpha,\beta\rangle$ .

Continuing in this fashion define

•

$x_{2}=\frac{L_{1}(g_{2}^{n_{2}})}{n_{2}}$ (which can be algorithmically computed),

•

$l_{2}(g)=l_{b}(g;\{(g_{1},x_{1}),(g_{2},x_{2})\})$ , and

•

$L_{2}(g)=L_{b}(g;\{(g_{1},x_{1}),(g_{2},x_{2})\})$ (which is algorithmic).

As before, we have the bounds $l_{h}(g)\leq l_{2}(g)$ for all $g\in\langle\alpha,\beta\rangle$ .

Inductively, given $k<m$ , $x_{i}\in\mathbb{R}$ for $1\leq i\leq k$ , a function $l_{k}:\langle\alpha,\beta\rangle\to\mathbb{R}$ , and an algorithmically defined function $L_{k}:\langle\alpha,\beta\rangle\to\mathbb{R}$ , define

•

$x_{k+1}=\frac{L_{k}(g_{k+1}^{n_{k+1}})}{n_{k+1}}$ (which can be algorithmically computed),

•

$l_{k+1}(g):=l_{b}(g;\{(g_{1},x_{1}),(g_{2},x_{2}),\dots,(g_{k+1},x_{k+1})\})$ , and

•

$L_{k+1}(g):=L_{b}(g;\{(g_{1},x_{1}),(g_{2},x_{2}),\dots,(g_{k+1},x_{k+1})\})$ .

The function $L(g):=L_{m}(g)=L_{b}(g;\{(g_{1},x_{1}),(g_{2},x_{2}),\dots,(g_{m},x_{m})\})$ is the desired upper bound for $l_{h}$ .

3.5. Algorithm for conjugacy-invariant pseudo-lengths

We now describe the algorithms giving $L_{c}(g)$ and $L_{b}(g;B)$ , i.e. giving upper bounds for $l_{c}(g)$ and $l_{b}(g;B)$ . Recall that $l_{c}(g)=l_{b}(g;\emptyset)$ .

Let $l$ be a normalized, conjugacy-invariant pseudo-length, which may also be assumed to satisfy a finite number of elementary bounds. We describe an upper bound for $l(g)$ for a word $g=\xi_{1}\xi_{2}\dots\xi_{n}$ recursively in the length $n$ of the word. The key ingredient is the following lemma bounding $l(g)$ in terms of bounds on shorter words.

Lemma 3.1.

Let $g=\xi_{1}\xi_{2}\dots\xi_{n}$ with $n>1$ .

(a)

$l(g)\leq 1+l(\xi_{2}\xi_{3}\dots\xi_{n})$ .

(b)

If $\xi_{k}=\xi_{1}^{-1}$ , then $l(g)\leq l(\xi_{2}\xi_{3}\dots\xi_{k-1})+l(\xi_{k+1}\xi_{k+2}\dots\xi_{n})$ .

Proof.

To see (a), observe that

$$l(g)\leq l(\xi_{1})+l(\xi_{2}\xi_{3}\dots\xi_{n})\leq 1+l(\xi_{2}\xi_{3}\dots\xi_{n}),$$

as claimed.

Next, suppose $\xi_{k}=\xi_{1}^{-1}$ . By the triangle inequality,

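$$l(g)\leq l(\xi_{1}(\xi_{2}\xi_{3}\dots\xi_{k-1})\xi_{1}^{-1})+l(\xi_{k+1}\xi_{k+2}\dots\xi_{n}).\tag{2}$$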

Further, by conjugacy invariance of $l$ ,

$$l(\xi_{1}(\xi_{2}\xi_{3}\dots\xi_{k-1})\xi_{1}^{-1})=l(\xi_{2}\xi_{3}\dots\xi_{k-1}).\tag{3}$$

Substituting (3) in (2), we get

$$l(g)\leq l(\xi_{2}\xi_{3}\dots\xi_{k-1})+l(\xi_{k+1}\xi_{k+2}\dots\xi_{n}),$$

showing (b). ∎

The algorithms are based on Lemma 3.1. Elementary bounds $l(g_{i})\leq x_{i}$ , $1\leq i\leq m$ are specified by a map $L_{0}:D\to\mathbb{R}$ with $D=\{g_{1},\dots,g_{m}\}\subset\langle\alpha,\beta\rangle$ and $L_{0}(g_{i})=x_{i}$ . If we have no such bounds, i.e. we are computing $l_{c}$ , we initially take $D=\emptyset$ and $L_{0}$ as the empty map (however the map is updated to avoid repeating computations).

For $g\in\langle\alpha,\beta\rangle$ , the recursive algorithm shown in Figure 1 gives a bound $L(g)$ so that $l(g)\leq L(g)$ for any normalized, conjugacy-invariant pseudo-length $l$ on $\langle\alpha,\beta\rangle$ . We describe this using sets in mathematical language, but this can be readily translated to code using, for instance, list comprehensions.
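Figure 1 is not reproduced in this text. The following is a minimal sketch of a recursion in the spirit of Lemma 3.1, assuming the bound for a word is the minimum over case (a) and over every applicable instance of case (b). The letter encoding (`a`, `b`, `A`, `B`), the function name `length_bound` and the treatment of elementary bounds as pre-filled memo entries are assumptions made for illustration, not the author's code.

```python
INV = {"a": "A", "A": "a", "b": "B", "B": "b"}  # assumed encoding of inverses

def length_bound(word, elementary=None):
    """Upper bound L(word) for normalized, conjugacy-invariant pseudo-lengths.

    `elementary` maps words g_i to bounds x_i (the map L_0 of the text);
    it doubles as a memo table so that no subword is computed twice.
    """
    memo = dict(elementary or {})

    def L(w):
        if w in memo:
            return memo[w]
        if len(w) <= 1:
            value = len(w)                # empty word: 0, single letter: at most 1
        else:
            # Lemma 3.1(a): split off the first letter.
            candidates = [1 + L(w[1:])]
            # Lemma 3.1(b): pair the first letter with each later inverse.
            for k in range(1, len(w)):
                if w[k] == INV[w[0]]:
                    candidates.append(L(w[1:k]) + L(w[k + 1:]))
            value = min(candidates)
        memo[w] = value
        return value

    return L(word)

print(length_bound("abAB"))  # bound for the commutator without homogeneity
```

Every subword is evaluated at most once, so the cost is polynomial in the word length. The recursion depth grows with the word length, so long words may need a higher recursion limit or an iterative rewrite.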

Furthermore, we can keep track of a labelled rooted tree of inequalities used to compute $L(g)$ , and hence bound $l(g)$ . We give a schematic condensed example of such a tree in Figure 2. Essentially such a tree (actually a corresponding Algebraic Data Type) was used to generate the human readable proof.

Observe that the function $L$ is integer valued, and can hence be computed exactly. On using homogeneity we obtain rational bounds. These were stored as double precision real numbers – we switched to arbitrary precision rational numbers at one stage, but switched back for performance reasons during the recursive computations (as these were involved in a large search). Note however that it is easy (and efficient) to map a proof tree using doubles to one using arbitrary precision rational numbers to ensure that there is no error in rounding off. Indeed, as we discuss in 3.7, we have subsequently implemented the mapping of proofs to exact rational bounds.

3.6. Bounding with a homogeneity pair sequence

We now describe how to modify the above algorithm given a homogeneity pair sequence. This is following the approach of 3.4, but with one minor difference as mentioned in Remark 3.6.1.

Suppose now that we are given a homogeneity pair sequence, i.e., a finite sequence of pairs $(g_{i},n_{i})$ , $1\leq i\leq m$ . We initialize $L_{0}$ to be the empty map, and compute $L(g_{1}^{n_{1}})$ using the main algorithm. We let $x_{1}=L(g_{1}^{n_{1}})/n_{1}$ and update the map $L_{0}$ by setting $L_{0}(g_{1})=x_{1}$ .

Next we use the main algorithm to compute $L(g_{2}^{n_{2}})$ , but with $L_{0}$ the map obtained at the end of the previous computation and after setting $L_{0}(g_{1})=x_{1}$ . Again let $x_{2}=L(g_{2}^{n_{2}})/n_{2}$ and update the map $L_{0}$ by setting $L_{0}(g_{2})=x_{2}$ . We proceed in this fashion to obtain the numbers $x_{1}$ , $x_{2}$ , …, $x_{m}$ , and an updated map $L_{0}$ .

Finally, for an element $g\in\langle\alpha,\beta\rangle$ , we define the function $L(g)$ as the result of using the algorithm with the map $L_{0}$ obtained at the end of the above sequence of computations and updates.
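The procedure above can be sketched as follows, with one shared map playing the role of $L_{0}$ across all pairs. This is an illustrative reading, not the author's implementation: the names are hypothetical, exact rationals are used throughout for simplicity, and taking the minimum with any value already stored for $g_{i}$ is a choice made here.

```python
from fractions import Fraction

INV = {"a": "A", "A": "a", "b": "B", "B": "b"}  # assumed encoding of inverses

def bound_with_pairs(word, pairs):
    """Bound L(word) after processing homogeneity pairs (g_i, n_i) in order."""
    memo = {}  # the map L_0, shared and updated across all computations

    def L(w):
        if w not in memo:
            if len(w) <= 1:
                memo[w] = Fraction(len(w))
            else:
                best = 1 + L(w[1:])                      # Lemma 3.1(a)
                for k in range(1, len(w)):
                    if w[k] == INV[w[0]]:                # Lemma 3.1(b)
                        best = min(best, L(w[1:k]) + L(w[k + 1:]))
                memo[w] = best
        return memo[w]

    for g, n in pairs:
        x = L(g * n) / n                      # x_i = L(g_i^{n_i}) / n_i
        memo[g] = min(memo.get(g, x), x)      # record the elementary bound
    return L(word)

print(bound_with_pairs("abAB", [("abAB", n) for n in range(1, 4)]))
```

Because earlier values stay in `memo`, subwords computed before a new elementary bound was recorded are not recomputed.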

Remark 3.6.1.

While the above algorithm is very similar to that described in 3.4, it gives in general slightly worse bounds. This is because, for a fixed $g_{0}\in\langle\alpha,\beta\rangle$ , if $L(g_{0})$ is computed for a word while computing, for example $x_{1}$ (which happens if $g_{0}$ is a subword of $g_{1}^{n_{1}}$ ), we do not recompute $L(g_{0})$ when computing, for example $x_{2}$ . However we may get a smaller value (i.e., better bound) if we recomputed $L(g_{0})$ as we have the additional elementary bound $l(g_{1})\leq x_{1}$ (we get an improved bound if $g_{1}$ is a subword of $g_{0}$ and $L(g_{1}^{n_{1}})<n_{1}L(g_{1})$ ). It is easy to avoid this by setting the map $L_{0}$ when computing $x_{i}$ to be just the earlier elementary bounds, i.e. set $D=\{g_{j}:j<i\}$ and $L_{0}(g_{j})=x_{j}$ . However, this comes at a cost in efficiency due to computations being repeated.

3.7. Choices and results

In generating the proofs, we used the family of words of the form $\gamma_{k}=\alpha(\alpha\beta\alpha^{-1}\beta^{-1})^{k}$ , with this family chosen based on expert knowledge. From these, we constructed a homogeneity pair sequence, depending on certain choices.

Namely, our homogeneity pair sequence was of the following form (with explicit choices stated, which we have used in a script as mentioned below):

• We choose and fix $N\geq 1$ (we take $N=20$).

• Choose and fix a few values of $k$ (chosen with some experimentation), say $k_{1}$, $k_{2}$, …, $k_{m}$ (we take $m=3$ with $k_{1}=1$, $k_{2}=2$ and $k_{3}=6$).

• We get a homogeneity pair sequence taking each element $\gamma_{k_{i}}$ with each exponent between $1$ and $N$, namely

[TABLE]

• The homogeneity pair sequence we use is the above sequence followed by the sequence

[TABLE]
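The first part of this construction (each $\gamma_{k_{i}}$ with each exponent from $1$ to $N$) can be written out as below. This is a sketch under assumptions: the string encoding (`A`, `B` for inverses), the helper names `gamma` and `power_pairs`, and the ordering (all exponents for $\gamma_{k_{1}}$ first, then $\gamma_{k_{2}}$, and so on) are illustrative choices; the second sequence that follows it is not reconstructed here.

```python
def gamma(k):
    """The word gamma_k = alpha (alpha beta alpha^-1 beta^-1)^k as a string."""
    return "a" + "abAB" * k


def power_pairs(ks, N):
    """Pairs (gamma_k, n) for each k in ks and each exponent 1 <= n <= N."""
    return [(gamma(k), n) for k in ks for n in range(1, N + 1)]


# The explicit choices stated in the text: N = 20 and k in {1, 2, 6}.
pairs = power_pairs([1, 2, 6], 20)
print(len(pairs), pairs[0], pairs[-1][1])
```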

With the explicit choices $N=20$ , $m=3$ , $k_{1}=1$ , $k_{2}=2$ and $k_{3}=6$ , we get the bound

[TABLE]

and a corresponding human readable proof.

Furthermore, we can map proofs to arbitrary precision rational bounds to avoid rounding errors. Mapping the above proof gives the bound (with no rounding-off error)

[TABLE]
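To illustrate why exact rationals remove rounding error, the sketch below evaluates the same arithmetic with floating-point numbers and with Python's `fractions.Fraction`. The numbers are arbitrary and do not come from the proof; `average_bound` is a hypothetical helper mirroring a step of the form $l(x)\leq\frac{l(y)+l(z)}{2}$.

```python
from fractions import Fraction


def average_bound(y, z):
    """Right-hand side of a step l(x) <= (l(y) + l(z)) / 2, for any numeric type."""
    return (y + z) / 2


# Floating point accumulates rounding error even in a short sum ...
float_total = sum([0.1] * 10)
# ... while the same sum over exact rationals does not.
exact_total = sum([Fraction(1, 10)] * 10)

print(float_total == 1.0, exact_total == 1)
print(average_bound(Fraction(1, 3), Fraction(1, 6)))
```

Because every step of a proof of this kind uses only addition, division by integers and minima, replaying it over rationals gives a bound that is exact rather than approximate.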

We have created an executable jar file to replicate generating this proof (as well as mapping to a proof with arbitrary precision rational bounds). This is available online at http://math.iisc.ac.in/~gadgil/PolyProof.html (with instructions on running it), along with sample output (slightly reformatted). On the systems we used (a desktop and a laptop with Core i7 processors) this runs in under 10 seconds. (The full code is in the repository https://github.com/siddhartha-gadgil/Superficial. The script is generated from this source.) The script uses the same algorithms we originally used, but with modifications to be more robust in memory usage and to avoid concurrency (as the concurrency we implemented leads to non-determinacy, and occasionally to race conditions). The proof generated by this script is a little longer than the one originally posted ($173$ lines instead of $126$) but gives a slightly better bound.

Values for $N$ and the indices $k_{i}$ were obtained by experimentation, and the bounds are fairly robust when we vary choices.

*Remark 3.7.1.*

In generating the script we only used our knowledge, gained while generating the original proof, that $k=6$ was useful (though not crucial). The only other choices we tried were taking $N=10$, which also gives a bound below $1$, and taking the values of $k$ to be $1$, $2$, $3$ and $6$, which only marginally improved the bound.

The choice of the family $\gamma_{k}$ was based on mathematical considerations related to [1], and this was the only expert guidance. We next discuss whether finding the proof was plausible using general principles in place of expert knowledge.

3.8. Auxiliary choices without expert knowledge?

As we have seen, the only expert guidance was the “natural family of group elements $\gamma_{k}$ ”. We sketch a series of general considerations (some using basic group theory) that could plausibly have led to the same family.

• A natural measure of usefulness of a homogeneity pair $(g,n)$ is the ratio $\rho(g,n)=\frac{l_{c}(g)}{l_{c}(g^{n})/n}$, as this is an upper bound on the ratio $l_{c}(h)/l_{b}(h;\{(g,n)\})$ for $h\in\langle\alpha,\beta\rangle$, i.e., the maximum possible improvement in bounds (a value of $1$ means no gain from using homogeneity).

• Rather than looking for individual useful elements, we look for families $\gamma_{k}$ of useful elements, choosing between families by small scale sampling.

• We look for natural families in the sense of having a simple description in terms of the group operations. The simplest families in a group are those of the form $a^{k}$ for fixed $a\in\langle\alpha,\beta\rangle$ and the next simplest are those of the form $ab^{k}$ for fixed $a,b\in\langle\alpha,\beta\rangle$. The family we considered is of the second form. (As $b^{k}a$ is conjugate to $a^{-1}b^{k}$, families of the form $b^{k}a$ are equivalent to those we considered.)

• We first consider simple families, i.e., with simple words for $a$ and $b$, while using symmetries of the problem to reduce choices.

Here the symmetries are: transposing the generators $\alpha$ and $\beta$ , transposing one or both generators with their inverses, and cyclic permutations of words. Further, one needs to only consider reduced words, i.e., those without a cancelling pair. Up to all these symmetries, there are only $6$ words with length $4$ , $2$ each with lengths $3$ and $2$ and a single word with length $1$ . Even allowing for not all symmetries being exploited (as they do depend on some expert knowledge) and allowing for different choices for the word $a$ , the number of families of complexity comparable to the one we considered is modest.
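The counts quoted above can be checked by brute force. The sketch below assumes that the words in question are cyclically reduced (so that cyclic permutations stay reduced) and that two words are identified when they differ by the listed operations: swapping the generators, inverting either generator, and cyclic permutation. The encoding (`A`, `B` for inverses) and all function names are illustrative.

```python
from itertools import product

SWAP = str.maketrans("aAbB", "bBaA")  # exchange the two generators


def letter_symmetries():
    """The 8 letter maps generated by swapping generators and inverting each one."""
    maps = []
    for swap, inv_a, inv_b in product((False, True), repeat=3):
        m = {"a": "a", "A": "A", "b": "b", "B": "B"}
        if inv_a:
            m["a"], m["A"] = "A", "a"
        if inv_b:
            m["b"], m["B"] = "B", "b"
        if swap:
            m = {key: value.translate(SWAP) for key, value in m.items()}
        maps.append(str.maketrans(m))
    return maps


SYMMETRIES = letter_symmetries()


def is_cyclically_reduced(w):
    """True if no letter is followed (cyclically) by its inverse."""
    return all(w[i].swapcase() != w[(i + 1) % len(w)] for i in range(len(w)))


def canonical(w):
    """Smallest representative of w under letter symmetries and rotations."""
    images = (w.translate(m) for m in SYMMETRIES)
    return min(v[i:] + v[:i] for v in images for i in range(len(v)))


def count_classes(n):
    """Number of cyclically reduced words of length n up to the symmetries."""
    words = ("".join(t) for t in product("aAbB", repeat=n))
    return len({canonical(w) for w in words if is_cyclically_reduced(w)})


print([count_classes(n) for n in range(1, 5)])
```

Under these assumptions the enumeration gives $1$, $2$, $2$ and $6$ classes for lengths $1$ to $4$, in agreement with the counts in the text.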

Furthermore, if $b$ is a word of length at most $4$ which is not equivalent to $\alpha\beta\alpha^{-1}\beta^{-1}$ , then $l_{h}(b)=l_{c}(b)$ , which in particular implies that $\rho(ab^{k},n)\approx 1$ for large $k$ and $n$ . Thus the families with other values of $b$ can be ruled out as not useful with limited experimentation.

Thus, if one searches through natural families, with simpler ones considered first and symmetries exploited to avoid duplication, and assesses each family rapidly by measuring improvements in relevant bounds after small scale sampling, one is likely to arrive at the family we considered (or an equivalent one) in a reasonable amount of time.
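A small-scale sampling of this kind might look as follows. The sketch assumes that $l_{c}$ can be computed by the recursion combining the triangle inequality with conjugacy invariance (pairing a letter with a later inverse letter); the string encoding and the names `l_c`, `rho` and `sample_family` are illustrative and are not taken from the paper's code.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def l_c(word):
    """Conjugacy-invariant length bound by recursion on the first letter."""
    if not word:
        return 0
    best = 1 + l_c(word[1:])  # leave the first letter unpaired
    for k in range(1, len(word)):
        if word[k] == word[0].swapcase():  # pair it with a later inverse letter
            best = min(best, l_c(word[1:k]) + l_c(word[k + 1:]))
    return best


def rho(g, n):
    """rho(g, n) = l_c(g) / (l_c(g^n) / n), for a word g with l_c(g^n) > 0."""
    return Fraction(n * l_c(g), l_c(g * n))


def sample_family(a, b, ks=(1, 2, 3), ns=(2, 3)):
    """Largest rho over a small sample of the family a b^k."""
    return max(rho(a + b * k, n) for k in ks for n in ns)


for b in ("ab", "aabb", "abab", "abAB"):
    print(b, sample_family("a", b))
```

A family whose sampled ratio stays at $1$ shows no gain from homogeneity on the sample and can be set aside early. For instance $\rho(\gamma_{1},2)=3/2$, while a word with no inverse letters, such as $\alpha\alpha\beta$, gives exactly $1$.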

4. Concluding remarks

If we view applications of the axioms as moves, the computer proof helped in identifying composite moves that could be applied recursively for words in appropriate families. These were abstracted and generalized to give the core lemma. One can hope that in other situations as well computer generated proofs targeting key examples give hints about useful composite moves, and especially those that can be used iteratively.

The principal difficulty in finding computer proofs often lies in choosing the useful moves among those that increase complexity, which in our case are applications of homogeneity. In this work, we primarily based ourselves on mathematical considerations, partly because the auxiliary choices were identified even before we started programming, and partly because of the fast pace (the proof was posted less than two days after we began writing code, and the main question was answered less than a day after that). However, we have attempted to justify in Section 3.8 that it is plausible that experimentation and general considerations could have led to similar results, with the use of one heuristic: one should search for natural families of useful moves.

We used modest computing resources (including time) and did not use search heuristics. It is thus all the more likely that domain specific expertise could have been replaced by (or combined with) the vast arsenal of well-known heuristics for tree searches, such as alpha-beta pruning, Markov chain Monte Carlo and various machine learning techniques.

Fully automating the search in this and similar problems involves identifying useful candidate families (beyond simply enumerating those with simple descriptions). This is an interesting challenge, and various techniques for it are worth exploring. Computer scientists have developed techniques to find inputs that trigger a bug (such as fuzzing) and techniques to minimize such inputs (such as delta debugging). Perhaps these techniques are also useful for finding a human-understandable proof like the one presented in this paper.

*Acknowledgements.*

I thank the referees and the editors for many valuable comments, which have led to the paper being completely rewritten twice and much improved in the process. It is also a pleasure to thank the rest of the PolyMath 14 team for the collaboration of which the work described here is a part.

Bibliography

[1] S. Gadgil, Watson–Crick pairing, the Heisenberg group and Milnor invariants, Journal of Mathematical Biology 59 (2009), 123–142.
[2] A. Khare and B. Rajaratnam, The Hoffmann-Jørgensen inequality in metric semigroups, Annals of Probability 45 (2017), 4101–4111.
[3] A. Khare and B. Rajaratnam, The Khinchin–Kahane inequality and Banach space embeddings for metric groups, preprint, 2016.
[4] D. H. J. Polymath, Homogeneous length functions on Groups, Algebra and Number Theory 12 (2018), 1773–1786.
