Solutions to twisted word equations and equations in virtually free groups

Volker Diekert; Murray Elder

arXiv:1701.03297·math.GR·March 1, 2022


TL;DR

This paper proves that the solution sets of twisted word equations, and of equations with rational constraints in finitely generated virtually free groups, are EDT0L languages whose specifications can be computed in PSPACE. Earlier work gave no concrete complexity bounds.

Contribution

It establishes that the set of all solutions is an EDT0L language and provides PSPACE algorithms that construct its description and decide whether the solution set is empty, finite, or infinite. Prior results gave no concrete complexity bound and described only subsets of solutions.

Findings

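01. The set of all solutions forms an EDT0L language.

02. Deciding whether the solution set is empty, finite, or infinite is in PSPACE.

03. If the rational constraints are given by a homomorphism into a fixed (or "small enough") finite monoid, the algorithms can be implemented in NSPACE(n^2 log n).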




Volker Diekert

Institut für Formale Methoden der Informatik, Universität Stuttgart, Universitätsstr. 38, D-70569 Stuttgart, Germany


and

Murray Elder

School of Mathematical and Physical Sciences, University of Technology Sydney, Broadway NSW 2007, Australia


(Date: March 7, 2024)

Abstract.

It is well known that the problem of solving equations in virtually free groups can be reduced to the problem of solving twisted word equations with regular constraints over free monoids with involution. In this paper we prove that the set of all solutions of a twisted word equation is an EDT0L language whose specification can be computed in $\mathsf{PSPACE}$. Within the same complexity bound we can decide whether the solution set is empty, finite, or infinite.

In the second part of the paper we apply the results for twisted equations to obtain in $\mathsf{PSPACE}$ an EDT0L description of the solution set of equations with rational constraints for finitely generated virtually free groups in standard normal forms with respect to a natural set of generators. If the rational constraints are given by a homomorphism into a fixed (or “small enough”) finite monoid, then our algorithms can be implemented in $\mathsf{NSPACE}(n^{2}\log n)$ , that is, in quasi-quadratic nondeterministic space.

Our results generalize the work by Lohrey and Sénizergues (ICALP 2006) and Dahmani and Guirardel (J. of Topology 2010) with respect to both complexity and expressive power. Neither paper gave any concrete complexity bound and the results in these papers are stated for subsets of solutions only, whereas our results concern all solutions.

2010 Mathematics Subject Classification: 03D05, 20F65, 20F70, 68Q25, 68Q45.

Keywords: Equation in a virtually free group, twisted equation, EDT0L language, $\mathsf{PSPACE}$ .

Research supported by Australian Research Council (ARC) Project DP 160100486 and German Research Foundation (DFG) Project DI 435/7-1.

Introduction

For a given semigroup $S$ the decision problem WordEquation is the following: on input two words $U$ and $V$ in variables together with letters from a generating set $\Sigma\subseteq S$ , decide whether or not there exists a substitution $\sigma$ of variables by elements in $S$ which yields a true identity $\sigma(U)=\sigma(V)$ in $S$ . Here, $\sigma$ is extended by $\sigma(s)=s$ for all $s\in\Sigma$ .

In a seminal paper [36] Makanin showed that WordEquation is decidable for free semigroups. The first complexity estimation of the problem was a tower of several exponential functions, but this dropped down to $\mathsf{PSPACE}$ by Plandowski [44] using compression. The insight that long solutions of word equations can be efficiently compressed is due to [45] which also led to the still standing conjecture that WordEquation is $\mathsf{NP}$-complete for free semigroups (and free groups). Until 2013 the known decidability proofs for solving word equations were long and technical, with an accompanying reputation for being difficult. This changed drastically when Jeż applied his recompression technique: he presented an $\mathsf{NSPACE}(n\log n)$ algorithm to solve word equations [26]. (In [27] Jeż improved the complexity to $\mathsf{NSPACE}(n)$.) Actually his method achieves more: it describes all solutions, copes with rational constraints (which is essential in applications), it extends to free groups, and to free monoids with involution [11]. Refining Jeż’s method, Ciobanu and the present authors showed that the full solution set of a given word equation over a free monoid with involution with rational constraints is EDT0L [5]. As a consequence, the same is true in free monoids without involution and for free groups where the constraints are used to ensure that solutions are given by reduced words. Previously this was only known for quadratic word equations [17]. EDT0L languages are defined by a certain type of Lindenmayer system. There is a vast literature on Lindenmayer systems, see [50], but all we need here is that an EDT0L language is specified by a nondeterministic finite automaton accepting endomorphisms over a free monoid and by some initial word. Applying the set of accepted endomorphisms to the initial word yields the language.
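The specification of an EDT0L language described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the toy alphabet `{a, b, #}`, the two endomorphisms `grow` and `erase`, and the convention that the endomorphisms along a path are applied to the initial word in path order. It enumerates only accepting paths up to a bounded length, so it yields a finite sample of the language rather than the language itself.

```python
def apply_endomorphism(h, word):
    """Apply the endomorphism given by letter images in h (letters not in h are fixed)."""
    return "".join(h.get(letter, letter) for letter in word)


def edt0l_words(transitions, initial, final, start_word, max_path_length):
    """Collect the images of start_word under all accepting paths
    that use at most max_path_length transitions."""
    words = set()
    frontier = [(state, start_word) for state in initial]
    for _ in range(max_path_length + 1):
        next_frontier = []
        for state, word in frontier:
            if state in final:
                words.add(word)
            for source, h, target in transitions:
                if source == state:
                    next_frontier.append((target, apply_endomorphism(h, word)))
        frontier = next_frontier
    return words


# Hypothetical toy system: initial word "#", NFA with a loop and one exit.
grow = {"#": "a#b"}   # '#' -> 'a#b', all other letters fixed
erase = {"#": ""}     # '#' -> empty word
transitions = [(0, grow, 0), (0, erase, 1)]
language_sample = edt0l_words(
    transitions, initial={0}, final={1}, start_word="#", max_path_length=3
)
print(sorted(language_sample))
```

In this toy system the accepted paths are `grow^n erase`, so the full language would be the words $a^{n}b^{n}$; the bounded enumeration returns those with $n\leqslant 2$.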

The original motivation for [5] was to prove that the full solution set in reduced words of equations in free groups is an indexed language, a problem which was open at that time [18, 25]. However, the result in [5] is stronger since EDT0L forms a strict subclass of indexed languages [15].

Transfer results as in [22, 5] from words to free groups have a long history. In the 1980s Makanin showed that the existential and positive theories of free groups are decidable [37]. In 1987 Razborov gave a description of all solutions for an equation in a free group via “Makanin-Razborov” diagrams [46, 47] which formed a cornerstone in the work of Kharlampovich/Myasnikov [30] and Sela [52] on the positive solution of Tarski’s conjectures about the elementary theory in free groups.

The motivation for the present paper is along this line. We show that given a finitely generated virtually free group $G$ there is an $\mathsf{NSPACE}(n^{2}\log n)$ algorithm which produces for a given equation with (small) rational constraints an effective description of an EDT0L language which describes the solution set in standard normal forms over a natural set of generators. Moreover, the same complexity is enough to decide whether the solution set is empty, finite or infinite. No $\mathsf{PSPACE}$ algorithm, in fact no concrete complexity bound was known for deciding emptiness before.

In this paper, we define an $\mathsf{NSPACE}(s(n))$ algorithm to be a partially defined single-valued function $f$ computed by a nondeterministic Turing machine consisting of three tapes: a one-way-read-only input tape, a two-way-read-write work tape, and a one-way-write-only output tape. If the length of the word written on the input tape is $n$, the work tape is restricted to having length $s(n)$. If the machine halts on input $w$ at some point, then the contents $w^{\prime}$ on the output tape satisfies $w^{\prime}=f(w)$. In general such a device might specify a partially defined multi-valued function, where several outputs are possible from the same input. However, in our case, we require that the output is unique. The domain of the partially defined function $f$ computed by the machine is the halting set of the machine, and for each $w$ in the domain there is a single output $f(w)$. This is the standard definition of a nondeterministic transducer which computes a partially defined single-valued function. For nondeterministic polynomial time, formal definitions go back to [4]; see also [53, 54]. It is clear that this formalism applies to other nondeterministic complexity classes as well. Every $\mathsf{NSPACE}(s(n))$ transducer can be simulated by a deterministic transducer using working space at most $s(n)^{2}$ (Savitch’s Theorem), and also by a deterministic Turing machine which uses a time bound in $2^{\mathcal{O}(s(n))}$, see [43] for more details. Thus, every $\mathsf{PSPACE}$ algorithm can be implemented such that it runs in deterministic singly exponential time $2^{\text{poly}(n)}$.

Several remarks are in order here, which point to some additional difficulties in our framework. First, in general virtually free groups have torsion, which is a serious obstacle to applying the known techniques. The reason to study virtually free groups is motivated by the ubiquitous presence of word hyperbolic groups [20]. Solving equations in torsion-free hyperbolic groups reduces to solving equations in free groups [49], but solving equations in word hyperbolic groups with torsion reduces to solving equations in virtually free groups which in turn reduces to solving twisted word equations with rational constraints [7]. The question of whether solving twisted word equations is decidable was asked by Makanin ([38] Problem 10.26(b)). It was solved in [7], thereby showing that the set of solvable equations over a f.g. virtually free group is decidable. This result was also independently shown by Lohrey and Sénizergues [35]. (Actually, [35] proves a more general transfer result.) What is in common: both papers are based on [10] and use explicitly ([35] only implicitly due to [10]) the so-called “exponent of periodicity”. Because of this neither paper describes all solutions, nor gives any concrete complexity bounds.

The result for virtually free groups is obtained by a reduction to the problem of describing the solution sets of twisted word equations with regular constraints, following standard techniques. So, the main new contribution is our approach to solving twisted word equations. We follow the approach in [5] to define a sound and complete algorithm to produce an NFA $\mathcal{A}$ describing all solutions; however, in the setting of twisted equations the technical details are quite far from previous methods. For example, for readers familiar with previous methods, in twisted equations it does not make sense to “uncross” pairs $ab$ where $a$, $b$ are different letters because once all pairs $ab$ are uncrossed the twisting may produce new crossing pairs $ba$, uncrossing them leads to new crossing pairs $ab$, etc. Thus, our underlying method is quite different from the original recompression due to Jeż.

The class of f.g. virtually free groups appears in many different ways. For example, a fundamental theorem of Muller and Schupp (relying originally on [14]) says that a f.g. group is virtually free if and only if it is context-free [40]. This means that, given any set of monoid generators $A$, the set of words $w\in A^{*}$ which represent the identity of the group forms a context-free language. Other characterizations include: (1) fundamental groups of finite graphs of finite groups [29], (2) f.g. groups having a Cayley graph with finite treewidth [32], (3) universal groups of finite pregroups [48], (4) groups having a finite presentation by some geodesic string rewriting system [19], and (5) f.g. groups having a Cayley graph with decidable monadic second-order theory [32]. Proofs for the most important equivalences are in [13]. The transformations are effective. For example, starting from a context-free grammar for the word problem, we can construct a representation as a fundamental group of finite graphs of finite groups. However, the finite graphs of finite groups can be much larger than the size of the context-free grammar: the result in [55] showed a primitive recursive bound on the blow-up. It was only very recently that Sénizergues and Weiß showed in [57] that the blow-up can be bounded by a doubly exponential function.

What we use here is another characterization which is proved in [13, Sec. 2.4.5]. It follows rather easily from Bass-Serre theory [58] and the representation of a f.g. virtually free group as a fundamental group of finite graphs of finite groups. The characterization says that a f.g. group $G$ is virtually free if and only if it has an effective embedding into a semi-direct product of a free group $F$ with basis $E_{+}$ by a finite group $H$ which acts by permutations on the symmetric set $E=E_{+}\cup{E}^{-1}_{+}$ . (The precise statement is in Prop. 14.7.) Taking this characterization as a black box, no knowledge in Bass-Serre theory is required to understand our results.

An extended abstract of a preliminary version of this paper was presented at the conference ICALP 2017, Warsaw (Poland), 10-14 July 2017 [9]. Ciobanu and the second author have now extended the results of the present paper to show solutions to equations in any hyperbolic group are EDT0L with description in $\mathsf{PSPACE}$ [6].

1. Organization of the paper

1.1. The overall structure

The paper has two main and separate parts. In the first part we deal with the following algorithmic problem. The input is a system $\mathcal{S}$ of twisted word equations with regular constraints over a free monoid (with involution) $A^{*}$. The “twist” comes from a finite group $H$ acting on $A^{*}$. We present a $\mathsf{PSPACE}$ algorithm which constructs an NFA $\mathcal{A}_{\mathcal{S}}$ which gives a description of the set of all solutions as an EDT0L language. Structural properties of the NFA $\mathcal{A}_{\mathcal{S}}$ tell us whether the set of all solutions is empty, finite, or infinite. Precise complexity bounds are discussed in Sect. 4.2, and under certain assumptions on the size of the regular constraints, we prove the entire algorithm can be done in $\mathsf{NSPACE}(|H|n^{2}\log|A|\log n)$ where $n$ denotes the input size of $\mathcal{S}$. When $|A|$ and $|H|$ are constants, this becomes quasi-quadratic nondeterministic space $\mathsf{NSPACE}(n^{2}\log n)$.

In the second part we apply our results on twisted word equations to the existential theory of equations with rational constraints over finitely generated virtually free groups. From the algorithmic viewpoint we deal with a non-uniform complexity where the virtually free group $G$ is not part of the input. (This allows us to assume that $G$ is embedded into a semi-direct product of a free group $F$ by a finite group $H$, where the rank of $F$ and the size of $H$ are constants.) The input is a Boolean formula $\Phi$ in free variables over equations and rational constraints for the solution specified by NFAs which accept subsets of $G$. The output is a specification of the set of all solutions in standard normal forms for $\Phi$ as an EDT0L language. The proof is a reduction to the setting in the first part. The result is in the same overall complexity as in the first part when taking into account that $H$, $F$ and $A$ are not part of the input.

In the final section we perform this reduction explicitly for the special linear group $\mathrm{SL}(2,\mathbb{Z})$ (without relying on any knowledge of Bass-Serre theory) starting with the well-known classical fact that $\mathrm{SL}(2,\mathbb{Z})$ can be embedded in a semi-direct product of a free group of rank $2$ (its commutator subgroup) by the finite cyclic group $\mathbb{Z}/12\mathbb{Z}$. A priori, there could be an exponential blow-up in the complexity due to the fact that we use matrices and not a word representation when describing equations over $\mathrm{SL}(2,\mathbb{Z})$. However, there is no such blow-up thanks to work of Gurevich and Schupp [21].

1.2. Technical details

We assume that the reader is familiar with some basic facts in combinatorics on words, formal languages and finite automata, and complexity theory. Apart from that (and the promise that a finitely generated (f.g. for short) virtually free group admits an embedding into a certain semi-direct product of a free group $F$ by a finite group $H$ ) the paper is self-contained. In principle, it is not necessary that the reader has ever heard of Makanin, word equations, or any method to solve them before. The paper uses various technical tools where the authors would have preferred to give references in the literature rather than lengthy and somewhat pedestrian constructions, but failed to find the appropriate references.

The heart of the paper is Jeż’s compression method in the framework of twisted equations: Sect. 10 and Sect. 11. The adaptation to the twisted setting is far from trivial and quite different from the original method in [26] or its extension to free groups as in [11] or [5]. Understanding Sections 10 and 11 is therefore the most demanding part of reading the paper.

Many of the technicalities surrounding $\mathsf{NSPACE}$ complexity can be overlooked if the reader is happy enough to replace the explicit complexity bounds by $\mathsf{PSPACE}$ .

2. Preliminaries

We use standard notation. If $A$ and $B$ are sets, then $A\subseteq B$ means set inclusion, while $A\varsubsetneq B$ means $A\subseteq B\wedge A\neq B$. By $A\setminus B$ we denote the set of $a\in A$ which are not in $B$. By $B^{A}$ we mean the set of mappings from $A$ to $B$, and $2^{A}$ denotes the power set of $A$, that is, $2^{A}=\{B\mid B\subseteq A\}$. We also view $2^{A}$ as a commutative and idempotent monoid.

$\mathbb{B}=(\{0,1\},\max,{\cdot},0,1)$ denotes the Boolean semi-ring, and $\mathbb{N}$ (resp. $\mathbb{Z}$) denotes the semi-ring of natural numbers (resp. the ring of integers). $\mathbb{N}$ is also the free monoid and $\mathbb{Z}$ is the free group on one generator.

Monoids (resp. groups) will typically be denoted by $M$ and $N$ (resp. by $G$ and $H$ ). If the focus is on finite monoids (resp. finite groups), then we use the notation $N$ (resp. $H$ ). With a few exceptions (like $\mathbb{N}$ or $\mathbb{Z}$ ) we denote the identity element in monoids by $1$ . A zero in a monoid $M$ is an element $0\in M$ such that $0x=x0=0$ for all $x\in M$ . If a zero exists, it is unique. Nontrivial groups cannot have a zero.

Let $M$ be a monoid and $u,v\in M$ . We say that $u$ is a factor of $v$ if we can write $v=xuy$ for some $x,y\in M$ . If we can write $v=uy$ (resp. $v=xu$ ), then we say that $u$ is a prefix (resp. suffix) of $v$ . If $u$ is a prefix of $v$ , then we also write $u\leqslant v$ .

2.1. Complexity

The $\mathcal{O}$-notation and complexity classes like $\mathsf{P}$, $\mathsf{NP}$, $\mathsf{PSPACE}$, $\mathsf{NSPACE}(s(n))$ are defined as in standard textbooks ([24, 43]); see also the Introduction. We use the convention that $\log(m)=\max\{1,\log_{2}(m)\}$. Throughout we use the well-known fact due to Immerman-Szelepcsényi that $\mathsf{NSPACE}(s(n))$ is closed under complementation. Note that any statement about complexity depends on how the input is given. A statement like “factorization is not known to be in $\mathsf{P}$” makes sense only if the encoding of the problem and a notion of input size have been defined. If integers are encoded in unary, then, trivially, “factorization is in $\mathsf{P}$” is true. Typically, inputs have various parameters. If certain parameters of the input are fixed, then with respect to the input size these parameters behave as constants. Still, for many problems $\mathcal{P}$ it is more accurate to use parametrized inputs, where the input size is a tuple of non-negative numbers: $\|\mathcal{P}\|=(p_{1},\ldots,p_{k})$ with $k\geqslant 1$. If $\mathcal{P}$ is such a problem, then with respect to polynomial resource bounds we view $\mathcal{P}$ as a one-parameter problem of input size $n=1+p_{1}+\cdots+p_{k}$. The notation is robust: every polynomial in $(1+p_{1})^{\ell}\cdots(1+p_{k})^{\ell}$ is also a polynomial in $n$ for all $\ell\geqslant 1$. Throughout, we take care to define our input sizes (of systems of equations, or Boolean formulae, with regular constraints) in a natural way.

2.2. Sets and monoids with involution

An involution of a set is a bijection $x\mapsto\overline{x}$ such that $\overline{\overline{x}}=x$ for all $x$ in the set. The identity map is an involution. A monoid with involution additionally has to satisfy $\overline{xy}=\overline{y}\,\overline{x}$ . This implies $\overline{1}=1$ and $\overline{0}=0$ (in case there is a zero). If $G$ is a group, then it is a monoid with involution by taking $\overline{g}={g}^{-1}$ for all $g\in G$ . By default, we choose $\overline{g}$ to be ${g}^{-1}$ in groups.

A morphism between sets with involution is a mapping respecting the involution. A morphism between monoids with involution is a homomorphism $\varphi\colon M\to M^{\prime}$ such that $\varphi(\overline{x})=\overline{\varphi(x)}$ . Note that every group homomorphism is a morphism of monoids with involution. The set of automorphisms on a set (or monoid) $M$ forms the group $\operatorname{Aut}(M)$ . For $\Delta\subseteq M\cap M^{\prime}$ we say that $\varphi\colon M\to M^{\prime}$ is a $\Delta$ -morphism if $\varphi(x)=x$ for all $x\in\Delta$ .

2.3. Group actions and $H$ -monoids

Recall that a group $H$ acts on a set $\Sigma$ (with involution) via a homomorphism $\psi\colon H\to\operatorname{Aut}(\Sigma)$. That is, $\psi$ defines a permutation $x\mapsto g\cdot x$ with $1\cdot x=x$, $f\cdot(g\cdot x)=(fg)\cdot x$ (and $f\cdot\overline{x}=\overline{f\cdot x}$) for all $f,g\in{H}$ and $x\in\Sigma$. Thus, every $g\in{H}$ defines a permutation of $\Sigma$ (which respects the involution). The stabilizer of $x\in\Sigma$ is the subgroup $H_{x}=\{g\in H\mid g\cdot x=x\}$. Frequently, we also write $g(x)$ as a synonym for $g\cdot x$. If $H$ acts on a monoid $M$, then we additionally demand that every element of $H$ acts as an automorphism: $g(xy)=g(x)g(y)$. If $M$ is equipped with an involution, then we have $g(\overline{xy})=g(\overline{y}){g(\overline{x})}=\overline{g(y)}\;\overline{g(x)}$. In the following we say that $M$ is an $H$-monoid if it is a monoid with involution on which $H$ acts. A morphism between $H$-monoids $M$ and $M^{\prime}$ is given by an $H$-compatible morphism which is a homomorphism $\varphi\colon M\to M^{\prime}$ respecting for all $g\in H$ and $x\in M$ the action $g\cdot\varphi(x)=\varphi(g\cdot x)$, and the involution, $\varphi(\overline{x})=\overline{\varphi(x)}$.
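As a small illustration of a group action and its stabilizers, the following sketch uses a hypothetical example that does not come from the paper: the cyclic group $\mathbb{Z}/4\mathbb{Z}$ (written additively, so the identity is `0`) acting on a two-point set by swapping the points whenever the group element is odd. The names `H`, `mul`, `act`, and `stabilizer` are chosen only for this sketch.

```python
H = range(4)  # hypothetical group Z/4Z, elements 0..3, identity 0


def mul(f, g):
    """Group operation of Z/4Z."""
    return (f + g) % 4


def act(g, x):
    """Action on the set {'p', 'q'}: odd elements swap the two points."""
    if g % 2 == 0:
        return x
    return {"p": "q", "q": "p"}[x]


def stabilizer(x):
    """Return H_x = {g in H | g . x = x} as a list."""
    return [g for g in H if act(g, x) == x]


# Check the two action axioms: 1 . x = x and f . (g . x) = (fg) . x.
axioms_hold = all(
    act(0, x) == x and act(f, act(g, x)) == act(mul(f, g), x)
    for f in H for g in H for x in "pq"
)
print(axioms_hold, stabilizer("p"))
```

Here the stabilizer of either point is the subgroup of even elements.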

2.4. Free monoids with involution and an $H$ -action

By an alphabet we mean a finite set $\Sigma$ with involution. (Since the identity is an involution of $\Sigma$ this covers the case of monoids without a predefined involution.) The elements of $\Sigma$ are called letters (or symbols). By $\Sigma^{*}$ (by $\Sigma^{+}$ resp.) we denote the free monoid (free semigroup resp.) over $\Sigma$. The elements of a free monoid are called words. The empty word in a free monoid is also denoted by $1$ as in other monoids. We have $\Sigma^{+}=\Sigma^{*}\setminus\{1\}$. The involution extends to $\Sigma^{*}$: for a word $w=a_{1}\cdots a_{m}$ with $a_{i}\in\Sigma$ we let $\overline{w}=\overline{a_{m}}\cdots\overline{a_{1}}$. The monoid $\Sigma^{*}$ is called the free monoid with involution over $\Sigma$. If $\overline{a}=a$ for all $a\in\Sigma$ then $\overline{w}$ is simply the word $w$ read from right-to-left. The length of a word $w$ is denoted by $|w|$, and $|w|_{a}$ counts how often a letter $a$ appears in $w$.
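The extension of the involution from letters to words can be sketched as follows. The four-letter alphabet, in which upper case plays the role of the barred letter, is an assumption made only for this illustration, as is the helper name `bar_word`.

```python
def bar_word(word, bar_letter):
    """Involution on words: reverse the word and bar every letter."""
    return "".join(bar_letter[a] for a in reversed(word))


# Hypothetical alphabet with involution: bar(a) = A, bar(b) = B.
bar_letter = {"a": "A", "A": "a", "b": "B", "B": "b"}

u, v = "ab", "A"
barred = bar_word(u + v, bar_letter)
print(barred)
```

The sketch can be used to check on examples that the map is an involution and reverses products, that is, $\overline{uv}=\overline{v}\,\overline{u}$.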

If a group $H$ acts on $\Sigma$ , then $g\in H$ acts on $w=a_{1}\cdots a_{m}$ with $a_{i}\in\Sigma$ by

$$g(w)=g(a_{1})\cdots g(a_{m}).$$

A letter $a\in\Sigma$ is $H$ -visible in $w$ if $g(a)$ is a factor of $w$ for some $g\in{H}$ .
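A minimal sketch of the letterwise action on words and of $H$-visibility, assuming a hypothetical two-element group `H = [0, 1]` whose non-trivial element swaps `a` with `b` and `A` with `B` (upper case stands for the barred letter, so this action respects the involution). None of these names come from the paper.

```python
H = [0, 1]  # hypothetical group Z/2Z, identity 0
SWAP = {"a": "b", "b": "a", "A": "B", "B": "A"}


def act_letter(g, a):
    """Action of g on a single letter."""
    return a if g == 0 else SWAP[a]


def act_word(g, w):
    """Action on a word, applied letter by letter."""
    return "".join(act_letter(g, a) for a in w)


def is_visible(a, w):
    """A letter a is H-visible in w if some g(a) occurs in w."""
    return any(act_letter(g, a) in w for g in H)


print(act_word(1, "abA"), is_visible("b", "aaA"), is_visible("a", "BB"))
```

In the last call neither `a` nor its image `b` occurs in `BB`, so `a` is not visible there.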

Sometimes it is useful to view a word $w=a_{1}\cdots a_{m}$ with $a_{i}\in\Sigma$ as a labeled linear order as follows. We let $\{1,\ldots,m\}$ be the set of positions; and we label a position $1\leqslant p\leqslant m$ with the letter $w[p]=a_{p}$. For $1\leqslant i,j\leqslant m$ we denote by $[i,j]$ the interval $\{i,\ldots,j\}$. The labels of the interval define a factor

$$w[i,j]=a_{i}\cdots a_{j}.$$

An occurrence of a factor $u$ in $w$ is an interval $[i,j]$ such that $u=w[i,j]$ . Typically, a factor has several occurrences. For a position $1\leqslant p\leqslant|w|$ we define its dual position $\overline{p}$ by $\overline{p}=m+1-p$ . The notion of duality extends to intervals $[i,j]$ with $1\leqslant i,j\leqslant|w|$ by $\overline{[i,j]}=[\overline{j},\overline{i}]$ . Thus, the set of intervals is a set with involution.

A word $w$ such that $w=\overline{w}$ is called self-involuting, and for such $w$ we have $\overline{w[i,j]}=w[\overline{j},\overline{i}]$ .
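The position and interval conventions above use 1-based, inclusive indices, which the following sketch translates into Python slicing. The alphabet (upper case as the barred letter) and the sample word are assumptions chosen so that the word is self-involuting.

```python
def bar_word(word, bar_letter):
    """Involution on words: reverse the word and bar every letter."""
    return "".join(bar_letter[a] for a in reversed(word))


def factor(w, i, j):
    """Factor w[i, j] for 1-based inclusive positions i <= j."""
    return w[i - 1:j]


def dual_position(w, p):
    """Dual of position p in a word of length m: m + 1 - p."""
    return len(w) + 1 - p


def dual_interval(w, i, j):
    """Dual of the interval [i, j]: [dual(j), dual(i)]."""
    return (dual_position(w, j), dual_position(w, i))


bar_letter = {"a": "A", "A": "a", "b": "B", "B": "b"}
w = "abBA"  # hypothetical self-involuting word
i, j = 1, 2
di, dj = dual_interval(w, i, j)
print(factor(w, i, j), (di, dj), factor(w, di, dj))
```

For this word the barred factor on $[1,2]$ coincides with the factor read on the dual interval $[3,4]$.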

2.5. Automata, rational and recognizable subsets in a monoid

For notation and results in this subsection we refer to the classical textbook [16]. A regular language in finitely generated free monoids can be defined via a nondeterministic finite automaton or via recognizability using a homomorphism to a finite monoid, to mention just two possible definitions. We need the corresponding notions for subsets of other monoids, too.

Let $M$ be any monoid (not necessarily equipped with an involution). A nondeterministic automaton over $M$ is a directed arc-labeled graph $\mathcal{A}$ denoted as a tuple $\mathcal{A}=(Q,M,\delta,\mathscr{I},\mathscr{F})$. The vertices of $\mathcal{A}$ form the set $Q$ of states, with subsets $\mathscr{I}$ of initial and $\mathscr{F}$ of final states. We write $\mathcal{A}=\emptyset$ if there are no states. The arcs are called transitions and they are labeled with elements of the monoid $M$. We represent the set of transitions $\delta$ as a subset of $Q\times M\times Q$. A transition labeled by $1\in M$ is called an $\varepsilon$-transition. In pictures we draw a transition $(p,h,q)$ as $p\overset{h}{\longrightarrow}q$. We say that $m\in M$ is accepted by the automaton $\mathcal{A}$ if there exists a path from some initial to some final state such that multiplying the labels together yields $m$. This defines the accepted language $L(\mathcal{A})=\{m\in M\mid m\text{ is accepted by }\mathcal{A}\}$.
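When the monoid $M$ is finite, the accepted set $L(\mathcal{A})$ can be computed exactly by a reachability search over pairs (state, product of labels read so far). The sketch below makes that assumption; the monoid $\mathbb{Z}/6\mathbb{Z}$ under addition and the two-state automaton are hypothetical choices for illustration.

```python
from collections import deque


def accepted_elements(transitions, initial, final, mul, one):
    """Return L(A) for an NFA over a finite monoid with operation mul and identity one."""
    start = {(q, one) for q in initial}
    seen = set(start)
    queue = deque(start)
    while queue:
        state, value = queue.popleft()
        for source, label, target in transitions:
            if source == state:
                successor = (target, mul(value, label))
                if successor not in seen:
                    seen.add(successor)
                    queue.append(successor)
    return {value for state, value in seen if state in final}


def add_mod_6(x, y):
    """Monoid operation of Z/6Z (written additively, identity 0)."""
    return (x + y) % 6


# Hypothetical automaton: loop labeled 2 on state 0, exit labeled 3 to state 1.
transitions = [(0, 2, 0), (0, 3, 1)]
accepted = accepted_elements(transitions, initial={0}, final={1}, mul=add_mod_6, one=0)
print(sorted(accepted))
```

The search terminates because there are only finitely many pairs in $Q\times M$; for an infinite monoid such as $\Sigma^{*}$ the same loop would have to be bounded.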

Often we specify $M$ together with a set $\Sigma$ of generators or, more generally, together with a homomorphism $\pi$ from the free monoid $\Sigma^{*}$ to $M$. In that case, we may denote $\mathcal{A}$ alternatively as $\mathcal{A}=(Q,\Sigma,\delta,\mathscr{I},\mathscr{F})$ where $\delta\subseteq Q\times\Sigma^{*}\times Q$. This allows two natural interpretations of $L(\mathcal{A})$: first as the set of words $L(\mathcal{A})\subseteq\Sigma^{*}$ obtained by reading $\mathcal{A}$ as a shorthand for $(Q,\Sigma^{*},\delta,\mathscr{I},\mathscr{F})$; second as $L(\mathcal{A})\subseteq M$ by identifying $L(\mathcal{A})\subseteq\Sigma^{*}$ with $\pi(L(\mathcal{A}))$. If the distinction is crucial, we write $L(\mathcal{A})$ and $\pi(L(\mathcal{A}))$. However, sometimes a sloppy notation $L(\mathcal{A})\subseteq M$ is used. There will, however, be no risk of confusion.

A subautomaton $\mathcal{A}^{\prime}$ of $\mathcal{A}=(Q,M,\delta,\mathscr{I},\mathscr{F})$ is an automaton $\mathcal{A}^{\prime}=(Q^{\prime},M,\delta^{\prime},\mathscr{I}^{\prime},\mathscr{F}^{\prime})$ such that $Q^{\prime}\subseteq Q$ , $\delta^{\prime}\subseteq\delta$ , $\mathscr{I}^{\prime}\subseteq\mathscr{I}$ , and $\mathscr{F}^{\prime}\subseteq\mathscr{F}$ .

An automaton is called trim if every state is on some path from an initial to a final state. For a trim automaton $\mathcal{A}$ we have $L(\mathcal{A})\neq\emptyset$ if and only if $\mathcal{A}\neq\emptyset$ . Clearly, every automaton $\mathcal{A}$ contains a trim subautomaton $\mathcal{A}^{\prime}$ such that $L(\mathcal{A}^{\prime})=L(\mathcal{A})$ .
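The trim subautomaton can be computed by two reachability searches. The following Python sketch illustrates this under our own conventions: the function name `trim` and the encoding of transitions as triples `(p, label, q)` are assumptions made for the example, and labels are never inspected, so the monoid $M$ plays no role.

```python
from collections import defaultdict


def trim(states, transitions, initial, final):
    """Return (states, transitions, initial, final) of the trim subautomaton.

    A state is kept iff it lies on some path from an initial to a final state.
    """
    forward = defaultdict(set)
    backward = defaultdict(set)
    for p, _label, q in transitions:
        forward[p].add(q)
        backward[q].add(p)

    def reachable(starts, edges):
        # Plain depth-first search over the given edge relation.
        seen = set(starts)
        stack = list(starts)
        while stack:
            p = stack.pop()
            for q in edges[p]:
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen

    useful = reachable(initial & states, forward) & reachable(final & states, backward)
    kept = {(p, a, q) for (p, a, q) in transitions if p in useful and q in useful}
    return useful, kept, initial & useful, final & useful


# State 3 is reachable but cannot reach the final state 2, so it is removed.
example = trim({0, 1, 2, 3}, {(0, "a", 1), (1, "b", 2), (0, "a", 3)}, {0}, {2})
# No path from the initial to the final state: the trim part has no states.
empty = trim({0, 1}, {(0, "a", 0)}, {0}, {1})
```

In the second example the result has no states, which matches the statement that a trim automaton accepts a nonempty language if and only if it is nonempty.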

If the set of transitions is finite, then we call $\mathcal{A}$ a nondeterministic finite automaton (or NFA for short). A subset $L\subseteq M$ is called rational if $L$ is accepted by some NFA over $M$ .

A subset $L\subseteq M$ is called recognizable if there is a homomorphism $h\colon M\to N$ to a finite monoid $N$ such that ${h}^{-1}(h(L))=L$ . In case that $M$ is a finitely generated free monoid, the notions of rational and recognizable subsets coincide; so these subsets are also called regular. It follows rather easily that a monoid $M$ is finitely generated if and only if all recognizable subsets are rational [39]. Finite subsets are always rational, but nonempty finite subsets in a group are recognizable if and only if the group is finite.

2.6. From NFAs to Boolean matrices

Nondeterministic finite automata encode regular languages in a concise and natural way. It is convenient to work in an algebraic framework with recognizing morphisms, too. Let us recall a well-known and classical construction.

Let $\mathcal{A}=(Q,{\Sigma},\delta,\mathscr{I},\mathscr{F})$ be any NFA with $m$ states. Then we can assume that $Q=\left\{\mathinner{1,\ldots,m}\right\}$ , and we represent transitions as a mapping to Boolean $m\times m$ matrices as follows. For each letter $a\in\Sigma$ we define a matrix $\mu_{\mathcal{A}}(a)\in\mathbb{B}^{m\times m}$ by

$$(\mu_{\mathcal{A}}(a))_{s,t}=1\iff a\in L(Q,\Sigma,\delta,\{s\},\{t\}).$$

We obtain a homomorphism $\mu_{\mathcal{A}}\colon{\Sigma}^{*}\to\mathbb{B}^{m\times m}$ such that for all $w\in\Sigma^{*}$ we have

$$(\mu_{\mathcal{A}}(w))_{s,t}=1\iff w\in L(Q,\Sigma,\delta,\{s\},\{t\}).$$
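The construction can be sketched in Python as follows. We assume, for simplicity, that every transition is labeled by a single letter (no $\varepsilon$ -transitions), that states are numbered $0,\ldots,m-1$ rather than $1,\ldots,m$ , and that the entry $(s,t)$ of $\mu_{\mathcal{A}}(a)$ is true exactly when there is a transition from $s$ to $t$ labeled $a$ . All function names are hypothetical.

```python
def letter_matrices(m, alphabet, transitions):
    """For each letter a, build the m x m Boolean matrix of a-labeled transitions."""
    mu = {a: [[False] * m for _ in range(m)] for a in alphabet}
    for s, a, t in transitions:
        mu[a][s][t] = True
    return mu


def bool_product(x, y):
    """Product of two square Boolean matrices (OR of ANDs)."""
    m = len(x)
    return [[any(x[s][r] and y[r][t] for r in range(m)) for t in range(m)]
            for s in range(m)]


def mu_word(m, mu, word):
    """Image of a word under the induced homomorphism into Boolean matrices."""
    result = [[s == t for t in range(m)] for s in range(m)]  # identity matrix
    for a in word:
        result = bool_product(result, mu[a])
    return result


def accepts(m, mu, initial, final, word):
    """The word is accepted iff some (initial, final) entry of its matrix is true."""
    mat = mu_word(m, mu, word)
    return any(mat[s][t] for s in initial for t in final)


# NFA with 3 states accepting the words over {a, b} that contain the factor ab.
TRANSITIONS = [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "a", 2), (2, "b", 2)]
MU = letter_matrices(3, "ab", TRANSITIONS)
```

Acceptance depends only on the matrix of the word, which is why the finite monoid of Boolean matrices recognizes the language of the NFA.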

Example 2.1.

Let ${\Sigma}$ be an alphabet (with involution) and $H\leqslant\operatorname{Aut}({\Sigma})$ be a subgroup of automorphisms. The set $R$ of words having a factor $e\overline{e}$ for some $e\in{\Sigma}$ is regular and $R$ is invariant under the action of $H$ . Let $\mathbb{F}={\Sigma}^{*}\setminus R$ be the complement: it is the set of reduced words. The set $\mathbb{F}$ is in canonical bijection with the group ${\Sigma}^{*}/\left\{e\overline{e}=1\mathrel{\left|\vphantom{e\overline{e}=1}\vphantom{e\in{\Sigma}}\right.}e\in{\Sigma}\right\}$ . The language $R$ is accepted by an NFA (actually a DFA) with $1+\left|\mathinner{\Sigma}\right|$ states. Hence, $\mathbb{B}^{m\times m}$ recognizes them where $m=1+\left|\mathinner{\Sigma}\right|$ .

The size of $\mathbb{B}^{m\times m}$ is $2^{m^{2}}$ , but there is a much smaller monoid $N$ which recognizes $R=\bigcup\left\{\Sigma^{*}e\overline{e}\Sigma^{*}\mathrel{\left|\vphantom{\Sigma^{*}e\overline{e}\Sigma^{*}}\vphantom{e\in\Sigma}\right.}e\in\Sigma\right\}$ and hence $\mathbb{F}$ , too. The elements of $N$ are $1$ , $0$ , and the pairs $(a,b)$ in ${\Sigma}\times{\Sigma}$ . The elements $1$ and $0$ act as the neutral element and a zero, respectively. The multiplication for the other elements is given by $(a,b)\cdot(c,d)=(a,d)$ if $b\neq\overline{c}$ and $(a,b)\cdot(c,d)=0$ , otherwise. The involution is given by $\overline{(a,b)}=(\overline{b},\overline{a})$ . In effect, $N$ “remembers” the first and last letters of elements in $\mathbb{F}$ . A pair $(a,b)$ switches to $0$ once a factor $e\overline{e}$ is recorded.

It is an $H$ -monoid by the natural action of $H$ induced by the action of $H$ on ${\Sigma}$ . Consider the morphism of $H$ -monoids $\mu\colon{\Sigma}^{*}\to N$ which is defined by $\mu(a)=(a,a)$ . Then we have $R={\mu}^{-1}(0)$ and $\mathbb{F}={\mu}^{-1}(N\setminus\left\{\mathinner{0}\right\})$ . The size of $N$ is therefore $2+{\left|\mathinner{\Sigma}\right|}^{2}-\left|\mathinner{\Sigma}\right|$ . Therefore each element in $N$ can be specified by at most $1+\log_{2}(1+\left|\mathinner{\Sigma}\right|)$ bits.
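A minimal Python sketch of the monoid $N$ of Example 2.1 may help to see how it recognizes reduced words. The encoding is our own assumption: a lowercase letter stands for $e$ and the corresponding uppercase letter for $\overline{e}$ , the strings `"1"` and `"0"` stand for the neutral element and the zero, and the $H$ -action is left out.

```python
ONE, ZERO = "1", "0"


def bar(letter):
    """Involution on letters: swap e and its formal inverse."""
    return letter.swapcase()


def mul(x, y):
    """Multiplication in N: pairs (first letter, last letter), plus 1 and 0."""
    if x == ONE:
        return y
    if y == ONE:
        return x
    if x == ZERO or y == ZERO:
        return ZERO
    (a, b), (c, d) = x, y
    return (a, d) if b != bar(c) else ZERO


def inv(x):
    """Involution on N."""
    if x in (ONE, ZERO):
        return x
    a, b = x
    return (bar(b), bar(a))


def mu(word):
    """The morphism defined on letters by mu(a) = (a, a)."""
    value = ONE
    for letter in word:
        value = mul(value, (letter, letter))
    return value


def is_reduced(word):
    """A word is reduced iff its image under mu is not the zero."""
    return mu(word) != ZERO
```

For instance, `mu("abA")` evaluates to the pair `("a", "A")`, whereas `mu("aAb")` is the zero because the factor $a\overline{a}$ occurs.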

3. Regular languages in presence of an involution and an $H$ -action

The application of this section is stated precisely in Prop. 3.2. We give a construction which allows us to handle regular constraints for twisted word equations using morphisms to finite $H$ -monoids.

We proceed in two steps. The first step forces homomorphisms to respect the involution. This part is from the arXiv version of [5] which was inspired by [10]. We repeat that construction. Let $N$ be any monoid. We define its dual monoid $N^{\mathrm{op}}$ to use the same set $N^{\mathrm{op}}=N$ , but $N^{\mathrm{op}}$ is equipped with a new multiplication $x\circ y=yx$ . In order to indicate whether we view an element in the monoid $N$ or $N^{\mathrm{op}}$ , we use a flag: for $x\in N$ we write $x^{\mathrm{op}}$ to indicate the same element in $N^{\mathrm{op}}$ . Thus, we can suppress the symbol $\circ$ and we simply write $x^{\mathrm{op}}y^{\mathrm{op}}=(yx)^{\mathrm{op}}$ . The notation is intended to mimic transposition in matrix calculus, where the dual operation is just the transpose. Similarly, we write $1$ instead of $1^{\mathrm{op}}$ which is true for the identity matrix as well. The direct product $N\times N^{\mathrm{op}}$ becomes a monoid with involution by letting $\overline{(x,y^{\mathrm{op}})}=(y,x^{\mathrm{op}})$ . Indeed,

[TABLE]

The following observations are immediate.

•

If $N$ is finite then $N\times N^{\mathrm{op}}$ is finite, too.

•

We can embed $N$ into $N\times N^{\mathrm{op}}$ by a homomorphism $\iota\colon N\to N\times N^{\mathrm{op}}$ defined by $\iota(x)=(x,1)$ . Note that if $\eta\colon N\times N^{\mathrm{op}}\to N$ denotes the projection onto the first component, then $\eta\iota=\mathrm{id}_{N}$ .

•

If $M$ is a monoid with involution and $\rho\colon M\to N$ is a homomorphism of monoids, then we can lift $\rho$ uniquely to a morphism $\varphi^{\text{op}}\colon M\to N\times N^{\mathrm{op}}$ of monoids with involution such that we have $\rho=\eta\varphi^{\text{op}}$ . Indeed, it is sufficient and necessary to define $\varphi^{\text{op}}(x)=(\rho(x),\rho(\overline{x})^{\mathrm{op}})$ .
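The following Python sketch checks these observations on a small example. Everything concrete in it is an assumption made for illustration: $N$ is the monoid of all maps on $\{0,1,2\}$ written as tuples, $M$ is the free monoid with involution over $\{a,\overline{a},b,\overline{b}\}$ (uppercase letters stand for barred letters), and the values in `RHO` are chosen arbitrarily, so $\rho$ does not respect the involution.

```python
IDENTITY = (0, 1, 2)


def mul(x, y):
    """Product in N: apply the map x first, then y."""
    return tuple(y[x[i]] for i in range(len(x)))


def mul_pair(p, q):
    """Product in N x N^op: the second components multiply in reversed order."""
    (x, y), (x2, y2) = p, q
    return (mul(x, x2), mul(y2, y))


def bar_pair(p):
    """Involution on N x N^op: swap the two components."""
    x, y = p
    return (y, x)


def bar(word):
    """Involution on words: reverse and bar each letter."""
    return word[::-1].swapcase()


# An arbitrary homomorphism rho from the free monoid to N, given on letters.
RHO = {"a": (1, 2, 0), "A": (0, 0, 1), "b": (1, 0, 2), "B": (2, 2, 2)}


def rho(word):
    value = IDENTITY
    for c in word:
        value = mul(value, RHO[c])
    return value


def phi_op(word):
    """The lift x -> (rho(x), rho(bar x)^op) into N x N^op."""
    return (rho(word), rho(bar(word)))


SAMPLES = ["", "a", "aB", "bAab", "BBa"]
```

On the sample words, `phi_op` is multiplicative, commutes with the involution, and projects back to `rho` on the first component.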

Example 3.1 ([10]).

Let $M=\mathbb{B}^{n\times n}$ . Then $M\times M^{\mathrm{op}}$ is a submonoid of the set of $2n\times 2n$ -Boolean matrices:

[TABLE]

In the line above $P^{T}$ and $Q^{T}$ are the transposed matrices.

Now, we switch to the new part of our construction. For readers familiar with wreath products it might be helpful to say that the following is a wreath product construction. Let $N$ be a monoid with involution. Consider the direct product $N^{H}$ , which is the set of maps from $H$ to $N$ . We denote the elements of $N^{H}$ by tuples $(n_{g})_{g}$ with the interpretation that $g\in H$ is mapped to $n_{g}\in N$ . It is a monoid by pointwise multiplication with involution $\overline{(n_{g})_{g}}=(\overline{n_{g}})_{g}$ . The monoid $N$ embeds into $N^{H}$ by sending $n$ to the constant map $(n)_{g}$ . We let $H$ act on $N^{H}$ by

[TABLE]

Now, let $M$ be an $H$ -monoid with involution and let $\psi\colon M\to N$ be a morphism of monoids with involution. Then we extend it to $\widetilde{\psi}\colon M\to N^{H}$ by

$$\widetilde{\psi}(x)=(\psi(gx))_{g}.$$

The homomorphism $\widetilde{\psi}$ respects the involution since

[TABLE]

and it respects the action of $H$ since

[TABLE]

Moreover, $\psi$ factorizes through $\widetilde{\psi}$ because $\widetilde{\psi}(x)=(\psi(gx))_{g}$ implies $\psi=\eta_{1}\widetilde{\psi}$ where $\eta_{1}((n_{g})_{g})=n_{1}$ .

If we start with a homomorphism $\varphi$ from an $H$ -monoid with involution $M$ to a monoid $N$ without involution, then $\psi$ means the morphism $\varphi^{\text{op}}\colon M\to N\times N^{\mathrm{op}}$ ; and $\widetilde{\varphi}$ is a shorthand for $\widetilde{\varphi^{\text{op}}}$ . Thus, the constructions above yield the commutative diagram as in Fig. 1. In that figure $\varphi=\eta\eta_{1}\widetilde{\varphi}$ is a homomorphism, $\varphi^{\text{op}}=\eta_{1}\widetilde{\varphi}$ is a morphism of monoids with involution, and $\widetilde{\varphi}$ is an $H$ -compatible morphism. As a direct consequence we obtain the following proposition. Recall that an $H$ -monoid is, by definition, a monoid with involution.

Proposition 3.2.

Let $H$ be a finite group that acts on a finite alphabet $A$ ; and hence via a length and involution preserving action on $A^{*}$ . Then all recognizable subsets of $A^{*}$ can be recognized by some $H$ -compatible morphism to a finite $H$ -monoid. More precisely, let $\varphi\colon A^{*}\to N$ be a homomorphism to a finite monoid; and $L={\varphi}^{-1}(F)$ for some $F\subseteq N$ . Then we have $L={\widetilde{\varphi}}^{-1}(\widetilde{F})$ where $\widetilde{F}={\widetilde{\eta}}^{-1}(F)$ and $\widetilde{\eta}=\eta\eta_{1}$ .

3.1. Stabilizers

Let $H$ be a finite group acting on an alphabet ${A}$ via a homomorphism $\psi\colon H\to\operatorname{Aut}({A})$ . We assume that $H$ is given by its multiplication table. The table can be stored with $\mathcal{O}(|H|^{2}\log|H|)$ bits. We also need a way to represent the action of $H$ and the stabilizer subgroups of $H$ . The action is recorded by writing down for each $f\in H$ the element $\psi(f)$ as a permutation of ${A}$ . To do this, we write $\psi(f)$ as a set of pairs $\psi(f)=\left\{(a,f(a))\mathrel{\left|\vphantom{(a,f(a))}\vphantom{a\in{A}}\right.}a\in{A}\right\}$ . Thus, the action of $H$ on ${A}$ can be stored with $\mathcal{O}(|H|\left|\mathinner{A}\right|\log|{A}|)$ bits.

For a word $w\in{{A}}^{*}$ we denote by $H_{w}$ its stabilizer:

$$H_{w}=\left\{f\in H\mid f(w)=w\right\}.$$

Stabilizers are subgroups, and the set of subgroups of the form $H_{w}$ forms a commutative monoid $\mathop{\text{ST}}(H)$ where the operation is intersection, the identity element is $H$ , and the involution is the identity. Indeed, we have

[TABLE]

and

[TABLE]

for all $u,v\in{{A}}^{*}$ , $g\in H$ . In particular, $H$ acts on $\mathop{\text{ST}}(H)$ by conjugation, and $\mathop{\text{ST}}(H)$ is therefore an $H$ -monoid. Let $\mathop{\text{SG}}(H)$ denote the set of all subgroups of $H$ , then $\nu(u)=H_{u}$ yields a canonical surjective morphism

[TABLE]

A basic test is to answer “ $f\in H_{w}$ ?”. This is easy: for $w=a_{1}\cdots a_{m}$ with $a_{i}\in{{A}}$ we check one after another that $f(a_{i})=a_{i}$ for $1\leqslant i\leqslant|w|$ . This enables an efficient test to decide whether or not $H_{u}\subseteq H_{v}$ . For each $f\in H$ one after another we test whether $f\in H_{u}$ implies $f\in H_{v}$ . In particular, we can answer “ $H_{u}=H_{v}$ ?”.

Lemma 3.3.

We have $\mathop{\text{ST}}(H)=\left\{H_{w}\mathrel{\left|\vphantom{H_{w}}\vphantom{w\in{{A}}^{*}\wedge|w|\leqslant\log_{2}|H|}\right.}w\in{{A}}^{*}\wedge|w|\leqslant\log_{2}|H|\right\}.$

Proof.

For each $w\in{{A}}^{*}$ and $b\in{{A}}$ we either have $H_{w}=H_{wb}$ or $|H_{w}|\geqslant 2|H_{wb}|$ because $H_{wb}=H_{w}\cap H_{b}$ . So, if they are not equal, then their intersection is a subgroup which has index at least $2$ in $H_{w}$ . ∎

The idea is therefore to use words of length at most $\log_{2}|H|$ to represent stabilizers and to perform the calculations for stabilizers on these words. The representation is not unique, but this does not matter for our application.

A main task is to compute a word $w$ of length at most $\log_{2}|H|$ such that $H_{w}=H_{u}\cap H_{v}$ (when $u$ and $v$ satisfy $|u|,|v|\leqslant\log_{2}|H|$ ). This can be done efficiently according to the following lemma.

Lemma 3.4.

Every element in the commutative monoid $\mathop{\text{ST}}(H)$ of stabilizers can be represented by a word in $A^{*}$ of length at most $\log_{2}|H|$ , thus with at most $\mathcal{O}(\log\left|\mathinner{H}\right|\cdot\log|{A}|)$ bits. Using this representation, multiplication (that is, intersection) and computing the $H$ -action (that is, conjugation) can be done in space $\mathcal{O}(\log\left|\mathinner{H}\right|\cdot\log|{A}|)$ .

Proof.

Let $uv=a_{1}\cdots a_{m}$ . We have to compute a word $w$ of length at most $\log_{2}|H|$ such that $H_{w}=H_{a_{1}\cdots a_{m}}$ .

We run a loop for $i=1$ to $m$ . At each step we have computed a word $u_{i-1}$ such that $H_{u_{i-1}}=H_{a_{1}\cdots a_{i-1}}$ with the invariant $2^{|u_{i-1}|}|H_{u_{i-1}}|\leqslant|H|$ (initially we let $u_{0}=1$ ). There are exactly three mutually disjoint cases.

(1)

If $H_{u_{i-1}}\subseteq H_{a_{i}}$ then we let $u_{i}=u_{i-1}$ .

(2)

If $H_{a_{i}}\varsubsetneq H_{u_{i-1}}$ , then we let $u_{i}=a_{i}$ .

(3)

If $H_{a_{i}}$ and $H_{u_{i-1}}$ are incomparable with respect to containment, then we let $u_{i}=u_{i-1}a_{i}$ .

Each case keeps the invariant because, by induction, $2^{|u_{i}|}|H_{u_{i}}|\leqslant|H|$ . ∎
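The loop from the proof, together with the basic tests “ $f\in H_{w}$ ?” and “ $H_{u}\subseteq H_{v}$ ?”, can be sketched in Python. The sketch makes several assumptions of its own: group elements are given directly as permutations of the alphabet (dictionaries from letters to letters), the involution is ignored, and no attempt is made to respect the space bound of the lemma. The function names are hypothetical.

```python
from itertools import permutations


def fixes(f, word):
    """Test f in H_w: f must fix every letter of w."""
    return all(f[a] == a for a in word)


def stabilizer_subset(group, u, v):
    """Test H_u ⊆ H_v by running over all f in H."""
    return all(fixes(f, v) for f in group if fixes(f, u))


def short_representative(group, word):
    """Return a word w with H_w = H_word, following the three cases of the loop."""
    w = ""
    for a in word:
        if stabilizer_subset(group, w, a):
            continue              # case (1): H_w ⊆ H_a, keep w
        if stabilizer_subset(group, a, w):
            w = a                 # case (2): H_a is properly contained in H_w
        else:
            w = w + a             # case (3): H_a and H_w are incomparable
    return w


def stabilizer(group, word):
    """The stabilizer H_w as an explicit list (used here only for checking)."""
    return [f for f in group if fixes(f, word)]


# Example: the symmetric group on three letters acting on {a, b, c}.
LETTERS = "abc"
GROUP = [dict(zip(LETTERS, p)) for p in permutations(LETTERS)]
REP = short_representative(GROUP, "abcabc")
```

Here `REP` is the word `"ab"` : its stabilizer is trivial, as is the stabilizer of `"abcabc"` , and the invariant $2^{|w|}|H_{w}|\leqslant|H|$ reads $4\cdot 1\leqslant 6$ .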

Remark 3.5.

The reader can easily check that our computation of a word $w$ with $H_{w}=H_{a_{1}\cdots a_{m}}$ yields a word $w$ of pairwise different letters. So, we could actually put a bound $|w|\leqslant\min\left\{\mathinner{\log_{2}|H|,|A|}\right\}$ on its length.

3.2. $H$ - $N$ -monoids

In the following $H$ denotes a finite group and $N$ denotes a finite $H$ -monoid. Let $M$ be a set (resp. be a monoid) with involution and

[TABLE]

be a morphism. We say $M$ (together with $\mu$ ) is an $H$ - $N$ -alphabet (resp. an $H$ - $N$ -monoid) if $H$ acts on $M$ such that $\mu(g\cdot x)=g\cdot\mu(x)$ . For example, the identity map on $N$ makes $N$ itself into an $H$ - $N$ -monoid.

A morphism between $H$ - $N$ -monoids is an $H$ -compatible morphism $\varphi\colon M^{\prime}\to M$ such that $\mu\varphi=\mu^{\prime}$ . Thus, if $M$ is an $H$ - $N$ -monoid and $M^{\prime}$ is a monoid with involution where $H$ acts, then every $H$ -compatible morphism $\varphi\colon M^{\prime}\to M$ turns $M^{\prime}$ into an $H$ - $N$ -monoid where $\mu^{\prime}$ is uniquely defined by the equation $\mu\varphi=\mu^{\prime}$ . The use of $H$ - $N$ -monoids is natural in our setting: the $H$ -action is due to a group action on letters, and the finite monoid $N$ is used for the specification of rational constraints. It is clear that the specification of constraints has to be compatible with the group action.

3.3. Free $H$ - $N$ -monoids and types

Let $B$ and $\mathcal{Y}$ be two disjoint $H$ - $N$ -alphabets. We call $B$ the alphabet of constants and $\mathcal{Y}$ the set of twisted variables. The free monoid with involution $(B\cup\mathcal{Y})^{*}$ becomes an $H$ - $N$ -monoid where $\mu\colon(B\cup\mathcal{Y})^{*}\to N$ is induced by $B\cup\mathcal{Y}$ .

[TABLE]

By $\theta\subseteq(B\cup\mathcal{Y})^{*}\times(B\cup\mathcal{Y})^{*}$ we denote a finite homogeneous relation. Here as usual, a relation is called homogeneous if $(x,y)\in\theta$ implies $|x|=|y|$ . If $(x,y)\in\theta$ then we also say that $(x,y)$ is a defining relation because the algebraic object we are interested in is the quotient monoid

[TABLE]

We need more structure of this quotient monoid; in particular, $\mu\colon(B\cup\mathcal{Y})^{*}\to N$ should induce a morphism of $H$ - $N$ -monoids. Actually we wish more, therefore we impose the following technical restrictions on $\theta$ ; and then we call $\theta$ a type (and for a variable $X$ we also define the type of $X$ denoted $\theta(X)$ below).

(1)

$(x,y)\in\theta$ implies $\mu(x)=\mu(y)$ , $(\overline{y},\overline{x})\in\theta$ , and $(f(x),f(y))\in\theta$ for all $f\in H$ , even if these relations are not listed in the specification of $\theta$ .

(2)

If a (twisted) variable $X$ appears in $\theta$ (that is $|xy|_{X}\geqslant 1$ for some $(x,y)\in\theta$ ), then we call $X$ typed. For a typed variable $X$ we require that there is a unique primitive word $p\in B^{*}$ such that $(Xp,\,pX)\in\theta$ . (Recall that $p$ is primitive if and only if it cannot be written as $p=r^{e}$ with $e\geqslant 2$ .) We define $\theta(X)=p$ , and say that $\theta(X)$ is the type of $X$ .

(3)

For $(x,y)\in\theta$ we allow exactly three possibilities:

(i)

$(x,y)=(ab,ca)$ with $a,b,c\in B$ .

(ii)

$(x,y)=(X\,\theta(X),\,\theta(X)\,X)$ for variables $X$ .

(iii)

$(x,y)=(Xa,aY)$ where $a\in B$ and $X,Y$ are typed variables such that $X\neq Y$ .

It is convenient to choose a subset $\mathcal{X}\subseteq\mathcal{Y}$ which is closed under the involution such that every $Y\in\mathcal{Y}$ has the form $Y=f(X)$ for some $X\in\mathcal{X}$ and $f\in H$ . In the following, by a variable we typically mean $X\in\mathcal{X}$ and thus, every twisted variable $Y\in\mathcal{Y}$ can be written as $f\cdot X$ for some $f\in H$ and $X\in\mathcal{X}$ . We assume $X\neq\overline{X}$ for all variables. Having chosen $\theta$ and $\mathcal{X}$ we denote by $M(B,\mathcal{X},\theta,\mu)$ the following quotient monoid (and an $H$ - $N$ -monoid with type $\theta$ ):

[TABLE]

Point (1) from above makes sure that one can extend the involution, the morphism $\mu$ and the action of $H$ to the quotient $M(B,\mathcal{X},\theta,\mu)$ . The homogeneity condition for $\theta$ makes it possible to solve the uniform word problem in $M(B,\mathcal{X},\theta,\mu)$ in nondeterministic quasi-linear space:

Lemma 3.6.

There is an $\mathsf{NSPACE}(n\log n)$ algorithm which performs the following task. The input is an alphabet $B$ , a homogeneous relation $\theta\subseteq B^{*}\times B^{*}$ , and two words $u,v\in B^{*}$ such that

[TABLE]

The output is “yes” if $u=v$ in the quotient monoid $B^{*}/\left\{x=y\mathrel{\left|\vphantom{x=y}\vphantom{(x,y)\in\theta}\right.}(x,y)\in\theta\right\}$ and “no” otherwise.

Proof.

If $u=v$ in $B^{*}/\left\{x=y\mathrel{\left|\vphantom{x=y}\vphantom{(x,y)\in\theta}\right.}(x,y)\in\theta\right\}$ , then nondeterministically we can apply rewriting rules from $\theta$ (which preserve length) to $u$ until we see $u=v$ in the free monoid $B^{*}$ . We get the “no” answer because $\mathsf{NSPACE}(n\log n)$ is closed under complementation by the theorem of Immerman-Szelepcsényi, see for example [43]. ∎
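To illustrate why homogeneity makes the word problem decidable, the following Python sketch explores the (finite) set of words reachable from $u$ by length-preserving rewriting. It is a deterministic exhaustive search and may use exponential space, so it is not the $\mathsf{NSPACE}(n\log n)$ procedure of the lemma. The function name is hypothetical, and words are plain strings over single-character letters.

```python
from collections import deque


def equal_in_quotient(u, v, theta):
    """Decide u = v in B*/{x = y | (x, y) in theta} for a homogeneous theta."""
    if len(u) != len(v):
        return False
    # Relations may be applied in both directions.
    rules = set(theta) | {(y, x) for (x, y) in theta}
    if any(len(x) != len(y) for x, y in rules):
        raise ValueError("theta must be homogeneous")
    seen = {u}
    queue = deque([u])
    while queue:
        w = queue.popleft()
        if w == v:
            return True
        for x, y in rules:
            start = w.find(x)
            while start != -1:
                # Replace one occurrence of x by y; the length is unchanged.
                nxt = w[:start] + y + w[start + len(x):]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                start = w.find(x, start + 1)
    return False


COMMUTE = {("ab", "ba")}
SHIFT = {("ab", "ca")}  # a relation of the form (ab, ca)
```

The search terminates because every reachable word has the same length as $u$ and uses only letters occurring in $u$ or in $\theta$ .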

Using Lem. 3.6 we represent elements in $M(B,\mathcal{X},\theta,\mu)$ by words over $(B\cup\mathcal{Y})^{*}$ . For $\theta=\emptyset$ we obtain $(B\cup\mathcal{Y})^{*}=M(B,\mathcal{X},\emptyset,\mu)$ . By $M(B,\theta,\mu)$ we denote the $H$ - $N$ -submonoid with type $\theta$ which is generated by $B$ . In particular, $B^{*}=M(B,\emptyset,\mu)$ . If $\theta\cap B^{*}\times B^{*}=\emptyset$ , then $M(B,\emptyset,\mu)=B^{*}$ is a free submonoid of $M(B,\mathcal{X},\theta,\mu)$ . If $H$ acts without fixed points on $\mathcal{Y}$ , then we identify $\mathcal{Y}=H\times\mathcal{X}$ and the action becomes $g\cdot(f,X)=(gf,X)$ . Later we will write typed variables using a special bracket notation $[X,p]$ . For complexity issues we will only allow $\theta$ which satisfy $\left|\mathinner{\theta}\right|\in\mathcal{O}({\left|\mathinner{H}\right|}\left\|\mathinner{\mathcal{S}}\right\|^{2})$ where $\left\|\mathinner{\mathcal{S}}\right\|$ is specified in (2) below.

3.4. EDT0L languages and relations

The acronym EDT0L refers to Extended, Deterministic, Table, 0 interaction, and Lindenmayer. See the handbook [51] for the many results about L-systems. Let $A$ be an alphabet. A subset $L$ in a $k$ -fold direct product $A^{*}\times\cdots\times A^{*}$ is called an EDT0L relation if there is some (extended) alphabet $C$ with $d_{1},\ldots,d_{k}\in C$ such that $A\subseteq C$ and a rational set $\mathcal{R}\subseteq\operatorname{End}(C^{*})$ of endomorphisms over $C^{*}$ such that

[TABLE]

The classical situation refers to $k=1$ . In that case we speak about an EDT0L language; and our definition uses a characterization of EDT0L languages due to [1]. The connection is as follows. Let $\$$ be a symbol which is not in $A$ and let $L\subseteq A^{*}\times\cdots\times A^{*}$ be an EDT0L relation. Then $\left\{w_{1}\$w_{2}\cdots\$w_{k}\mathrel{\left|\vphantom{w_{1}\$w_{2}\cdots\$w_{k}}\vphantom{(w_{1},\ldots,w_{k})\in L}\right.}(w_{1},\ldots,w_{k})\in L\right\}$ is an EDT0L language in the usual sense over the alphabet $A\cup\left\{\mathinner{\$}\right\}$ . It should also be noted that the class of EDT0L languages coincides with the class of HDT0L languages ([51, Thm. 2.6]).

We say $L$ is an effective EDT0L relation if there is an effective description of an NFA $\mathcal{A}$ with transitions labeled by “deterministic tables” of pairs $(c,u_{c})\in C\times C^{*}$ (encoding the endomorphism which maps $c$ to $u_{c}$ and $\overline{c}$ to $\overline{u_{c}}$ ) and letters $d_{1},\ldots,d_{k}\in C$ such that $(w_{1},\ldots,w_{k})\in L$ if and only if there is some $h\in L(\mathcal{A})\subseteq\operatorname{End}(C^{*})$ such that $(w_{1},\ldots,w_{k})=(h(d_{1}),\ldots,h(d_{k}))$ . Without restriction we can assume that each transition is labeled by an endomorphism which changes at most one pair of letters $c,\overline{c}$ .
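As a small illustration of the case $k=1$ , the following Python sketch enumerates words $h(d)$ for all accepted paths of bounded length in an NFA whose transitions carry tables. Several choices are assumptions made only for this example: tables are dictionaries that fix all unlisted letters, the involution is ignored, and the tables along a path are applied to the start letter in path order (the paper's convention for multiplying labels may be the reverse one). All names are hypothetical.

```python
def apply_table(table, word):
    """Apply the endomorphism given by a table; unlisted letters are fixed."""
    return "".join(table.get(c, c) for c in word)


def edt0l_words(transitions, initial, final, start_letter, max_steps):
    """Collect h(start_letter) over all accepted paths with at most max_steps transitions."""
    results = set()
    frontier = {(q, start_letter) for q in initial}
    for _ in range(max_steps + 1):
        for q, w in frontier:
            if q in final:
                results.add(w)
        # Advance every (state, word) pair along every outgoing transition.
        frontier = {(q2, apply_table(table, w))
                    for (q, w) in frontier
                    for (q1, table, q2) in transitions if q1 == q}
    return results


# Extended alphabet C = {a, b, d, X}; constants A = {a, b}; start letter d.
TRANSITIONS = [
    (0, {"d": "XbXbX"}, 1),  # split d into three synchronized copies of X
    (1, {"X": "aX"}, 1),     # grow all copies of X simultaneously
    (1, {"X": ""}, 2),       # erase X, leaving a word over {a, b}
]
WORDS = edt0l_words(TRANSITIONS, {0}, {2}, "d", 4)
```

The accepted paths yield exactly the words $a^{n}ba^{n}ba^{n}$ , a language that is not context-free; with at most four transitions we obtain `bb` , `ababa` , and `aabaabaa` .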

4. Twisted word equations

4.1. The initial setting

We begin with a nonempty alphabet of constants $A$ , and a list of $2k$ variables ${\mathcal{V}_{0}}$ (as always, both with involution) and a finite group $H$ where $H$ acts on $A$ via a homomorphism $\psi\colon H\to\operatorname{Aut}(A)$ . In particular, $|\psi(H)|\leqslant|A|!\leqslant|A|^{|A|}$ . As above, $H$ acts on $H\times{\mathcal{V}_{0}}$ by $f\cdot(g,X)=(fg,X)$ . For $w\in A^{*}$ and $f\in H$ we also use the notation $f(w)=(f,w)$ . Hence, we may represent elements in $(A\cup(H\times{\mathcal{V}_{0}}))^{*}$ by words in $(H\times(A^{*}\cup{\mathcal{V}_{0}}))^{*}$ . We abbreviate $(1,x)$ as $x$ for $x\in A^{*}\cup{\mathcal{V}_{0}}$ . By $\mu_{0}\colon A^{*}\to N$ we mean a homomorphism which respects the involution and the action of $H$ . Thus $A^{*}$ is, via $\mu_{0}$ , an $H$ - $N$ -monoid. Assume that $\mu_{0}$ has been extended to a mapping $\mu_{0}\colon A^{*}\cup{\mathcal{V}_{0}}\to N$ such that $\mu_{0}(\overline{X})=\overline{\mu_{0}(X)}$ , then $\mu_{0}$ extends to a morphism $\mu_{0}\colon(A\cup(H\times{\mathcal{V}_{0}}))^{*}\to N$ of $H$ - $N$ -monoids by $\mu_{0}(f,X)=f\cdot\mu_{0}(X)$ . Initially we work over free monoids.

Definition 4.1.

A system $\mathcal{S}$ of twisted word equations with regular constraints over $A$ and ${\mathcal{V}_{0}}$ is given by the following data:

•

A list of $2k$ variables such that ${\mathcal{V}_{0}}=\left\{\mathinner{X_{1},\overline{X_{1}},\ldots,X_{k},\overline{X_{k}}}\right\}$ .

•

The set of twisted variables becomes $\mathcal{Y}=H\times{\mathcal{V}_{0}}$ .

•

A set of pairs $\left\{(U_{i},V_{i})\mathrel{\left|\vphantom{(U_{i},V_{i})}\vphantom{1\leqslant i\leqslant s}\right.}1\leqslant i\leqslant s\right\}$ where $U_{i},V_{i}\in(A\cup\mathcal{Y})^{*}$ .

•

A morphism $\mu_{0}\colon(A\cup\mathcal{Y})^{*}\to N$ of $H$ -monoids, with $N$ finite.

A solution of $\mathcal{S}$ is a morphism of sets with involution $\sigma\colon{\mathcal{V}_{0}}\to A^{*}$ which is (uniquely) extended to an $A$ -morphism of $H$ - $N$ -monoids $\sigma\colon(A\cup\mathcal{Y})^{*}\to A^{*}$ such that

•

$\sigma(U_{i})=\sigma(V_{i})$ for all pairs $(U_{i},V_{i})$ .

•

$\mu_{0}\sigma(X)=\mu_{0}(X)$ for all variables. Hence, $\mu_{0}\sigma=\mu_{0}$ .

As usual, a pair $(U_{i},V_{i})$ representing a twisted equation is simply written as $U_{i}=V_{i}$ .

Example 4.2.

Let $A=\{a,\overline{a},b,\overline{b}\},{\mathcal{V}_{0}}=\{X,\overline{X},Y,\overline{Y},Z,\overline{Z}\}$ , $f,g\in H$ defined by $f(a)=b,g(a)=\overline{a},g(b)=b$ , $U_{1}=(f,X)a(g,\overline{Y})$ , $V_{1}=Z$ , $U_{2}=(f,Y)b$ , $V_{2}=\overline{a}b(g,X)$ , $U_{3}=Xa$ , $V_{3}=b(f,X)$ and (for simplicity) $\mu_{0}(x)=1$ for all $x\in A\cup{\mathcal{V}_{0}}$ . A pictorial representation of the example is shown in Fig. 2. The reader is invited to verify that one possible solution is $\sigma(X)=bab,\sigma(Y)=\overline{b}aa\overline{b},\sigma(Z)=abaabaab$ , and a second solution is $\sigma(X)=b,\sigma(Y)=\overline{b}a,\sigma(Z)=aaab$ .
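The verification of the two solutions can also be carried out mechanically. In the following Python sketch, uppercase letters stand for barred letters, so `A` means $\overline{a}$ and `B` means $\overline{b}$ . The example specifies only $f(a)=b$ ; the value $f(b)=a$ used below is an assumption, although it is forced by the third equation for the first solution. Since $\mu_{0}$ is trivial, no constraint needs to be checked. The data layout and function names are hypothetical.

```python
def bar(word):
    """Involution on A*: reverse the word and bar every letter."""
    return word[::-1].swapcase()


def act(perm, word):
    """Letterwise action of a group element given as a permutation of A."""
    return "".join(perm[c] for c in word)


IDENT = {c: c for c in "aAbB"}
F = {"a": "b", "A": "B", "b": "a", "B": "A"}  # f(a) = b; f(b) = a is assumed
G = {"a": "A", "A": "a", "b": "b", "B": "B"}  # g(a) = \overline{a}, g(b) = b

# A side is a list of items: ("const", word) or ("var", perm, name, barred).
EQUATIONS = [
    ([("var", F, "X", False), ("const", "a"), ("var", G, "Y", True)],
     [("var", IDENT, "Z", False)]),
    ([("var", F, "Y", False), ("const", "b")],
     [("const", "Ab"), ("var", G, "X", False)]),
    ([("var", IDENT, "X", False), ("const", "a")],
     [("const", "b"), ("var", F, "X", False)]),
]


def evaluate(side, sigma):
    """Substitute sigma into one side; barred variables use the involution."""
    out = []
    for item in side:
        if item[0] == "const":
            out.append(item[1])
        else:
            _, perm, name, barred = item
            value = bar(sigma[name]) if barred else sigma[name]
            out.append(act(perm, value))
    return "".join(out)


def solves(sigma):
    return all(evaluate(u, sigma) == evaluate(v, sigma) for u, v in EQUATIONS)


SIGMA_1 = {"X": "bab", "Y": "BaaB", "Z": "abaabaab"}
SIGMA_2 = {"X": "b", "Y": "Ba", "Z": "aaab"}
```

Both `solves(SIGMA_1)` and `solves(SIGMA_2)` hold under the stated assumption on $f$ .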

If $\sigma$ is a solution of $\mathcal{S}$ we also say that $\sigma$ solves $\mathcal{S}$ . For $\left\{\mathinner{X_{1},\overline{X_{1}},\ldots,X_{k},\overline{X_{k}}}\right\}$ the full solution set $\mathrm{Sol}(\mathcal{S})$ of $\mathcal{S}$ is defined as

[TABLE]

4.2. The main result on twisted word equations

Our main result shows that $\mathrm{Sol}(\mathcal{S})$ is an EDT0L language, for which we can compute an effective description in polynomial space. In order to measure complexities accurately, we need a precise notion of input size. Let $\mathcal{S}$ be a system of twisted word equations with regular constraints over $A$ and ${\mathcal{V}_{0}}$ according to Def. 4.1.

We define the size of $\mathcal{S}$ using two parameters $\left\|\mathinner{\mathcal{S}}\right\|$ and $m(\mathcal{S})$ . Thus, the size is the pair $(\left\|\mathinner{\mathcal{S}}\right\|,m(\mathcal{S}))$ . The first parameter $\left\|\mathinner{\mathcal{S}}\right\|$ ignores the size of the finite monoid $N$ . It is the main parameter, as we do not want the complexity due to constraints to dominate the overall complexity. The second parameter measures separately the number of additional bits for handling the constraints. We begin by defining $\left\|\mathinner{\mathcal{S}}\right\|$ . Let

[TABLE]

Recall that $2k$ is the number of variables, $s$ is the number of equations $U_{i}=V_{i}$ , and $H$ denotes a finite group acting on $A$ and hence on $A^{*}$ , too. We are interested only in the situation where $A\neq\emptyset$ and $\left\|\mathinner{\mathcal{S}}\right\|>4$ .

The finite monoid $N$ is also part of the input. We measure the size relative to $\left\|\mathinner{\mathcal{S}}\right\|$ . Let $N$ be any finite $H$ -monoid and $m_{\mathcal{S}}(N)\in\mathbb{N}$ be a number such that elements of $N$ can be encoded by a number of bits which is at most

[TABLE]

Moreover, using this specification, monoid computations, like computing the involution, the multiplication of two elements, and the action by $H$ , can be performed on a Turing machine in space $m_{\mathcal{S}}(N)\left\|\mathinner{\mathcal{S}}\right\|\log\left\|\mathinner{\mathcal{S}}\right\|$ . We let

[TABLE]

There are examples where the monoid $N$ (which appears in $\mathcal{S}$ ) is polynomially bounded in $\left\|\mathinner{\mathcal{S}}\right\|$ and still $m(\mathcal{S})\in\mathcal{O}(1)$ . However, if $m(\mathcal{S})$ becomes a polynomial in $\left\|\mathinner{\mathcal{S}}\right\|$ , then we need to consider $m(\mathcal{S})$ separately for a finer analysis below $\mathsf{PSPACE}$ .

An $H$ -monoid $N$ is called small with respect to $\mathcal{S}$ if $m_{\mathcal{S}}(N)\in\mathcal{O}(1)$ . The finite monoid recognizing reduced words (those without factors $a\overline{a}$ for $a\in A$ ) is small with respect to $\mathcal{S}$ , see Ex. 2.1. Being small is no restriction for the computability of the representation of the full solution set as an EDT0L relation – we can always add trivial equations until $N$ becomes small with respect to $\mathcal{S}$ . Another example of a small monoid is the finite monoid $\mathop{\text{ST}}(H)$ of stabilizers. Its size depends on $A$ and $H$ , but still it is small due to Lem. 3.4.

Now, during the process we might wish to use direct products of (small) monoids. For that the parameter $m_{\mathcal{S}}$ behaves nicely:

[TABLE]

Indeed, given $(n_{1},n_{2})\in N_{1}\times N_{2}$ we can use the first $m_{\mathcal{S}}(N_{1})$ bits to encode $n_{1}$ and the last $m_{\mathcal{S}}(N_{2})$ bits to encode $n_{2}$ . The operations on $N_{1}\times N_{2}$ can be done componentwise. In particular, a direct product of small monoids remains small.

We are ready to state our main result which gives $\mathsf{PSPACE}$ as an upper bound for the complexity and a quasi-quadratic space bound if $N$ is small.

Theorem 4.3.

There is an $\mathsf{NSPACE}(|H|\cdot\left\|\mathinner{\mathcal{S}}\right\|^{2}\cdot m(\mathcal{S})\cdot\log|A|\cdot\log\left\|\mathinner{\mathcal{S}}\right\|)$ algorithm which performs the following task. It takes as input a system of twisted word equations $\mathcal{S}$ with regular constraints. The system uses a set of constants $A$ , a set of variables $\mathcal{V}_{0}=\left\{\mathinner{X_{1},\overline{X_{1}},\ldots,X_{k},\overline{X_{k}}}\right\}$ , and the regular constraint is given by a morphism ${\mu_{0}}\colon A\cup\mathcal{V}_{0}\to{N}$ . The output is:

•

an extended alphabet $C$ of size $\mathcal{O}(|H|^{2}\left\|\mathinner{\mathcal{S}}\right\|^{2})$ ;

•

distinguished letters $d_{i}\in C$ for each variable $X_{i}$ ;

•

a trim NFA $\mathcal{A}_{\mathcal{S}}$ accepting a rational set of $A$ -morphisms $L(\mathcal{A}_{\mathcal{S}})\subseteq\operatorname{End}(C^{*})$ such that

[TABLE]

The algorithm stores intermediate equations with a length bound in $\mathcal{O}(|H|\left\|\mathinner{\mathcal{S}}\right\|^{2})$ . Moreover, $\mathrm{Sol}(\mathcal{S})=\emptyset$ if and only if $\mathcal{A}_{\mathcal{S}}=\emptyset$ ; and $\left|\mathinner{\mathrm{Sol}(\mathcal{S})}\right|<\infty$ if and only if $\mathcal{A}_{\mathcal{S}}$ doesn’t contain any directed cycle.

Let us comment on the rather complicated space bound

[TABLE]

which appears in the statement of the theorem. First, since $\left|\mathinner{H}\right|\leqslant\left\|\mathinner{\mathcal{S}}\right\|$ and $\left|\mathinner{C}\right|\in\mathcal{O}(|H|^{2}\left\|\mathinner{\mathcal{S}}\right\|^{2})$ , we can encode all letters by $\mathcal{O}(\log\left\|\mathinner{\mathcal{S}}\right\|)$ bits. Second, the $\mu$ -value for the constraints changes dynamically: it is a priori not fixed for the extended alphabet $C$ . So, it is enough to store the $\mu$ -value for each symbol which appears in intermediate equations. The length bound on intermediate equations is in $\mathcal{O}(|H|\left\|\mathinner{\mathcal{S}}\right\|^{2})$ . Each $\mu$ -value is an element in $N$ , which requires, by definition, $m(\mathcal{S})\cdot\log|A|\cdot\log\left\|\mathinner{\mathcal{S}}\right\|$ bits for the encoding. Together, we need $\mathcal{O}(|H|\cdot\left\|\mathinner{\mathcal{S}}\right\|^{2}\cdot m(\mathcal{S})\cdot\log|A|\cdot\log\left\|\mathinner{\mathcal{S}}\right\|)$ bits to encode intermediate equations.

Corollary 4.4.

Let $\mathcal{S}$ be a system of twisted word equations with regular constraints in variables $\left\{\mathinner{X_{1},\overline{X_{1}},\ldots,X_{k},\overline{X_{k}}}\right\}$ . Then $\mathrm{Sol}(\mathcal{S})\subseteq A^{*}\times\cdots\times A^{*}$ is an effective EDT0L relation.

Proof.

This is a formal consequence of Thm. 4.3. ∎

Sections 5 through 12 are devoted to the proof of Thm. 4.3. The theorem implies that we can decide in $\mathsf{PSPACE}$ (hence, in deterministic singly exponential time) whether $\mathcal{S}$ is solvable and whether or not there are only finitely many solutions. The decision problem of whether a word equation with regular constraints has a solution is known to be $\mathsf{PSPACE}$ -hard by [31] because the intersection problem of regular languages is a special case. In our setting, if the finite monoid $N$ is small, then the best known lower bound to date is $\mathsf{NP}$ -hardness: it is the lower bound for deciding whether or not a linear Diophantine system over $\mathbb{N}$ has a solution [24].
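Once the trim NFA $\mathcal{A}_{\mathcal{S}}$ is available, the two decision problems reduce to simple graph tests. The following Python sketch shows these tests on an explicitly given graph; it assumes that its input is the trim automaton produced by Thm. 4.3 (for an arbitrary NFA over endomorphisms the cycle criterion need not characterize finiteness), and it ignores the fact that the algorithm of the theorem works in bounded space without storing the whole automaton. The function names are hypothetical.

```python
def has_directed_cycle(states, transitions):
    """Detect a directed cycle in the transition graph; labels are ignored."""
    successors = {q: set() for q in states}
    for p, _label, q in transitions:
        successors[p].add(q)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {q: WHITE for q in states}
    for root in states:
        if colour[root] != WHITE:
            continue
        # Iterative depth-first search; grey states are on the current path.
        colour[root] = GREY
        stack = [(root, iter(successors[root]))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if colour[nxt] == GREY:
                    return True
                if colour[nxt] == WHITE:
                    colour[nxt] = GREY
                    stack.append((nxt, iter(successors[nxt])))
                    break
            else:
                colour[node] = BLACK
                stack.pop()
    return False


def classify(states, transitions):
    """Classify the solution set from the trim automaton of Thm. 4.3."""
    if not states:
        return "empty"
    return "infinite" if has_directed_cycle(states, transitions) else "finite"
```

For example, `classify({0, 1, 2}, [(0, "h", 1), (1, "g", 1), (1, "e", 2)])` reports an infinite solution set because of the loop at state 1.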

5. Preparation

We begin the proof of Thm. 4.3 with some technical preparations. Sections 5.1–5.3 concern some reductions, and could easily be skipped in a first reading of the paper. These sections yield a reduction to the situation as stated at the beginning of Sect. 5.4. We invite the reader to jump directly to Sect. 5.4 and to read the parts in between only when necessary.

5.1. Reducing to faithful actions

Recall that the action of $H$ on $A$ is given by a homomorphism $\psi\colon H\to\operatorname{Aut}(A)$ . We don’t require that $\psi$ is injective because in some natural examples this is not the case, see Sect. 15. On the other hand it is enough to prove Thm. 4.3 in the case where $H$ is actually a subgroup of $\operatorname{Aut}(A)$ . Let us show how the reduction works. The principal idea is to replace $H$ by $H/K$ where $K=\ker(\psi)$ is the kernel of $\psi$ .

If $M$ is any $H$ -monoid, then the action of $H$ induces an action of $H/K$ on $M$ only if for all $f\in K$ and all $m\in M$ we have $f(m)=m$ . In this case $M$ becomes an $H/K$ -monoid: the action $g\cdot K(m)=g(m)$ is well-defined for all $g\cdot K\in H/K$ . By definition of $K$ , the free monoid $A^{*}$ is therefore an $H/K$ -monoid. Inspecting the statement in Thm. 4.3, there are two problems. First, the induced action of $K$ on the finite monoid $N$ is not trivial, in general. Second, the group $H$ acts freely on the set of variables $H\times\mathcal{X}_{0}$ , so there is no induced action of $H/K$ on this set unless $K$ is trivial. We address and solve both problems.

Let us begin with the $H$ -monoid $N$ . It has a largest $H$ -invariant submonoid $N^{\prime}$ on which $K$ acts trivially, namely the submonoid of $K$ -invariant elements:

$$N^{\prime}=\{\,n\in N\mid f(n)=n\text{ for all }f\in K\,\}.$$

The image $\mu_{0}(A^{*})$ is a submonoid of $N^{\prime}$ , since for all $f\in K$ and all $w\in A^{*}$ we have $f(\mu_{0}(w))=\mu_{0}(f(w))=\mu_{0}(w)$ . However, the statement in Thm. 4.3 doesn’t require that $\mu_{0}(X)$ takes values in $N^{\prime}$ . Let us show that $\mathcal{S}$ is not solvable if there is some variable $X$ such that $\mu_{0}(X)\notin N^{\prime}$ . Indeed, assume to the contrary that there is a solution $\sigma\colon\mathcal{X}_{0}\to A^{*}$ while $\mu_{0}(X)\notin N^{\prime}$ for some variable $X$ . Then there is some $f\in K$ such that

[TABLE]

which is a contradiction. We have $f(\sigma(X))=\sigma(X)$ because $K$ acts trivially on $A^{*}$ .

Thus, as a first procedure in the proof of Thm. 4.3 we check that $\mu_{0}(X)\in N^{\prime}$ for all $X\in\mathcal{X}_{0}$ (and therefore $\mu_{0}(f,X)\in N^{\prime}$ for all $(f,X)\in H\times\mathcal{X}_{0}$ ). The test runs over all $f\in H$ and for each $f$ checks the following implication:

[TABLE]

If the check is positive then $\mu_{0}(X)\in N^{\prime}$ for all $X\in\mathcal{X}_{0}$ . If the check fails then we output that $\mathcal{S}$ is not solvable by defining $\mathcal{A}_{\mathcal{S}}=\emptyset$ . We can perform the test within our given space bound by definition of $m(\mathcal{S})$ .

After the test, we may assume that $\mu_{0}$ maps $A\cup(H\times\mathcal{X}_{0})$ to $N^{\prime}$ , and we replace $N$ by $N^{\prime}$ . We can use the same bit encoding for elements in $N^{\prime}$ as we did for $N$ , but if we have to guess an element in $N^{\prime}$ , we perform the test (6). Thus, the parameter $m(\mathcal{S})$ is still valid.

In the second step we replace each variable $(f,X)\in H\times\mathcal{X}_{0}$ by a fresh variable $(f\cdot K,X)\in(H/K)\times\mathcal{X}_{0}$ . Again, this doesn’t change $\mathrm{Sol}(\mathcal{S})$ since for every $H$ -compatible morphism $\sigma\colon H\times\mathcal{X}_{0}\to A^{*}$ and all $f\in K$ we have

[TABLE]

We are done. We have shown the following statement.

Lemma 5.1.

It is enough to prove Thm. 4.3 under the additional assumption that $\psi$ is injective. This means we can assume that $H$ is a subgroup of $\operatorname{Aut}(A)$ .
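The two steps of this reduction can be tried out on a toy instance. The following Python sketch is only an illustration under loud assumptions: the group $\mathbb{Z}/4$, the alphabet $\{a,b\}$, the stand-in set `N`, and all helper names (`act_on_A`, `kernel`, `invariant_elements`, `coset`) are hypothetical and not taken from the paper. It computes $K=\ker(\psi)$, filters a finite $H$ -set down to its $K$ -invariant elements (the analogue of $N^{\prime}$ ), and shows that variables $(f,X)$ and $(f^{\prime},X)$ with $f\cdot K=f^{\prime}\cdot K$ receive the same name.

```python
# Toy instance (all names hypothetical): H = Z/4 acts on A = {a, b}.
# Odd elements swap a and b; even elements act trivially.
H = [0, 1, 2, 3]
A = ["a", "b"]

def act_on_A(f, letter):
    """psi(f) applied to a letter of A."""
    if f % 2 == 0:
        return letter
    return "b" if letter == "a" else "a"

def kernel(group, alphabet, action):
    """K = ker(psi): the group elements that fix every letter."""
    return [f for f in group if all(action(f, a) == a for a in alphabet)]

def invariant_elements(elements, subgroup, action):
    """Elements fixed by every member of the subgroup (the analogue of N')."""
    return [n for n in elements if all(action(f, n) == n for f in subgroup)]

def coset(f, K):
    """The coset f.K in the additive group Z/4, used to rename (f, X)."""
    return frozenset((f + k) % 4 for k in K)

K = kernel(H, A, act_on_A)

# A toy H-set standing in for N (its monoid structure is ignored here):
# "e" is fixed by everything, while 0..3 form a free orbit under addition.
N = ["e", 0, 1, 2, 3]

def act_on_N(f, n):
    return n if n == "e" else (n + f) % 4

N_prime = invariant_elements(N, K, act_on_N)
```

In this toy case the kernel consists of the even residues, only the element `"e"` survives the invariance filter, and `coset(1, K) == coset(3, K)`, so $(1,X)$ and $(3,X)$ collapse to one variable over $H/K$ .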

5.2. Making the finite monoid $N$ larger

The aim in this section is to replace $N$ by a larger monoid, which additionally encodes information about stabilizers for all $x\in A\cup\mathcal{X}_{0}$ . Up to a constant factor we don’t change $m(\mathcal{S})$ .

Let $\mathop{\text{ST}}(H)$ be the monoid of stabilizers, see Sect. 3.1. We define a morphism $\mu_{1}\colon A\cup\mathcal{X}_{0}\to N\times\mathop{\text{ST}}(H)$ which maps a letter $a\in A$ to $(\mu_{0}(a),H_{a})$ and each variable $X$ to some $(\mu_{0}(X),H_{u})$ , where $u\in A^{*}$ is a nondeterministically guessed word of length at most $\log_{2}|H|$ . The $H$ -action on $N\times\mathop{\text{ST}}(H)$ is inherited from the action on $N$ and the action on $\mathop{\text{ST}}(H)$ by conjugation. Guessing is equivalent to taking the union over finitely many cases, see (7). The union yields exactly the solutions we had before: no solution is lost and no new solution is introduced.

The projection to the first component turns $N\times\mathop{\text{ST}}(H)$ into an $H$ - $N$ -monoid. Using $\mu_{1}$ we achieve the following:

•

for all $x\in(A\cup\mathcal{X}_{0})^{*}$ , $\mu_{1}(x)\in N\times\mathop{\text{ST}}(H)$ is a pair where the second component is $H_{x}$ which is represented by a word $u\in A^{*}$ of length at most $\log|H|$ such that $H_{x}=H_{u}$ .

The switch to $\mu_{1}$ has a price. By defining $\mu_{1}(X)$ we restrict the set of possible solutions. The value $\mu_{1}(X)=(\mu_{0}(X),H_{u})$ fixes the stabilizer $H_{\sigma(X)}$ for a solution $\sigma$ to be the subgroup $H_{u}$ . The number of choices (= nondeterministic guesses) to extend $\mu_{0}$ to $\mu_{1}$ is bounded by

[TABLE]

These choices result in splitting the original system into that many subsystems. Splitting is fortunately no problem since EDT0L languages are closed under finite union by taking the unions of the corresponding NFAs.

At the end of this we rename $N\times\mathop{\text{ST}}(H)$ as $N$ and $\mu_{1}$ as $\mu_{0}\colon A\cup\mathcal{X}_{0}\to N$ .

5.3. Introducing a zero to $N$ and a marker symbol to $A$ .

In the following it is convenient to have a special symbol $\#$ , but we want to make sure no variable uses it, so we add a zero $0$ to our constraint monoid. We next embed our current $N$ into $N\cup\left\{\mathinner{0}\right\}$ where $0$ is a fresh symbol not included in $N$ and $0$ acts as a zero in $N\cup\left\{\mathinner{0}\right\}$ . We turn it into an $H$ -monoid by defining $f(0)=0$ for all $f\in H$ .

The monoid $N$ is an $H$ -submonoid of $N\cup\left\{\mathinner{0}\right\}$ and, by a slight abuse of language, we denote by $\mu_{0}$ the induced mapping to the larger monoid $N\cup\left\{\mathinner{0}\right\}$ as well:

[TABLE]

Without restriction (by adjusting constants if necessary) we may assume that $N\cup\left\{\mathinner{0}\right\}$ doesn’t change the parameter $m(\mathcal{S})$ . Using $N\cup\left\{\mathinner{0}\right\}$ instead of $N$ doesn’t change $\mathrm{Sol}(\mathcal{S})$ because $\mu_{0}(A\cup(H\times\mathcal{X}_{0}))\subseteq N$ . Phrased differently, without restriction $N$ has a zero $0$ , and $\mu_{0}(A\cup(H\times\mathcal{X}_{0}))\subseteq N\setminus\left\{\mathinner{0}\right\}$ .

At this point we add the special symbol $\#$ to $A$ . We let $\mu_{0}(\#)=0$ , $\overline{\#}=\#$ and $f(\#)=\#$ for all $f\in H$ . So, from now on we assume that $\#\in A$ . Since we did not change $\mu_{0}(X)$ for any variable $X$ we are sure that for every solution $\sigma$ to $\mathcal{S}$ and every variable $X$ we have $|\sigma(X)|_{\#}=0$ : the marker cannot appear in any solution.
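The reason why the marker cannot occur in a solution can be spelled out in one line. The derivation below is a sketch under the assumption, taken from the setup of Thm. 4.3, that every solution $\sigma$ respects the constraints, that is, $\mu_{0}(\sigma(X))=\mu_{0}(X)$ ; the factors $u,v$ are hypothetical words around a supposed occurrence of $\#$ .

```latex
\sigma(X) = u\,\#\,v
\;\Longrightarrow\;
\mu_{0}(\sigma(X))
  = \mu_{0}(u)\cdot\mu_{0}(\#)\cdot\mu_{0}(v)
  = \mu_{0}(u)\cdot 0\cdot\mu_{0}(v)
  = 0
  \neq \mu_{0}(X),
\qquad\text{since } \mu_{0}(X)\in N\setminus\{0\}.
```

So a single occurrence of $\#$ forces the constraint value $0$ , which no variable is allowed to have.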

5.4. Triangular systems

Due to the preceding subsections we henceforth make the following assumptions:

•

$H$ is a subgroup of $\operatorname{Aut}(A)$ .

•

There is some $\#\in A$ with $\overline{\#}=\#$ and $f(\#)=\#$ for all $f\in H$ .

•

The $H$ -monoid $N$ contains a zero $0$ and for all $x\in A\cup\mathcal{X}_{0}$ we have

[TABLE]

•

$\mu_{0}(x)$ is a pair where the second component is the stabilizer $H_{x}$ which is represented by a word $u\in A^{*}$ of length at most $\log|H|$ such that $H_{x}=H_{u}$ for all $x\in(A\cup\mathcal{X}_{0})^{*}$ . Since $H_{x}=H_{u}$ and $u$ is in $A^{*}$ we have for all $f\in H$ :

[TABLE]

Definition 5.2.

A twisted word equation $U=V$ is called triangular if $U$ contains at most two variables and $V$ at most one variable.

Following well-known methods (see for example [8]) we enlarge the set of variables $\mathcal{X}_{0}$ to a larger set $\mathcal{X}\supset\mathcal{X}_{0}$ (using at most $2\left\|\mathinner{\mathcal{S}}\right\|$ more variables) such that every equation becomes triangular in a more specific form: every equation has the form either $Z=1$ or $U=Z$ where $\left|\mathinner{U}\right|=2$ and in both cases $Z$ is a variable.

It is therefore enough to show Thm. 4.3 in the case where each equation $U_{i}=V_{i}$ equals $(f,x)(g,y)=(h,Z)$ where $x,y\in A\cup\mathcal{X}$ and $Z\in\mathcal{X}$ . Moreover, since $(f,x)(g,y)=(h,Z)$ is equivalent to $({h}^{-1}f,x)({h}^{-1}g,y)=Z$ we can restrict ourselves to the case that each $U_{i}=V_{i}$ is of the form $(f,x)(g,y)=Z$ . Due to additional variables, we work over a set of variables

[TABLE]

where $k\leqslant k^{\prime}\leqslant 2\left\|\mathinner{\mathcal{S}}\right\|$ and the first $2k$ variables belong to the original system.

Hence, the starting point is a system of equations $(f,x)(g,y)=Z$ . The number of these triangular equations is at most $2\left\|\mathinner{\mathcal{S}}\right\|$ , so we can ignore this blow-up. During the process we need a more general form: nontrivial triangular equations appear as $u(f,x)w(g,y)v=u^{\prime}Zv^{\prime}$ where $u,w,v,u^{\prime},v^{\prime}$ are words over constants. Whenever such an equation with $\left|\mathinner{u}\right|=\left|\mathinner{u^{\prime}}\right|=\left|\mathinner{v}\right|=\left|\mathinner{v^{\prime}}\right|$ appears, then necessarily $u={u^{\prime}}$ and $v=v^{\prime}$ ; otherwise the equation is “unsolvable”. That is, in a nondeterministic implementation of our process, this branch never leads to an accepting state. In an implementation of the algorithm we would reject the branch immediately.

Finally, it is somewhat convenient to assume $\left|\mathinner{A}\right|+\left|\mathinner{\mathcal{X}}\right|\leqslant|UV|$ . We may achieve this for example by adding some dummy equations, and then $\left\|\mathinner{\mathcal{S}}\right\|\in\mathcal{O}(\left|\mathinner{H}\right|+\left|\mathinner{UV}\right|)$ .
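The passage to triangular form described in this subsection can be sketched in a few lines of Python. This is a minimal illustration and not the construction of [8]: the function name `triangulate`, the naming scheme `Z1, Z2, ...` for fresh variables, and the restriction to sides of length at least two are assumptions made here. Symbols are treated as opaque, so a twisted letter $(f,x)$ can simply be passed as a tuple.

```python
from itertools import count

def triangulate(U, V, fresh_prefix="Z"):
    """Rewrite U = V (lists of symbols, each of length >= 2) as a list of
    triples (x, y, Z), each meaning the triangular equation x y = Z."""
    if len(U) < 2 or len(V) < 2:
        raise ValueError("this sketch only handles sides of length >= 2")
    counter = count(1)
    equations = []

    def fold(word, last):
        # Fold from the left: Z' = x1 x2, Z'' = Z' x3, and so on.
        # The final product is named `last`, which both sides share.
        current = word[0]
        for i, symbol in enumerate(word[1:], start=1):
            is_final = (i == len(word) - 1)
            target = last if is_final else f"{fresh_prefix}{next(counter)}"
            equations.append((current, symbol, target))
            current = target

    shared = f"{fresh_prefix}{next(counter)}"  # assumed not to clash with U, V
    fold(U, shared)
    fold(V, shared)
    return equations

example = triangulate(["X", "a", "Y"], ["b", "X"])
```

For the sides `["X", "a", "Y"]` and `["b", "X"]` the sketch yields the three equations $Xa=Z_{2}$ , $Z_{2}Y=Z_{1}$ and $bX=Z_{1}$ . In general it produces $|U|+|V|-2$ equations and fewer than $|UV|$ fresh variables, in line with the linear blow-up stated above.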

5.5. Fixing more notation

During the process we enlarge the sets of constants and variables. We begin with two disjoint infinite alphabets with involution $C$ and $\Omega$ and $\Sigma=C\cup\Omega$ . All constants are drawn from $C$ and all variables are drawn from $\Omega$ . We never write down all elements from $C$ or $\Omega$ , just certain subsets which are needed in a specific situation. Later we will choose $\Sigma$ such that $\left|\mathinner{\Sigma}\right|\in\mathcal{O}(\left|\mathinner{H}\right|^{2}{\left\|\mathinner{\mathcal{S}}\right\|}^{2})$ , but initially for our infinite automata $\mathcal{T}$ and $\mathcal{F}$ we do not impose any size restrictions.

Throughout we use the following conventions and notation.

•

There are $2k$ distinguished letters $\left\{\mathinner{d_{1},\overline{d_{1}},\ldots,d_{k},\overline{d_{k}}}\right\}$ which appear in Thm. 4.3.

•

$\#\in A\subseteq B=\overline{B}=\left\{f(b)\mathrel{\left|\vphantom{f(b)}\vphantom{b\in B}\right.}b\in B\right\}\subseteq C$ .

•

$B\cap\left\{\mathinner{d_{1},\overline{d_{1}},\ldots,d_{k},\overline{d_{k}}}\right\}=\emptyset$ unless we are at a final state (to be defined below, see Sect. 7.1.2).

•

$\mathcal{Y}=\overline{\mathcal{Y}}=\left\{f(X)\mathrel{\left|\vphantom{f(X)}\vphantom{X\in\mathcal{X},f\in H}\right.}X\in\mathcal{X},f\in H\right\}\subseteq\Omega$ , and $X\neq\overline{X}$ for all $X\in\Omega$ . If $H$ acts freely on $\mathcal{Y}$ , then we write $\mathcal{Y}=H\times\mathcal{X}$ , too. We view $\mathcal{X}\subseteq\mathcal{Y}$ .

•

The action of $H$ and the involution on $\Sigma$ extend those on $A\cup\mathcal{Y}$ .

•

$\mu\colon(B\cup\mathcal{Y})^{*}\to N$ satisfies $\mu(a)=\mu_{0}(a)$ for $a\in A$ .

•

$a,b,c,[p],[r,s,\lambda],\ldots$ refer to letters in $C$ .

•

$u,v,w,\ldots$ refer to words in $C^{*}$ .

•

$X,Y,Z,[X,p],\ldots$ refer to variables in $\Omega$ .

•

$x,y,z,\ldots$ refer to words in $\Sigma^{*}$ .

These conventions hold everywhere unless explicitly stated otherwise. They also apply to primed symbols such as $B^{\prime}$ , $\mathcal{X}^{\prime}$ etc. Throughout we also use the following.

Remark 5.3.

If we know $\mu(x)\in N$ for any $x\in(B\cup\mathcal{Y})^{*}$ , then we also know, by the representation of the second component in $\mu(x)$ , a word $u\in A^{*}$ of length at most $\log|H|$ such that $H_{x}=H_{u}$ . This enables for all $f\in H$ an efficient test to check whether $f(x)=x$ , see (8) above. Moreover, if we have $z=xy$ with $x,y,z\in(B\cup\mathcal{Y})^{*}$ and we have to calculate $\mu(z)$ as the product $\mu(x)\mu(y)$ , then we need to find a word $w\in A^{*}$ of length at most $\log|H|$ such that $H_{z}=H_{x}\cap H_{y}=H_{w}$ . We may assume that $H_{x}$ and $H_{y}$ are already given as $H_{x}=H_{u}$ and $H_{y}=H_{v}$ where $uv\in A^{*}$ has length at most $2\log|H|$ . In order to compute $w$ we run the algorithm from the proof of Lem. 3.4.
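One plausible way to obtain such a short word is a greedy pass over the letters. The Python sketch below is an assumption of this note and not necessarily the algorithm of Lem. 3.4, which is not reproduced here; the group $\mathbb{Z}/2\times\mathbb{Z}/2$ , its action on the letters $a,\ldots,e$ , and the names `stabilizer` and `short_witness` are hypothetical. It relies only on the fact that $f$ fixes a word exactly when it fixes each of its letters.

```python
from itertools import product

# Toy group (hypothetical): Z/2 x Z/2, where the first coordinate swaps
# a <-> b, the second swaps c <-> d, and the letter e is always fixed.
H = list(product([0, 1], repeat=2))

def act(f, letter):
    i, j = f
    if i and letter in "ab":
        return "b" if letter == "a" else "a"
    if j and letter in "cd":
        return "d" if letter == "c" else "c"
    return letter

def stabilizer(group, action, word):
    """Group elements that fix every letter of the word."""
    return frozenset(f for f in group if all(action(f, a) == a for a in word))

def short_witness(group, action, word):
    """Greedily keep the letters of `word` that strictly shrink the stabilizer.

    Every kept letter replaces the current subgroup by a proper subgroup,
    whose order is at most half of the previous one (Lagrange), so at most
    log2(len(group)) letters are kept.
    """
    current = frozenset(group)
    witness = []
    for a in word:
        smaller = frozenset(f for f in current if action(f, a) == a)
        if smaller != current:
            witness.append(a)
            current = smaller
    return witness

x, y = "ea", "ec"
w = short_witness(H, act, x + y)
```

Here the stabilizer of the concatenation `x + y` is the set of elements fixing both `x` and `y`, and the two-letter witness `w` has the same stabilizer as the four-letter word.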

5.6. The initial word equation $W_{\mathrm{init}}$

For technical reasons we encode the initial (triangular) system $\left\{(U_{i},V_{i})\mathrel{\left|\vphantom{(U_{i},V_{i})}\vphantom{1\leqslant i\leqslant s}\right.}1\leqslant i\leqslant s\right\}$ of twisted equations in variables $\mathcal{X}$ where $\left\{X_{i}\mathrel{\left|\vphantom{X_{i}}\vphantom{1\leqslant i\leqslant{|\mathcal{X}|/2}}\right.}1\leqslant i\leqslant{|\mathcal{X}|/2}\right\}\subseteq\mathcal{X}=\overline{\mathcal{X}}$ as a single word. Let $U=U_{1}\#U_{2}\cdots\#U_{s}$ and $V=V_{1}\#V_{2}\cdots\#V_{s}$ .

The initial equation $W_{\mathrm{init}}\in(A\cup(H\times\mathcal{X}))^{*}$ is defined as:

[TABLE]

In particular, each $X\in\mathcal{X}$ appears in $W_{\mathrm{init}}$ . Here:

[TABLE]

Note that $\sigma(W)=\sigma(\overline{W})$ if and only if $\sigma(U_{i})=\sigma(V_{i})$ for all $i$ .

5.7. Fixing the parameters $n$ , $\varepsilon$ , and $\delta$

Having defined $W_{\mathrm{init}}$ we fix the following parameters $n,\varepsilon,\delta\in\mathbb{N}$ by

[TABLE]

By our assumptions this implies $n>|A|+|\mathcal{X}|$ . We have $\left\|\mathinner{\mathcal{S}}\right\|\in\left|\mathinner{H}\right|+\Theta(n)$ . Moreover, since $n\in\mathcal{O}(\left\|\mathinner{\mathcal{S}}\right\|)$ , we have $\delta\in\mathcal{O}(\left|\mathinner{H}\right|\left\|\mathinner{\mathcal{S}}\right\|)$ and $\varepsilon\in\mathcal{O}(\left\|\mathinner{\mathcal{S}}\right\|)$ . As a consequence:

[TABLE]

Note that the right-hand side in (11) coincides with the space bound we allow to store intermediate equations according to Thm. 4.3.

5.8. Extended equations and their solutions

The NFA over the monoid $\operatorname{End}(C^{*})$ we will construct uses extended equations as states. The overall strategy is to remove variables from the equation until no variables remain. During the process we will enter a phase called $\delta$ -periodic compression repeatedly (which is the analogue of “block compression” for solving word equations in free groups). During each call of $\delta$ -periodic compression, each variable may create temporarily two new variables which will vanish before the end of that call. So at most $3\left|\mathinner{H}\right|n$ variables are needed, including these additional (temporary) variables.

Definition 5.4.

An extended equation is a tuple $E=(W,B,\mathcal{X},\theta,\mu)$ , where $A\subseteq B$ and $M(B,\mathcal{X},\theta,\mu)$ is an $H$ - $N$ -monoid with type $\theta$ . Moreover, we require:

(1)

$W\in M(B,\mathcal{X},\theta,\mu)$ can be written as a word in the form:

[TABLE]

with $x_{i}$ , $u_{j}$ , $v_{k}\in(B\cup\mathcal{Y})^{*}$ and $\mu(x_{i})\neq 0,\mu(u_{j})\neq 0,\mu(v_{k})\neq 0$ .

(2)

Given $W$ as above we call $u_{i}=v_{i}$ a local equation.

(3)

$|W|_{\#}=|W_{\mathrm{init}}|_{\#}$ .

(4)

For every $X\in\mathcal{X}$ there exists some $f\in H$ such that $f(X)$ appears in $W$ .

(5)

We say $E$ is a standard state if first, $\theta=\emptyset$ , and second, all local equations are triangular.

(6)

If $E=(W,B,\mathcal{X},\emptyset,\mu)$ is a standard state, then $\mathcal{X}\subseteq\mathcal{X}$ . Moreover,

[TABLE]

(7)

If $E=(W,B,\mathcal{X},\theta,\mu)$ is any state, then $\sum_{X\in\mathcal{X}}|W|_{X}\leqslant 3n.$ (Thus, we can bound $\Omega$ by $\left|\mathinner{\Omega}\right|\leqslant 3|H|n$ right away.)

(8)

If variables $X,Y$ are typed with $X\neq Y$ and $(Xa,aY)\in\theta$ , then we have $\theta(X)a=a\theta(Y)$ in the submonoid $M(B,\theta,\mu)$ generated by $B$ .

Recall (Sect. 3.3) that if a variable $X$ is typed, then there is a primitive word $p=\theta(X)$ such that $(Xp,pX)\in\theta$ .

Definition 5.5.

Let $E=(W,B,\mathcal{X},\theta,\mu)$ be an extended equation.

•

A solution is a $B$ -morphism $\sigma\colon M(B,\mathcal{X},\theta,\mu)\to M(B,\theta,\mu)$ such that:

–

$\sigma(W)=\sigma(\overline{W}).$

–

$\sigma(X)\in p^{*}$ , whenever $X$ is typed and $p=\theta(X)$ .

•

An entire solution is a pair $(\alpha,\sigma)$ where $\alpha\colon M(B,\theta,\mu)\to M(A,\emptyset,\mu_{0})$ is an $A$ -morphism and $\sigma$ is a solution.

6. Twisted conjugacy and $\delta$ -periodic words

A key step in proving Thm. 4.3 is to solve a particular kind of twisted equation: conjugacy. Let $x,y,z\in A^{*}$ . An easy exercise in combinatorics on words [23] yields:

[TABLE]

This fact is crucial in Makanin’s classical approach [36] to solve (untwisted) word equations. Here, we need a variant of (12) in the twisted environment. We say that words $x,y\in{A}^{*}$ are twisted conjugate if there are $f,g,h\in H$ and $z\in{A}^{*}$ such that $zg(y)=h(x)f(z)$ . We also say that $\left|\mathinner{x}\right|=\left|\mathinner{y}\right|$ is the offset of the conjugacy. A twisted conjugacy equation is a (non-triangular) twisted equation of the form

[TABLE]

Proposition 6.1.

Let $\sigma$ be a solution of the twisted equation (13) such that the offset $\left|\mathinner{\sigma(X)}\right|$ satisfies $1\leqslant\left|\mathinner{\sigma(X)}\right|<\left|\mathinner{\sigma(Z)}\right|$ . Then there are words $r\in{A}^{+}$ , $s\in{A}^{*}$ and $e,j\in\mathbb{N}$ with $0\leqslant j<\left|\mathinner{H}\right|$ such that $|rs|=\left|\mathinner{\sigma(X)}\right|$ and

[TABLE]

Proof.

Let $v=\sigma(X)$ and $u=h(v)$ . Since $1\leqslant\left|\mathinner{\sigma(X)}\right|<\left|\mathinner{\sigma(Z)}\right|$ the word $u$ is a proper nonempty prefix of $\sigma(Z)$ . If $2\left|\mathinner{u}\right|\leqslant\left|\mathinner{\sigma(Z)}\right|$ , then $uf(u)$ is a prefix of $\sigma(Z)$ , and so on. Thus, $\sigma(Z)$ is a prefix of a word $uf(u)f^{2}(u)\cdots f^{k}(u)$ for some $k\in\mathbb{N}$ . Next, observe that $f^{\left|\mathinner{H}\right|}(u)=f^{0}(u)=u$ for every word $u\in{A}^{*}$ . Thus,

[TABLE]

where $0\leqslant j<\left|\mathinner{H}\right|$ , $u=rs$ , and the suffix of $\sigma(Z)$ of length $|r|$ is where the pattern runs out, as illustrated in Fig. 3. We then have $\sigma(Y)=g^{-1}f^{j}(sf(r))$ . Hence, the nonempty word $u$ and the length $\left|\mathinner{\sigma(Z)}\right|$ define a unique factorization $u=rs$ and integers $0\leqslant e$ and $0\leqslant j<\left|\mathinner{H}\right|$ such that $\sigma(Z)$ has the desired form above. ∎
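The twisted periodic pattern in this proof is easy to reproduce experimentally. The following Python sketch is an illustration under stated assumptions: group elements are modelled as letter permutations given by dictionaries, and the helper names `apply`, `twisted_power_prefix` and `check_twisted_conjugacy` are hypothetical. It builds $z$ as a prefix of $u\,f(u)\,f^{2}(u)\cdots$ and confirms that $z$ is a prefix of $u\,f(z)$ , so the remaining suffix plays the role of $g(\sigma(Y))$ .

```python
def apply(f, word):
    """Apply a letter permutation f (a dict) letterwise to a word."""
    return "".join(f[a] for a in word)

def twisted_power_prefix(u, f, length):
    """Prefix of the given length of u f(u) f^2(u) f^3(u) ..."""
    if not u:
        raise ValueError("u must be nonempty")
    out, block = "", u
    while len(out) < length:
        out += block
        block = apply(f, block)
    return out[:length]

def check_twisted_conjugacy(u, f, length):
    """Return (z, tail) with z + tail == u + f(z), where z is the
    twisted-power prefix; tail corresponds to g(sigma(Y))."""
    z = twisted_power_prefix(u, f, length)
    rhs = u + apply(f, z)
    assert rhs.startswith(z)
    return z, rhs[len(z):]

swap = {"a": "b", "b": "a"}            # an element f of order 2
z, tail = check_twisted_conjugacy("ab", swap, 5)
```

With $u=ab$ and $f$ swapping the two letters, the sketch gives $z=abbaa$ , which corresponds to $r=a$ , $s=b$ , $j=2$ and a tail $bb=f^{2}(s\,f(r))$ .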

A word $p$ is called primitive if it cannot be written as $p=r^{e}$ with $e\geqslant 2$ . In particular, the empty word $1$ is not primitive. It is well known (and easy to see) that a nonempty word $p$ is primitive if and only if $p^{2}$ cannot be written as $p^{2}=xpy$ with $x\neq 1$ and $y\neq 1$ .

Let $w,p\in{A}^{+}$ be nonempty words. We say that $w$ has period $|p|$ if $w$ is a prefix of $p^{|w|}$ . In other words, if $w=a_{1}\cdots a_{n}$ with $a_{i}\in{A}$ , then $a_{i}=a_{i+|p|}$ for all $1\leqslant i\leqslant n-|p|$ . A word may have several periods, for example $w=aabaabaa$ has periods $3,6,7,8$ . If $|p|$ is the least period of $w$ , then $|p|\leqslant|w|$ and we can choose $p$ to be primitive such that $p\leqslant w$ . For example, $aab\leqslant aabaabaa$ is a primitive prefix and $|aab|=3$ .
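The definition of a period translates directly into a brute-force check. The Python sketch below is a quadratic-time illustration, with hypothetical function names `periods` and `least_period_prefix`, that recomputes the example from the text.

```python
def periods(w):
    """All p with 1 <= p <= len(w) such that w[i] == w[i + p] wherever
    both positions exist (0-based version of a_i = a_{i+p})."""
    n = len(w)
    return [p for p in range(1, n + 1)
            if all(w[i] == w[i + p] for i in range(n - p))]

def least_period_prefix(w):
    """Prefix of a nonempty word w whose length is the least period."""
    return w[:periods(w)[0]]

example_periods = periods("aabaabaa")
example_prefix = least_period_prefix("aabaabaa")
```

For $w=aabaabaa$ this returns the periods $3,6,7,8$ and the primitive prefix $aab$ , as stated above.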

Corollary 6.2.

Let $\varepsilon\in\mathbb{N}$ , $f,g,h\in H$ , and $x,y,z\in A^{*}$ be words with $1\leqslant|x|\leqslant\varepsilon$ and $|z|\geqslant\left|\mathinner{H}\right|\varepsilon$ . If we have $zg(y)=h(x)f(z)$ , then $z$ has a period of at most $\left|\mathinner{H}\right|\varepsilon$ .

Moreover, let $z=\alpha w\beta$ be any factorization with $|w|=|x|$ . Then every letter $b$ occurring in $z$ satisfies $b=f(a)$ for some $f\in H$ and some letter $a$ occurring in $w$ .

Proof.

By Prop. 6.1 we have

[TABLE]

where $|f^{i}(rs)|=|x|\leqslant\varepsilon$ for all $i\geqslant 0$ . Hence, $z$ has a period

[TABLE]

For the second claim, if $z=\alpha w\beta$ with $|w|=|x|=|rs|$ then $w$ is a factor of $f^{i}(rs)f^{i+1}(rs)$ for some $i\geqslant 0$ . If we write $rs=a_{1}\dots a_{|x|}$ , then any letter $b$ in $z$ satisfies $b=f^{j}(a_{\ell})$ . Let $\iota\in\{i,i+1\}$ so that $f^{\iota}(a_{\ell})$ is a letter in $w$ , then $b=f^{j}(a_{\ell})=f^{j}(f^{-\iota}(f^{\iota}(a_{\ell})))=f^{j-\iota}(a)$ for some $j\geqslant 0$ . ∎

Definition 6.3.

We say that a word $w$ is $\delta$ -periodic if it has some period of length at most $\delta$ . A $\delta$ -periodic word $w$ is called long $\delta$ -periodic if $\left|\mathinner{w}\right|\geqslant 3\delta$ , and very long $\delta$ -periodic if $\left|\mathinner{w}\right|\geqslant 10\delta$ .

For example, $aabaaabaaab$ is 4-periodic but not long 4-periodic. An important property of $\delta$ -periodic words is the following.

Lemma 6.4.

Let $w$ be a $\delta$ -periodic word and $w=p^{e}r=q^{f}s$ such that $p,q$ are primitive, $|p|\leqslant|q|\leqslant\delta$ , $1\neq r\leqslant p$ , $1\neq s\leqslant q$ , and $\left|\mathinner{w}\right|\geqslant 2\delta$ . Then $p=q$ , $e=f\geqslant 1$ , and $r=s$ .

Proof.

The assertion is clear for $|p|=|q|$ . Hence we may assume that $p$ is a proper prefix of $q$ . Since $q\leqslant w$ we conclude $q\leqslant p^{\delta}$ . Since $\left|\mathinner{w}\right|\geqslant 2\delta$ , and $\left|\mathinner{p}\right|\leqslant\left|\mathinner{q}\right|\leqslant\delta$ we see $pq\leqslant w\leqslant q^{\left|\mathinner{w}\right|}$ . Thus $q$ occurs as a factor inside $qq$ : we have $pqs=qq$ for some $s$ . Since $1\leqslant\left|\mathinner{p}\right|<\left|\mathinner{q}\right|$ , this contradicts the primitivity of $q$ . ∎
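Def. 6.3 and the factorization used in Lem. 6.4 can be made concrete as follows. The Python sketch is only an illustration: the function names `least_period`, `classify` and `factorize` are hypothetical, words are assumed nonempty, and, unlike in the lemma, the remainder $r$ returned here is a proper prefix of $p$ that may be empty.

```python
def least_period(w):
    """Smallest p >= 1 with w[i] == w[i + p] wherever defined (w nonempty)."""
    n = len(w)
    return next(p for p in range(1, n + 1)
                if all(w[i] == w[i + p] for i in range(n - p)))

def classify(w, delta):
    """Classify w with respect to delta following Def. 6.3."""
    if least_period(w) > delta:
        return "not periodic"
    if len(w) >= 10 * delta:
        return "very long"
    if len(w) >= 3 * delta:
        return "long"
    return "periodic"

def factorize(w):
    """Write w = p^e r where |p| is the least period and r is a proper
    (possibly empty) prefix of p."""
    p = w[:least_period(w)]
    e, rest = divmod(len(w), len(p))
    return p, e, w[len(w) - rest:]

kind = classify("aabaaabaaab", 4)
parts = factorize("aabaaabaaab")
```

For the example word $aabaaabaaab$ with $\delta=4$ the sketch reports a $\delta$ -periodic word that is not long, with factorization $(aaba)^{2}\,aab$ ; appending one more letter $a$ makes it long $4$ -periodic.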

Let $u$ be a prefix (resp. factor, resp. suffix) of some nonempty word $w$ . We say that $u$ is a maximal $\delta$ -periodic prefix (resp. factor, resp. suffix) in $w$ if we cannot extend the occurrence of the factor $u$ inside $w$ by any letter to the right or left, to see a $\delta$ -periodic word.

7. The ambient infinite automaton $\mathcal{T}$

The states of the NFA $\mathcal{A}_{\mathcal{S}}$ (we are aiming for in Thm. 4.3) are extended equations and transitions are certain labeled arcs between states which modify the extended equations. Before we construct $\mathcal{A}_{\mathcal{S}}$ let us define an infinite automaton $\mathcal{T}$ . It will contain $\mathcal{A}_{\mathcal{S}}$ as a finite subautomaton. We show that $\mathcal{T}$ is sound: this means in the notation of Thm. 4.3

[TABLE]

This implies that all subautomata of $\mathcal{T}$ are sound, too. The set of states in $\mathcal{T}$ is the set of extended equations according to Def. 5.4, see Sect. 7.1. There are two kinds of transitions: a substitution transition transforms the variables; a compression transition affects the constants, but not the variables, see Sect. 7.2.

If $(W,B,\mathcal{X},\theta,\mu)\overset{h}{\longrightarrow}(W^{\prime},B^{\prime},\mathcal{X}^{\prime},\theta^{\prime},\mu^{\prime})$ is a transition, then its label $h$ is a morphism $h\colon M(B^{\prime},\mathcal{X}^{\prime},\theta^{\prime},\mu^{\prime})\to M(B,\mathcal{X},\theta,\mu)$ (in the opposite direction of the arc) which is specified by a mapping $h\colon\Delta\to B^{*}$ where $\Delta\subseteq B^{\prime}$ is some subset (possibly empty) of constants with $\Delta\cap A=\emptyset$ . We assume that such a map $h$ extends to an $A\cup\mathcal{X}^{\prime}$ -morphism $h\colon M(B^{\prime},\mathcal{X}^{\prime},\theta^{\prime},\mu^{\prime})\to M(B,\mathcal{X},\theta,\mu)$ by leaving all letters in $(B^{\prime}\cup\mathcal{X}^{\prime})\setminus\left\{f(d)\mathrel{\left|\vphantom{f(d)}\vphantom{d\in\Delta\cup\overline{\Delta},f\in H}\right.}d\in\Delta\cup\overline{\Delta},f\in H\right\}$ invariant. Since $h(\Delta)\subseteq B^{\prime*}$ , the restriction of $h$ also defines a morphism $h\colon M(B^{\prime},\theta^{\prime},\mu^{\prime})\to M(B,\theta,\mu)$ . Note that we use the same letter $h$ for both morphisms. There will be no risk of confusion.

Since $B^{\prime}\subseteq C$ , the morphism $h$ also induces an endomorphism of $C^{*}$ which respects the involution assuming $h(c)=c$ for all $c\in C\setminus B^{\prime}$ . However, outside $B^{\prime}$ neither the action of $H$ nor the value of $\mu$ is defined, so $C^{*}$ is not an $H$ - $N$ -monoid. It is simply a free monoid with involution, and we can read the label always as an endomorphism of the free monoid with involution $C^{*}$ .

New constants appear only by compression. If a word $w$ is replaced by a letter $c$ by specifying $h(c)=w$ , then we will automatically set $\mu(c)=\mu(w)$ , $h(\overline{c})=\overline{w}$ , $h(f(c))=f(h(c))$ , and hence: $f(c)=c\iff f(w)=w$ for all $f\in H$ . By definition of $N$ , the second component in $\mu(w)$ is a word $u\in A^{*}$ of length at most $\log{\left|\mathinner{H}\right|}$ such that the stabilizer $H_{w}$ satisfies $H_{c}=H_{w}=H_{u}$ . In particular, we have an efficient test whether $f(c)=c$ for all $f\in H$ : we just check that $f(a)=a$ for all letters $a$ which appear in the word $u$ . The crucial observation is that whenever

[TABLE]

is a labeled path and $w\in B_{t}^{*}$ is a word, then $h=h_{s+1}\cdots h_{t}$ can be viewed either as a morphism $h\colon M(B_{t},\theta_{t},\mu_{t})\to M(B_{s},\theta_{s},\mu_{s})$ or as an endomorphism of $C^{*}$ . If we have $w\in B_{t}^{*}$ , then $h$ defines a word $h(w)\in B_{s}^{*}$ and the corresponding element $h(w)\in M(B_{s},\theta_{s},\mu_{s})$ . By $\varepsilon$ we denote the identity endomorphism on $C^{*}$ . Then $\varepsilon$ appears as the label of transitions $(W,B,\mathcal{X},\theta,\mu)\overset{\varepsilon}{\longrightarrow}(W^{\prime},B^{\prime},\mathcal{X}^{\prime},\theta^{\prime},\mu^{\prime})$ where $h\colon M(B^{\prime},\theta^{\prime},\mu^{\prime})\to M(B,\theta,\mu)$ is a morphism with $h(a)=a$ for all $a\in B^{\prime}$ . For example, the label $\varepsilon$ might appear when $B^{\prime}\subseteq B$ or $\theta^{\prime}\subseteq\theta$ , etc.

7.1. States

We define the states of $\mathcal{T}$ as the set of extended equations according to Def. 5.4. Thus, every state $E$ is of the form $E=(W,B,\mathcal{X},\theta,\mu)$ .

7.1.1. Initial state.

The initial state is $E_{\mathrm{init}}=(W_{\mathrm{init}},A,\mathcal{X},\emptyset,\mu_{0})$ .

7.1.2. Final states.

A state $(W,B,\emptyset,\emptyset,\mu)$ is final if

(1)

$W=\overline{W}$ and $W$ uses no variables.

(2)

The word $W$ has a prefix of the form $\#d_{1}\#\cdots\#d_{k}\#$ where $d_{i}$ are the distinguished letters mentioned in Thm. 4.3.
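Both conditions are straightforward to test for a concrete word. The Python sketch below is a minimal illustration under assumptions: words are lists of symbols, the involution on words is taken to be the usual one on free monoids with involution, $\overline{a_{1}\cdots a_{n}}=\overline{a_{n}}\cdots\overline{a_{1}}$ , and the names `bar_word`, `is_final`, `d1` and `D1` (standing for $\overline{d_{1}}$ ) are hypothetical.

```python
def bar_word(word, bar):
    """Involution on words: reverse the word and bar every letter."""
    return [bar[a] for a in reversed(word)]

def is_final(word, bar, distinguished, variables=frozenset()):
    """Check conditions (1) and (2) for a word given as a list of symbols."""
    if any(x in variables for x in word):
        return False                      # (1) no variables may occur
    if word != bar_word(word, bar):
        return False                      # (1) W must equal its own bar
    prefix = ["#"]
    for d in distinguished:
        prefix += [d, "#"]                # builds # d_1 # ... # d_k #
    return word[:len(prefix)] == prefix   # (2) required prefix

bar = {"#": "#", "d1": "D1", "D1": "d1"}
good = is_final(["#", "d1", "#", "D1", "#"], bar, ["d1"])
bad = is_final(["#", "d1", "#"], bar, ["d1"])
```

The first word is accepted because it is fixed by the involution and starts with $\#d_{1}\#$ ; the second one has the right prefix but fails $W=\overline{W}$ .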

7.2. Transitions

We denote a transition as $E\overset{h}{\longrightarrow}E^{\prime}$ and for both kinds, substitutions and compressions, we put some additional length restrictions on $h$ . For example, we allow $h(c)=1$ for a letter $c$ only if $E^{\prime}$ is a final state. Thus labels on paths not ending in a final state are never length decreasing morphisms. Moreover, we require that $h(c)$ is not too long. If $h$ is specified by a set $\Delta^{\prime}$ , then we require $\sum_{c\in\Delta^{\prime}}|h(c)|<|W|$ where $E=(W,B,\mathcal{X},\theta,\mu)$ . These length restrictions are not used in the proof of the soundness result Prop. 7.5. We need them when proving completeness for a finite subautomaton of $\mathcal{T}$ .

7.2.1. Substitution transitions

Definition 7.1.

A substitution transition is denoted as

[TABLE]

We must have $B\subseteq B^{\prime}$ and we require that the transition is defined by a $B$ -morphism $\tau\colon M(B,\mathcal{X},\theta,\mu)\to M(B^{\prime},\mathcal{X}^{\prime},\theta^{\prime},\mu^{\prime})$ and a $B$ -morphism $h\colon M(B^{\prime},\mathcal{X}^{\prime},\theta^{\prime},\mu^{\prime})\to M(B,\mathcal{X},\theta,\mu)$ such that $|h(b)|=1$ for all $b\in B^{\prime}$ . In particular, $h$ is length preserving.

In the case that some variable is typed in the source node $(W,B,\mathcal{X},\theta,\mu)$ , that is $\theta\neq\emptyset$ , then we add the following restrictions:

•

$\mathcal{X}^{\prime}\subseteq\mathcal{X}$ . (Thus, for $\theta\neq\emptyset$ the set of variables cannot increase.)

•

If $X\in\mathcal{X}^{\prime}$ , then $\theta(X)=\theta^{\prime}(X)$ . In particular, $\theta(X)$ is defined if and only if $\theta^{\prime}(X)$ is defined.

•

If $\theta(X)$ is not defined, then $\tau(X)=X$ .

•

If $\theta(X)$ is defined, then $\tau(X)\in\theta(X)^{*}\cup\theta(X)^{*}\,X\,\theta(X)^{*}$ .

We say that a substitution transition is special if $B=B^{\prime}$ . This implies that the label $h$ is the identity on $M(B,\theta,\mu)$ , and therefore the label will be $h=\varepsilon=\mathrm{id}_{C^{*}}$ . Later it would be enough to consider only special substitution transitions. However, this would not simplify the following proof.

Lemma 7.2.

Let $E=(W,B,\mathcal{X},\theta,\mu)\overset{h}{\longrightarrow}(\tau(W),B^{\prime},\mathcal{X}^{\prime},\theta^{\prime},\mu^{\prime})=E^{\prime}$ be a substitution transition. If $\sigma^{\prime}$ solves $E^{\prime}$ then $\sigma=h\sigma^{\prime}\tau$ solves $E$ . In particular, if $\alpha\colon M(B,\theta,\mu)\to M(A,\emptyset,\mu_{0})$ is an $A$ -morphism, then $(\alpha,\sigma)$ and $(\alpha h,\sigma^{\prime})$ are entire solutions with $\alpha\sigma(W)=\alpha h\sigma^{\prime}(W^{\prime})$ .

Proof.

Recall that, by Def. 5.5, to prove that $\sigma=h\sigma^{\prime}\tau$ solves $E$ we must show two things: $\sigma(W)=\sigma(\overline{W})$ , and $\sigma(X)\in p^{*}$ whenever $X$ is typed and $p=\theta(X)$ .

We begin by checking that $\theta\neq\emptyset$ implies $h\sigma^{\prime}\tau(X)\in\theta(X)^{*}$ for all typed variables. Consider a typed variable $X\in\mathcal{X}$ . The first case is: $\tau(X)\in\theta(X)^{*}X\theta(X)^{*}$ . Hence $X\in\mathcal{X}^{\prime}$ . By definition, $\theta^{\prime}(X)$ is defined and $\theta(X)=\theta^{\prime}(X)=p\in B^{*}$ . Then $\tau(X)\in p^{*}Xp^{*}$ , too. Hence, $\sigma^{\prime}\tau(X)\in p^{*}$ because every solution $\sigma^{\prime}$ has to satisfy $\sigma^{\prime}(X)\in p^{*}$ . Since $p\in B^{*}$ and $h$ is a $B$ -morphism we have $h(p)=p$ . Therefore $h\sigma^{\prime}\tau(X)\in\theta(X)^{*}$ in the first case. The second case is $\tau(X)\in p^{*}$ where $p=\theta(X)\in B^{*}$ . Again, we can conclude $h\sigma^{\prime}\tau(X)\in p^{*}$ . Thus, in both cases: whenever $X\in\mathcal{X}$ is typed, then $h\sigma^{\prime}\tau(X)\in\theta(X)^{*}$ .

Since $h$ , $\sigma^{\prime}$ , and $\tau$ are $B$ -morphisms, so is their composition $h\sigma^{\prime}\tau$ . Since $\sigma^{\prime}$ is a solution of $W^{\prime}=\tau(W)$ , we have $\sigma^{\prime}(W^{\prime})=\sigma^{\prime}(\overline{W^{\prime}})$ . Hence, $\sigma^{\prime}\tau(W)=\sigma^{\prime}\tau(\overline{W})$ since $\overline{\tau(W)}=\tau(\overline{W})$ because $\tau$ respects the involution. It follows that $h\sigma^{\prime}\tau(W)=h\sigma^{\prime}\tau(\overline{W})$ , so $\sigma(W)=\sigma(\overline{W})$ . Thus, $h\sigma^{\prime}\tau$ is a solution at ${E}$ . As a consequence, $(\alpha,h\sigma^{\prime}\tau)$ and $(\alpha h,\sigma^{\prime})$ are both entire solutions because $h$ is a $B$ -morphism and $A\subseteq B$ .

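[Commutative diagram: the composite along $M(B,\mathcal{X},\theta,\mu)\overset{\tau}{\longrightarrow}M(B^{\prime},\mathcal{X}^{\prime},\theta^{\prime},\mu^{\prime})\overset{\sigma^{\prime}}{\longrightarrow}M(B^{\prime},\theta^{\prime},\mu^{\prime})\overset{h}{\longrightarrow}M(B,\theta,\mu)$ equals $\sigma\colon M(B,\mathcal{X},\theta,\mu)\to M(B,\theta,\mu)$ .]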

∎

7.2.2. Compression transitions

Compressions are defined only if $\mathcal{X}=\mathcal{X}^{\prime}$ . They leave the variables invariant, but we encounter both situations $B\subseteq B^{\prime}$ or $B^{\prime}\subseteq B$ . However, in case that $\theta\neq\emptyset$ the situation is more subtle than for substitutions, and we need again technical restrictions in order to guarantee soundness.

Definition 7.3.

A compression transition

[TABLE]

is defined in $\mathcal{T}$ if $h\colon M(B^{\prime},\mathcal{X},\theta^{\prime},\mu^{\prime})\to M(B,\mathcal{X},\theta,\mu)$ is an $(A\cup\mathcal{X})$ -morphism such that the following conditions hold.

•

We have $W=h(W^{\prime})$

•

$h(b^{\prime})$ can be written as a word in $B^{*}$ for every $b^{\prime}\in B^{\prime}$ and $\left|\mathinner{h(c)}\right|\geqslant 1$ for all $c\in B^{\prime}$ unless ${E}^{\prime}=(W^{\prime},B^{\prime},\mathcal{X},\theta^{\prime},\mu^{\prime})$ is a final state.

•

$h$ is specified by a mapping $h\colon\Delta^{\prime}\to B^{*}$ with $\Delta^{\prime}\subseteq B^{\prime}$ such that

[TABLE]

•

A variable $X$ is typed using $\theta^{\prime}$ if and only if it is typed using $\theta$ .

•

There is some $e\geqslant 1$ such that for all typed variables we have

$$h(\theta^{\prime}(X))=\theta(X)^{e}.$$

Note that for a given $(A\cup\mathcal{X})$ -morphism $h\colon M(B^{\prime},\mathcal{X},\theta^{\prime},\mu^{\prime})\to M(B,\mathcal{X},\theta,\mu)$ the conditions to be a compression transition are effective.

Lemma 7.4.

Let ${{E}}=(h(W^{\prime}),B,\mathcal{X},\theta,\mu)\overset{h}{\longrightarrow}(W^{\prime},B^{\prime},\mathcal{X},\theta^{\prime},\mu^{\prime})={{E}}^{\prime}$ be a compression and $\sigma^{\prime}$ be a solution at ${E}^{\prime}$ . Then there exists a $B$ -morphism

$$\sigma\colon M(B,\mathcal{X},\theta,\mu)\to M(B,\theta,\mu)$$

such that $h\sigma^{\prime}(X)=\sigma(X)$ for all $X\in\mathcal{X}$ . The $B$ -morphism $\sigma$ satisfies the following conditions. If $\alpha\colon M(B,\theta,\mu)\to M(A,\emptyset,\mu_{0})$ is an $A$ -morphism, then first, $(\alpha,\sigma)$ is an entire solution at ${E}$ and second, $(\alpha h,\sigma^{\prime})$ is an entire solution at ${E}^{\prime}$ . Moreover, $\alpha\sigma(W)=\alpha h\sigma^{\prime}(W^{\prime}).$

Proof.

Define $\sigma(X)=h\sigma^{\prime}(X)$ for all variables and $\sigma(b)=b$ for all $b\in B$ . This defines a $B$ -morphism $\sigma\colon M(B,\mathcal{X},\emptyset,\mu)\to M(B,\theta,\mu)$ since $M(B,\mathcal{X},\emptyset,\mu)$ is a free monoid. (There is no type yet on the left.) Let us show first that $(x,y)\in\theta$ implies $\sigma(x)=\sigma(y)$ in the monoid $M(B,\mathcal{X},\theta,\mu)$ . That is, $\sigma$ induces a $B$ -morphism (which we also denote by $\sigma$ ) $\sigma\colon M(B,\mathcal{X},\theta,\mu)\to M(B,\theta,\mu)$ .

For $(x,y)\in\theta$ with $x,y\in B^{*}$ the assertion $\sigma(x)=\sigma(y)$ is trivial because $\sigma$ leaves $B^{*}$ invariant. Thus, it is enough to consider a defining relation of the form $(Xp,\,pX)\in\theta$ where $p=\theta(X)\in B^{*}$ . Because of Def. 7.3 we know that $X$ is typed on the right hand side $M(B^{\prime},\mathcal{X},\theta^{\prime},\mu^{\prime})$ , too. Let $q=\theta^{\prime}(X)\in B^{\prime*}$ . Thus, $h(q)=p^{e}$ for some $e\geqslant 1$ according to the last condition in Def. 7.3. Since $\sigma^{\prime}(X)=q^{\ell}$ for some $\ell\geqslant 0$ , we conclude $h\sigma^{\prime}(X)=p^{e\ell}$ . Hence, whenever $(Xq,\,qX)\in\theta^{\prime}$ , then $h\sigma^{\prime}(Xp)=p^{1+e\ell}=h\sigma^{\prime}(pX)$ in $M(B,\theta,\mu)$ since $(Xp,\,pX)\in\theta$ .

So far we have shown that $\sigma$ is a well-defined morphism such that $\sigma(X)=h\sigma^{\prime}(X)$ . Since $h$ leaves the variables invariant, this implies $\sigma(h(X))=h\sigma^{\prime}(X)$ for all variables. For a constant $b\in B^{\prime}$ we have $\sigma(h(b))=h(b)=h(\sigma^{\prime}(b))$ , because $h(b)\in B^{*}$ is fixed by $\sigma$ and $b$ is fixed by $\sigma^{\prime}$ . Hence $\sigma h=h\sigma^{\prime}$ and this means that the diagram in Fig. 4 commutes.

The morphism $h\colon M(B^{\prime},\theta^{\prime},\mu^{\prime})\to M(B,\theta,\mu)$ in Fig. 4 denotes the restriction of the morphism $h\colon M(B^{\prime},\mathcal{X}^{\prime},\theta^{\prime},\mu^{\prime})\to M(B,\mathcal{X},\theta,\mu)$ , too. Let $W=h(W^{\prime})$ and hence, $\overline{W}=h(\overline{W^{\prime}})$ . In order to see that $(\alpha,\sigma)$ is an entire solution at ${E}$ we use $\sigma h=h\sigma^{\prime}$ and we content ourselves to consider the following line of equations:

[TABLE]

In particular, $\alpha\sigma(W)=\alpha\sigma h(W^{\prime})=\alpha h\sigma^{\prime}(W^{\prime})$ . It is also clear that $(\alpha h,\sigma^{\prime})$ is an entire solution at ${E}^{\prime}$ since $h$ leaves $A$ invariant. ∎

Proposition 7.5.

Let $E_{0}\overset{h_{1}}{\longrightarrow}\cdots\overset{h_{t}}{\longrightarrow}E_{t}$ be a path in $\mathcal{T}$ of length $t$ , where $E_{0}=(W_{\mathrm{init}},A,\mathcal{X},\emptyset,\mu_{0})$ is an initial and $E_{t}=(W,B,\emptyset,\emptyset,\mu)$ is a final state. Then $E_{0}$ has an entire solution $(\mathrm{id}_{A^{*}},\sigma)$ with $\sigma(W_{\mathrm{init}})=h_{1}\cdots h_{t}(W)$ . In particular, for $X\in\mathcal{X}$ we have $\sigma(X)=h_{1}\cdots h_{t}(d_{X})$ ; and $\mathcal{T}$ is sound in the sense of (15).

Proof.

Since $E_{t}$ is final, it has a unique solution $\sigma_{t}=\mathrm{id}_{B^{*}}$ . By the lemmas above, we obtain a solution $\sigma$ at $E_{0}$ such that $\mathrm{id}_{A^{*}}\sigma(W_{\mathrm{init}})=\mathrm{id}_{A^{*}}h_{1}\cdots h_{t}\mathrm{id}_{B^{*}}(W)$ . Hence, $(\mathrm{id}_{A^{*}},\sigma)$ is an entire solution as desired. ∎
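The way a path yields a solution can be illustrated with a small sketch. It is a simplified model, not the construction of the paper: letters are single characters, each label $h_{i}$ is a plain dictionary from letters to words, and the group action, the involution, and the constraints $\mu$ are ignored. The names `apply_morphism` and `solution_from_path` and the sample labels are hypothetical.

```python
def apply_morphism(h, word):
    """Apply a letter-to-word map h to a word; letters missing from h are fixed."""
    return "".join(h.get(letter, letter) for letter in word)


def solution_from_path(labels, final_word):
    """Compute h_1 ... h_t (final_word). The last label h_t acts first."""
    word = final_word
    for h in reversed(labels):
        word = apply_morphism(h, word)
    return word


# Hypothetical path of length 2 (illustration only).
h1 = {"c": "ab"}      # label of the first transition
h2 = {"d": "cc"}      # label of the second transition
W_final = "#d#a#"     # word at the final state
result = solution_from_path([h1, h2], W_final)
print(result)
```

Reading the labels from the final state back to the initial state mirrors the composition $h_{1}\cdots h_{t}$ in the proposition.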

8. The intermediate automaton $\mathcal{F}$

Prop. 7.5 states that the large automaton $\mathcal{T}$ is sound. This property cannot be destroyed by removing states or transitions. That is, every subautomaton of $\mathcal{T}$ is sound, too. We define a subautomaton $\mathcal{F}$ of $\mathcal{T}$ as follows. All extended equations $E$ are states of $\mathcal{F}$ , so the state set is the same infinite set as for $\mathcal{T}$ . However, for transitions we are more restrictive. To define transitions, let us first define a weight for equations and states. The definition is tailored so that all compression transitions and certain substitution transitions reduce the weight of the state.

Definition 8.1.

Let $E=(W,B,\mathcal{X},\theta,\mu)$ be an extended equation where (as usual) $W\in(A\cup\mathcal{Y})^{*}$ is represented as a word. The weight of the equation $\left\|\mathinner{W}\right\|$ is defined by

[TABLE]

The weight of the state $\left\|\mathinner{E}\right\|$ is a pair of natural numbers $\left\|\mathinner{E}\right\|=(\left\|\mathinner{W}\right\|,\left|\mathinner{B}\right|)$ .

For $\ell\in\mathbb{N}$ we order tuples in $\mathbb{N}^{\ell}$ lexicographically. For example $(0,42)<(1,0)$ , but $(1,0,42)>(0,10,100)$ ; and we use the fact that there are no infinite descending chains in $\mathbb{N}^{\ell}$ . Consider any transition $E\overset{h}{\longrightarrow}E^{\prime}$ in $\mathcal{T}$ . Then we always have $\left\|\mathinner{E^{\prime}}\right\|<\left\|\mathinner{E}\right\|$ unless the transition is a substitution transition where at least one variable that appears in $W$ pops out a constant.

Remark 8.2.

The definition of $\left\|\mathinner{W}\right\|$ does not depend on the word representing $W$ . This follows because $\sum_{Y\in\mathcal{Y}}|x|_{Y}=\sum_{Y\in\mathcal{Y}}|y|_{Y}$ for all $(x,y)\in\theta$ . Moreover, the advantage of using the weight $\left\|\mathinner{W}\right\|$ in $\left\|\mathinner{E}\right\|$ (instead of the more straightforward choice $(\left|\mathinner{W}\right|,\left|\mathinner{B}\right|)$ for $\left\|\mathinner{E}\right\|$ ) is that a substitution transition which does nothing but replace variables $X$ by $\sigma(X)$ with $\left|\mathinner{\sigma(X)}\right|\leqslant{30}\delta$ leads to a state of smaller weight.
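The effect described in the remark can be checked numerically. The sketch below assumes the weight formula $\left\|\mathinner{W}\right\|=\left|\mathinner{W}\right|+30\delta\sum_{Y}|W|_{Y}$ for non-final states (as recalled in Sect. 9.2), represents $W$ as a list of symbols, and uses a hypothetical value $\delta=2$ . Python compares tuples lexicographically, which matches the order on $\mathbb{N}^{\ell}$ used here.

```python
DELTA = 2  # hypothetical value of delta, for illustration only


def equation_weight(word, variables, delta=DELTA):
    """||W|| = |W| + 30*delta * (number of variable occurrences in W)."""
    occurrences = sum(1 for symbol in word if symbol in variables)
    return len(word) + 30 * delta * occurrences


def state_weight(word, constants, variables, delta=DELTA):
    """||E|| = (||W||, |B|), compared lexicographically."""
    return (equation_weight(word, variables, delta), len(constants))


B = {"#", "a", "b"}
W = ["#", "X", "a", "Y", "#"]
before = state_weight(W, B, {"X", "Y"})

# Replace X by a solution of the maximal short length 30*delta = 60.
W_after = ["#"] + ["a"] * (30 * DELTA) + ["a", "Y", "#"]
after = state_weight(W_after, B, {"Y"})

print(before, after)            # the weight drops ...
print(len(W), len(W_after))     # ... although the plain length grows
```

The plain length grows from 5 to 64, while the weight drops because each removed variable occurrence accounts for $30\delta$ .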

A transition $(W,B,\mathcal{X},\theta,\mu)\overset{h}{\longrightarrow}(W^{\prime},B^{\prime},\mathcal{X}^{\prime},\theta^{\prime},\mu^{\prime})$ in $\mathcal{T}$ belongs to $\mathcal{F}$ if and only if the following properties are satisfied.

•

If $(W,B,\mathcal{X},\theta,\mu)\overset{h}{\longrightarrow}(W^{\prime},B^{\prime},\mathcal{X}^{\prime},\theta^{\prime},\mu^{\prime})$ is a substitution transition, then $W^{\prime}=\tau(W)$ , $B=B^{\prime}$ , and $h=\varepsilon$ .

•

If $E=(W,B,\mathcal{X},\theta,\mu)\overset{h}{\longrightarrow}(W^{\prime},B^{\prime},\mathcal{X}^{\prime},\theta^{\prime},\mu^{\prime})=E^{\prime}$ is a compression transition, then $W=h(W^{\prime})$ and $\left\|\mathinner{E^{\prime}}\right\|<\left\|\mathinner{E}\right\|$ .

The focus for the remaining part of the proof is on completeness. A subautomaton $\mathcal{A}$ of $\mathcal{T}$ is called complete if the following holds:

[TABLE]

Since every subautomaton of $\mathcal{T}$ is also sound in the sense (15) we see that every complete subautomaton is sound and complete.

Proposition 8.3.

Let $\mathcal{A}$ be a trim, finite subautomaton of $\mathcal{F}$ . If $\mathcal{A}\neq\emptyset$ , then $\mathcal{S}$ has at least one solution. If $\mathcal{A}$ contains a directed cycle, then $\mathcal{S}$ has infinitely many solutions. Moreover, if $\mathcal{A}$ is complete, then the converse of both assertions is true.

Proof.

If $\mathcal{A}\neq\emptyset$ , then $\mathcal{S}$ has at least one solution by Prop. 7.5. Now assume that $\mathcal{A}$ contains a directed cycle. By hypothesis $\mathcal{A}$ is trim. Hence, there is an accepting path with a directed cycle and this cycle doesn’t involve any final state as final states are without outgoing arcs. Let $E_{s}\overset{h_{s}}{\longrightarrow}\cdots\overset{h_{t}}{\longrightarrow}E_{t}=E_{s}$ be this cycle. Without restriction we have $t>s$ and $\left\|\mathinner{E_{s}}\right\|=\left\|\mathinner{E_{s+1}}\right\|$ because $\mathbb{N}^{\ell}$ admits no infinite strictly descending chains. This means $E_{s}\overset{\varepsilon}{\longrightarrow}E_{s+1}$ must be a substitution transition which is defined by some $\tau$ with $\left|\mathinner{\tau(X)}\right|_{a}\geqslant 1$ for some $X$ where $X$ appears in the equation belonging to ${E}_{s}$ and $a$ is a constant. Hence, on some accepting path we can pop out an arbitrary number of letters of $X$ . Since on paths from an initial state to $E_{s}$ the labels are non-erasing endomorphisms, we see that we can make $\sigma(X)\in A^{*}$ at the initial state $E_{\mathrm{init}}$ larger and larger. Thus, there are infinitely many solutions. The converse, under the assumption that (17) holds, is trivial. ∎
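The graph-theoretic test behind the proposition can be sketched as follows. This is an illustration on an abstract finite directed graph, not the PSPACE procedure of the paper: states are arbitrary hashable values, labels are dropped, and the function name `classify` is hypothetical. The result "empty" corresponds to $\mathcal{A}=\emptyset$ after trimming, and "cyclic" to the presence of a directed cycle. By the proposition, "acyclic" means finitely many solutions only if the automaton is complete.

```python
from collections import defaultdict


def reachable(start, edges):
    """Return all states reachable from the set `start` along `edges`."""
    seen, stack = set(start), list(start)
    while stack:
        u = stack.pop()
        for v in edges[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def classify(transitions, initial, final):
    """Trim the graph, then report 'empty', 'acyclic' or 'cyclic'."""
    fwd, bwd = defaultdict(list), defaultdict(list)
    for u, v in transitions:
        fwd[u].append(v)
        bwd[v].append(u)
    # Trim: keep states that are reachable and co-reachable.
    trim = reachable(initial, fwd) & reachable(final, bwd)
    if not trim:
        return "empty"
    # Peel off states of in-degree 0; a cycle exists iff some state survives.
    indeg = {u: 0 for u in trim}
    for u in trim:
        for v in fwd[u]:
            if v in trim:
                indeg[v] += 1
    queue = [u for u in trim if indeg[u] == 0]
    peeled = 0
    while queue:
        u = queue.pop()
        peeled += 1
        for v in fwd[u]:
            if v in trim:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return "acyclic" if peeled == len(trim) else "cyclic"
```

For example, `classify([(0, 1), (1, 1), (1, 2)], {0}, {2})` reports a cycle on an accepting path, whereas a cycle that cannot reach a final state is removed by trimming and does not count.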

9. Towards completeness

During the completeness proof we always work with a state $E=(W,B,\mathcal{X},\theta,\mu)$ and a given entire solution $(\alpha,\sigma)$ . Starting at a triple $({E},\alpha,\sigma)$ where ${E}$ is a standard state, we describe a deterministic process which yields a path $(h_{1},\ldots,h_{t})$ inside the (infinite) automaton $\mathcal{F}$ from ${E}$ to some final state $E_{t}=(W_{t},B_{t},\emptyset,\emptyset,\mu_{t})$ so that $\alpha\sigma(W)=h_{1}\cdots h_{t}(W_{t})$ . Thus, $\mathcal{F}$ is complete. The crucial property is that we are able to control the lengths of all intermediate equations $W_{s}$ for $1\leqslant s\leqslant t$ by

[TABLE]

We make sure that whenever we see an intermediate state $E_{s}$ where $\theta_{s}\neq\emptyset$ , then $\theta_{s}$ has a special structure. Moreover, when we follow a compression transition, we make sure the soundness condition holds for the corresponding label according to Def. 7.3.

We then can deduce Thm. 4.3 because for defining the complete NFA $\mathcal{A}_{\mathcal{S}}$ mentioned in the theorem it is enough to consider the starting point

[TABLE]

Since $\left|\mathinner{W_{\mathrm{init}}}\right|\leqslant n$ we can ensure $\mathcal{A}_{\mathcal{S}}$ is finite by allowing extended equations in $\mathcal{A}_{\mathcal{S}}$ only if the corresponding equation satisfies a concrete length bound in $\mathcal{O}(\delta n)=\mathcal{O}(\left|\mathinner{H}\right|n^{2})$ . Moreover, we impose that $\mathcal{A}_{\mathcal{S}}$ is trim. We will come back later to these issues. For the moment we work in the infinite automaton $\mathcal{F}$ and there is no length bound for the equation $W$ .

9.1. Dummy variables denoting the empty word

In the following it is convenient to have the following notation at our disposal. We introduce purely formal symbols of the form $(f,D)$ where $f\in H$ and $D$ is called dummy variable, but the symbol $(f,D)$ is just another explicit notation for the empty word $1$ . The dummy variable $D$ is never listed in $\mathcal{Y}$ . Its only purpose is that we have a unified notation for local equations (and avoid case distinctions). Since $(f,D)=1$ every morphism maps $(f,D)$ to $1$ . The advantage is that with the help of a dummy variable, we may, whenever convenient, assume that every local equation has the form

[TABLE]

Here $X,Y,Z$ are (perhaps dummy) variables and $u,v,w$ are words over constants.

9.2. The weight of an entire solution and the forward property

We need a termination condition for the following compression procedure. Therefore we define a weight $\left\|\mathinner{E,\alpha,\sigma}\right\|\in\mathbb{N}^{3}$ for the triple $(E,\alpha,\sigma)$ where $E=(W,B,\mathcal{X},\theta,\mu)$ is a state with an entire solution $(\alpha,\sigma)$ by

[TABLE]

At non-final states the weights $\left\|\mathinner{W}\right\|=\left|\mathinner{W}\right|+{30}\delta\sum_{Y\in\mathcal{Y}}|W|_{Y}$ and $\left\|\mathinner{E}\right\|=(\left\|\mathinner{W}\right\|,\left|\mathinner{B}\right|)$ were defined in (16). Thus, actually for all states, we can write $\left\|\mathinner{E,\alpha,\sigma}\right\|$ as a pair

[TABLE]

Being at $(E,\alpha,\sigma)$ we say that a transition ${E}\overset{h}{\longrightarrow}E^{\prime}=(W^{\prime},B^{\prime},\mathcal{X}^{\prime},\theta^{\prime},\mu^{\prime})$ satisfies the forward property if $E^{\prime}$ has an entire solution $(\alpha h,\sigma^{\prime})$ such that first $\left\|\mathinner{E^{\prime},\alpha h,\sigma^{\prime}}\right\|<\left\|\mathinner{E,\alpha,\sigma}\right\|$ and second, $\alpha h\sigma^{\prime}(W^{\prime})=\alpha\sigma(W).$

Following a transition ${E}\overset{h}{\longrightarrow}E^{\prime}$ which satisfies the forward property means that we switch from $(E,\alpha,\sigma)$ to $(E^{\prime},\alpha h,\sigma^{\prime})=(E^{\prime},\alpha^{\prime},\sigma^{\prime})$ . Typically after each such step we rename the tuple $(W^{\prime},B^{\prime},\mathcal{X}^{\prime},\theta^{\prime},\mu^{\prime},\alpha^{\prime},\sigma^{\prime})$ as $(W,B,\mathcal{X},\theta,\mu,\alpha,\sigma)$ . Since using a transition satisfying the forward property reduces the weight $\left\|\mathinner{E,\alpha,\sigma}\right\|$ , there are no infinite paths of transitions where all transitions on the path satisfy the forward property.

9.3. Meta rules

Let $E=(W,B,\mathcal{X},\theta,\mu)$ be a state with an entire solution $(\alpha,\sigma)$ . We apply the following meta rules whenever possible.

9.3.1. Remove variables with short solutions

If $\left|\mathinner{\sigma(Y)}\right|\leqslant{30}\delta$ for some variable $Y\in\mathcal{Y}$ such that $|W|_{Y}\geqslant 1$ , then follow a substitution transition ${E}\overset{\varepsilon}{\longrightarrow}{E}^{\prime}$ which is defined by a $B$ -morphism $\tau$ such that $\tau(Z)=\sigma(Z)$ if $\left|\mathinner{\sigma(Z)}\right|\leqslant{30}\delta$ and $\tau(Z)=Z$ otherwise. The state ${E}^{\prime}=(\tau(W),B,\mathcal{X}^{\prime},\theta^{\prime},\mu^{\prime})$ uses the same set of constants, but we have $\mathcal{X}^{\prime}\varsubsetneq\mathcal{X}$ . We also have $\theta^{\prime}\subseteq\theta$ and $\mu^{\prime}$ is the restriction of $\mu$ . (Thus, according to our convention we can also write ${E}^{\prime}=(\tau(W),B,\mathcal{X}^{\prime},\theta,\mu)$ .) Let $\sigma^{\prime}$ be the restriction of $\sigma$ , then $(\alpha,\sigma^{\prime})$ is an entire solution at ${E}^{\prime}$ and we have $\alpha\varepsilon\sigma(W)=\alpha\sigma^{\prime}\tau(W)$ . Moreover, $\left\|\mathinner{W^{\prime}}\right\|<\left\|\mathinner{W}\right\|$ . Hence, $\left\|\mathinner{{E}^{\prime}}\right\|<\left\|\mathinner{{E}}\right\|$ and the transition reduces the weight of the state.

As a consequence, whenever we are at a state $E=(W,B,\mathcal{X},\theta,\mu)$ with an entire solution $(\alpha,\sigma)$ , then we assume $\left|\mathinner{\sigma(Y)}\right|>{30}\delta$ for all $Y\in\mathcal{Y}$ where $|W|_{Y}\geqslant 1$ .
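A minimal sketch of this meta rule, under simplifying assumptions: $W$ is a list of single-character symbols, $\sigma$ is a dictionary from variables to words over constants, and the group action, the involution, and the constraints are ignored. The function names and sample data are hypothetical.

```python
def remove_short_variables(word, sigma, delta):
    """Apply tau with tau(Z) = sigma(Z) if |sigma(Z)| <= 30*delta, else tau(Z) = Z."""
    bound = 30 * delta
    tau = {Z: list(s) for Z, s in sigma.items() if len(s) <= bound}
    new_word = []
    for symbol in word:
        new_word.extend(tau.get(symbol, [symbol]))
    # sigma' is the restriction of sigma to the remaining variables.
    sigma_restricted = {Z: s for Z, s in sigma.items() if Z not in tau}
    return new_word, sigma_restricted


def evaluate(word, sigma):
    """Compute sigma(W) as a string; constants are mapped to themselves."""
    return "".join(sigma.get(symbol, symbol) for symbol in word)


delta = 1                                    # hypothetical value
W = ["#", "X", "a", "Y", "#"]
sigma = {"X": "ab", "Y": "a" * 31}           # only sigma(X) is short
new_word, sigma_restricted = remove_short_variables(W, sigma, delta)
print(new_word)
```

The check `evaluate(new_word, sigma_restricted) == evaluate(W, sigma)` reflects the identity $\alpha\varepsilon\sigma(W)=\alpha\sigma^{\prime}\tau(W)$ in this simplified setting.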

9.3.2. Remove useless constants

We say that a letter $a\in B\setminus A$ is useless (with respect to $\sigma$ ) if $|\sigma(W)|_{f\cdot a}=0$ for all $f\in H$ . Note that a letter $a\in A$ is never useless. If $B$ contains a useless letter $a$ , then define $B^{\prime}=B\setminus\left\{f\cdot a\mathrel{\left|\vphantom{f\cdot a}\vphantom{f\in H}\right.}f\in H\right\}$ . The inclusion of $B^{\prime}$ into $B$ defines canonical embeddings $M(B^{\prime},\mathcal{X},\theta,\mu)\to M(B,\mathcal{X},\theta,\mu)$ and $M(B^{\prime},\theta,\mu)\to M(B,\theta,\mu)$ such that $W\in M(B^{\prime},\mathcal{X},\theta,\mu)$ and $\sigma(W)\in M(B^{\prime},\theta,\mu)$ . The state ${E}^{\prime}=(W,B^{\prime},\mathcal{X},\theta,\mu)$ has an entire solution $(\alpha,\sigma^{\prime})$ where $\sigma^{\prime}$ is the restriction of $\sigma$ . Moreover, $\left\|\mathinner{{E}^{\prime}}\right\|<\left\|\mathinner{{E}}\right\|$ . Hence, we can follow the compression transition $E\overset{\varepsilon}{\longrightarrow}(W,B^{\prime},\mathcal{X},\theta,\mu)$ which satisfies the forward property.

As a consequence, whenever we are at a state $E=(W,B,\mathcal{X},\theta,\mu)$ with an entire solution $(\alpha,\sigma)$ , then we assume that $B$ doesn’t contain any useless letters.

Remark 9.1.

We may have that $B\setminus A$ contains letters that are not $H$ -visible, but a solution $\sigma$ uses them. Removing useless letters does not remove such letters.

9.3.3. Moving to a final state

Let $E=(W,B,\emptyset,\emptyset,\mu)$ be a standard state without any variables and with an entire solution $(\alpha,\sigma)$ . Then $\sigma=\mathrm{id}_{B}$ is the identity on $M(B,\emptyset,\emptyset,\mu)=M(B,\emptyset,\mu)$ and we have $W=\overline{W}$ . If $E$ is final, there is nothing to do. Hence, we assume that $E$ is not final. Since $W=\overline{W}$ , Def. 5.4 tells us

[TABLE]

Hence we can enlarge $B$ to a set $B^{\prime}$ which contains all distinguished $d_{i}$ for $1\leqslant i\leqslant k$ . (By our convention none of the $d_{i}$ belongs to $B$ because the state $E$ is not final.) We define a $B$ -morphism $h\colon M(B^{\prime},\emptyset,\mu)\to M(B,\emptyset,\mu)$ by letting $h(d_{i})=x_{i}$ . Moreover, we let

[TABLE]

We have no variables and $W^{\prime}=\overline{W^{\prime}}$ . Hence $E^{\prime}=(W^{\prime},B^{\prime},\emptyset,\emptyset,\mu^{\prime})$ is final. The entire solution at $E^{\prime}$ is $(\alpha h,\mathrm{id}_{B})$ and we have $\alpha h(W^{\prime})=\alpha(W)$ since none of the $d_{i}$ belong to $B$ . Since $\left\|\mathinner{E,\alpha,\mathrm{id}_{B}}\right\|>(0,0,0)=\left\|\mathinner{E^{\prime},\alpha h,\mathrm{id}_{B}}\right\|$ the compression transition $E\overset{h}{\longrightarrow}E^{\prime}$ satisfies the forward property. Hence, we are done.

As a consequence, whenever we are at a standard state $E=(W,B,\mathcal{X},\emptyset,\mu)$ with an entire solution $(\alpha,\sigma)$ , then we assume that $\mathcal{X}\neq\emptyset$ . Moreover according to the other meta rules we have $|\sigma(X)|\geqslant{30}\delta$ for all $X\in\mathcal{X}$ and every constant $b\in B$ is $H$ -visible in $W$ .

10. Compression round: the first phase

We perform the compression in rounds. Each round has two phases. The first phase is called $\delta$ -periodic compression, the second one is called pair compression. During $\delta$ -periodic compression we perform all meta rules whenever possible. Recall how meta rules decrease the weight $(\left\|\mathinner{W}\right\|,|B|)$ at states: removing a variable makes $\left\|\mathinner{W}\right\|$ smaller, removing useless letters doesn’t change $\left\|\mathinner{W}\right\|$ , but it makes $B$ smaller. Moving to a final state decreases the weight of the state down to $(0,0)$ (which was the exceptional weight at final states) . None of these rules increases the sum $\sum_{X\in\mathcal{X}}|\alpha\sigma(X)|$ . Therefore all meta rules satisfy the forward property according to Sect. 9.2.

10.1. A simple, but useful, estimation

During the rounds the length oscillates but it can be bounded by some function in $\mathcal{O}(\delta n)$ . In order to obtain such a bound we will later apply the following fact twice with different parameters.

Lemma 10.1.

Let $0\leqslant q<1$ and $c\geqslant 1$ for some real constants $q,c$ , and let $s\colon\mathbb{N}\to\mathbb{N}$ be a function with $s(0)\leqslant\frac{c}{1-q}\delta n$ and which satisfies a bound

$$s(t+1)\leqslant q\,s(t)+c\,\delta n$$

for all $t\in\mathbb{N}$ . Then $s(t)\leqslant\frac{c}{1-q}\cdot\delta n$ for all $t\in\mathbb{N}$ .

Proof.

The statement is true for $t=0$ . Assuming it is true for $t\geqslant 0$ then

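$$s(t+1)\leqslant q\,s(t)+c\,\delta n\leqslant q\cdot\frac{c}{1-q}\,\delta n+c\,\delta n=\frac{c}{1-q}\,\delta n.$$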

∎

In $\mathcal{O}$ -notation (19) reads as: if $s(0)\geqslant 0$ and $s(t+1)\in qs(t)+\mathcal{O}(\delta n)$ , then $s(t)\in\mathcal{O}(\delta n)$ for all $t$ .
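The estimate can be checked numerically. The following sketch iterates the worst case of a recurrence of the form $s(t+1)\leqslant q\,s(t)+c\,\delta n$ with exact rational arithmetic; the parameter values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def iterate_bound(s0, q, c, delta, n, steps):
    """Iterate s(t+1) = floor(q*s(t) + c*delta*n), the largest value the bound allows."""
    values = [s0]
    for _ in range(steps):
        values.append(int(q * values[-1] + c * delta * n))
    return values


q, c, delta, n = Fraction(3, 4), Fraction(2), 5, 10   # hypothetical parameters
limit = c / (1 - q) * delta * n                        # the bound c/(1-q) * delta * n
from_limit = iterate_bound(int(limit), q, c, delta, n, steps=50)
from_zero = iterate_bound(0, q, c, delta, n, steps=50)
print(limit, max(from_limit), max(from_zero))
```

Starting at the limit the sequence stays there, and starting below it the sequence never exceeds it, as the lemma predicts.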

10.2. Alphabet reduction at standard states

During our procedures we introduce more and more letters, so the set $B$ grows, and removing useless letters is not enough to keep the size of $B$ in $\mathcal{O}(\left|\mathinner{H}\right|\cdot\left|\mathinner{W}\right|)$ .

The following procedure, which we call alphabet reduction, is not a meta rule (which we may apply whenever possible). If we call the procedure we explicitly say so. The purpose of calling it is to ensure that $B\setminus A$ contains only letters that are $H$ -visible in $W$ .

We begin at a standard state $E=(W,B,\mathcal{X},\emptyset,\mu)$ with an entire solution $(\alpha,\sigma)$ where there is some letter $b\in B$ which is not $H$ -visible. Hence $|W|_{f\cdot b}=0$ for all $f\in H$ . Removing useless letters is a meta rule. Hence, we may assume without restriction that all letters are useful and therefore we may assume $|\sigma(X)|_{b}\geqslant 1$ for some variable. (That is, we are in the situation of Rem. 9.1.) Define

[TABLE]

Then we have $W\in M(B^{\prime},\mathcal{X},\emptyset,\mu)$ . The procedure will take us (via a compression transition defined by the inclusion $B^{\prime}\subseteq B$ ) to the state $E^{\prime}=(W,B^{\prime},\mathcal{X},\emptyset,\mu)$ . Since $b\in B\setminus B^{\prime}$ we have $|B^{\prime}|<|B|$ and therefore $\left\|\mathinner{E^{\prime}}\right\|<\left\|\mathinner{E}\right\|$ , too.

It is here where the notion of entire solution becomes important. We have $\alpha\colon M(B,\emptyset,\mu)\to M(A,\emptyset,\mu_{0})$ , so we can define a $B^{\prime}$ -morphism $\beta\colon M(B,\emptyset,\mu)\to M(A,\emptyset,\mu_{0})$ by $\beta(b)=\alpha(b)$ for $b\in B\setminus B^{\prime}$ . Since $M(B,\emptyset,\mu)=B^{*}$ is a free monoid, there are no defining relations to check. Moreover, $\sigma^{\prime}=\beta\sigma$ is a solution at $E^{\prime}=(W,B^{\prime},\mathcal{X},\emptyset,\mu^{\prime})$ . Thus, we can switch from $(E,\alpha,\sigma)$ to $(E^{\prime},\alpha,\sigma^{\prime})=(E^{\prime},\alpha,\beta\sigma)$ via the compression transition $(W,B,\mathcal{X},\emptyset,\mu)\overset{\varepsilon}{\longrightarrow}(W,B^{\prime},\mathcal{X},\emptyset,\mu)$ . Since $\alpha$ is an $A$ -morphism we obtain $\alpha=\alpha\beta$ . Hence, $\alpha\sigma(W)=\alpha\beta\sigma(W)=\alpha\varepsilon\sigma^{\prime}(W)$ as desired.
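The role of $\beta$ can be illustrated on a toy instance. The sketch below ignores the group action, the involution, and the constraints; it takes $B^{\prime}$ to consist of $A$ together with the letters occurring in $W$ , which is an assumption made for this illustration and not necessarily the definition used in the paper. All names and sample data are hypothetical.

```python
def apply(morphism, word):
    """Apply a letter-to-word map; letters not listed are fixed."""
    return "".join(morphism.get(letter, letter) for letter in word)


def solve(word, sig):
    """Compute sigma(W) for W given as a list of symbols."""
    return "".join(sig.get(symbol, symbol) for symbol in word)


A = {"#", "a", "b"}
B = A | {"c", "d"}
alpha = {"c": "ab", "d": "ba"}      # an A-morphism: fixes A, sends c and d into A^*
W = ["#", "X", "c", "#"]
sigma = {"X": "dca"}                # the solution uses d, which does not occur in W

B_prime = A | {s for s in W if s in B}          # assumed choice of B'
beta = {b: alpha[b] for b in B - B_prime}       # beta(b) = alpha(b) outside B'
sigma_prime = {X: apply(beta, s) for X, s in sigma.items()}

lhs = apply(alpha, solve(W, sigma))             # alpha sigma(W)
rhs = apply(alpha, solve(W, sigma_prime))       # alpha sigma'(W)
print(lhs, rhs)
```

Both sides agree because applying $\alpha$ after $\beta$ has the same effect as applying $\alpha$ alone.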

As a consequence, whenever we perform an alphabet reduction, then we arrive at a standard state $E=(W,B,\mathcal{X},\emptyset,\mu)$ with an entire solution $(\alpha,\sigma)$ such that every letter in $B\setminus A$ is $H$ -visible in $W$ . This means that after alphabet reduction the size of $B$ is at most $\left|\mathinner{H}\right|\cdot(\left|\mathinner{A}\right|+\left|\mathinner{W}\right|)$ .

10.3. Mapping the positions from $\sigma(W)$ to $W$

Let $E=(W,B,\mathcal{X},\emptyset,\mu)$ be a state with an empty type $\theta$ and let $\sigma\colon M(B,\mathcal{X},\emptyset,\mu)\to M(B,\emptyset,\mu)$ be any $B$ -morphism. Recall that $\{\mathinner{1,\ldots,m}\}$ (resp. $\{\mathinner{1,\ldots,\ell}\}$ ) denotes the set of positions of $\sigma(W)$ (resp. $W$ ). Then $\sigma$ induces a mapping $\pi_{\sigma}$ from $\{\mathinner{1,\ldots,m}\}$ to $\left\{\mathinner{1,\ldots,\ell}\right\}$ as follows. We define $\pi_{\sigma}$ from left to right. We let $\pi_{\sigma}(1)=1$ . The first position in $\sigma(W)$ is labeled with $\#$ and so is the first position in $W$ . No other position than $1$ is mapped to $1$ . We shall keep the invariant that $\sigma(W)[1,m^{\prime}]=\sigma(W[1,\ell^{\prime}])$ if $m^{\prime}$ is the largest position which is mapped to $\ell^{\prime}$ . In particular, we have $\sigma(W)[m^{\prime}+1,m]=\sigma(W[\ell^{\prime}+1,\ell])$ .

Now assume $\pi_{\sigma}(i)$ is already defined for all $1\leqslant i\leqslant m^{\prime}$ and $m^{\prime}\leqslant m$ . If $m^{\prime}=m$ we are done. Otherwise we have $m^{\prime}<m$ and we consider $\pi_{\sigma}(m^{\prime})=\ell^{\prime}$ . By the invariant we know $\ell^{\prime}<\ell$ . We look at the label of the position $\ell^{\prime}+1$ . It is labeled by a letter in $W$ and there are two cases. In the first case the label is a constant $b\in B$ . In this case we let $\pi_{\sigma}(m^{\prime}+1)=\ell^{\prime}+1$ . In the second case the label is of the form $Y$ with $Y\in\mathcal{Y}$ . In that case we map all positions in the interval $[m^{\prime}+1,m^{\prime}+|\sigma(Y)|\,]$ to the single position $\ell^{\prime}+1$ .

Note that $\pi_{\sigma}\colon\{\mathinner{1,\ldots,m}\}\to\{\mathinner{1,\ldots,\ell}\}$ enjoys the following properties. If $\ell$ is a position of $W$ which is labeled by a constant $b\in B$ , then ${\pi_{\sigma}}^{-1}(\ell)$ is a single position in $\sigma(W)$ which is labeled by $b$ , too. If $\ell$ is a position of $W$ which is labeled by a variable $Y\in\mathcal{Y}$ , then ${\pi_{\sigma}}^{-1}(\ell)$ is an interval of length $|\sigma(Y)|$ in $\sigma(W)$ . The label of that interval is just $\sigma(Y)$ .
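The mapping $\pi_{\sigma}$ can be computed in one left-to-right pass. The sketch below assumes that $W$ is given as a list of symbols and $\sigma$ as a dictionary from variables to words; positions are 1-based as in the text. The function name `position_map` is hypothetical.

```python
def position_map(word, sigma):
    """Return pi as a list: pi[i - 1] is the position of W that
    position i of sigma(W) is mapped to (positions are 1-based)."""
    pi = []
    for position, symbol in enumerate(word, start=1):
        # A constant contributes one position, a variable Y contributes |sigma(Y)|.
        image_length = len(sigma[symbol]) if symbol in sigma else 1
        pi.extend([position] * image_length)
    return pi


W = ["#", "X", "a", "Y", "#"]
sigma = {"X": "ab", "Y": "bba"}
pi = position_map(W, sigma)
print(pi)
```

Here $\sigma(W)$ has eight positions; positions 2 and 3 are mapped to the occurrence of $X$ and positions 5 to 7 to the occurrence of $Y$ .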

Definition 10.2.

We say that a position $m^{\prime}$ of $\sigma(W)$ is visible (in $W$ ) if the position $\pi_{\sigma}(m^{\prime})$ of $W$ is labeled by a constant. Otherwise it is called invisible. An interval $[i,j]$ of positions of $\sigma(W)$ is visible (in $W$ ) (resp. invisible) if all positions in that interval are visible (resp. invisible) positions. If $[i,j]$ contains an invisible position, but $\left|\mathinner{\pi_{\sigma}[i,j]}\right|\geqslant 2$ , then we say that the interval $[i,j]$ is crossing.
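The three notions can be tested directly from $\pi_{\sigma}$ . In the following sketch, $W$ is a list of symbols, $\sigma$ is a dictionary from variables to words, and an interval may carry more than one label, since an interval spanning two adjacent variable occurrences is both invisible and crossing under the definition. The function name is hypothetical.

```python
def classify_interval(word, sigma, i, j):
    """Return the labels of the interval [i, j] of positions of sigma(W)
    (1-based, inclusive): a subset of {'visible', 'invisible', 'crossing'}."""
    pi = []
    for position, symbol in enumerate(word, start=1):
        pi.extend([position] * (len(sigma[symbol]) if symbol in sigma else 1))
    targets = pi[i - 1:j]
    # A position is invisible if it is mapped to a variable occurrence in W.
    invisible = [word[t - 1] in sigma for t in targets]
    labels = set()
    if not any(invisible):
        labels.add("visible")
    if all(invisible):
        labels.add("invisible")
    if any(invisible) and len(set(targets)) >= 2:
        labels.add("crossing")
    return labels


W = ["#", "X", "a", "Y", "#"]
sigma = {"X": "ab", "Y": "bba"}
print(classify_interval(W, sigma, 3, 4))   # touches X and the constant a
```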

10.4. The start of a compression round

Each compression round starts at a standard state $E_{r}=(W_{r},B_{r},\mathcal{X}_{r},\emptyset,\mu_{r})$ with an entire solution $(\alpha_{r},\sigma_{r})$ . We may assume that no meta rule is applicable. The very first step is now an alphabet reduction. For simplicity, we denote the state again by $E_{r}=(W_{r},B_{r},\mathcal{X}_{r},\emptyset,\mu_{r})$ and we have $|B_{r}|\leqslant|H|\cdot|W_{r}|$ .

10.5. $\delta$ -periodic compression

For convenience we rename the tuple

[TABLE]

At this point we know that no meta rule applies to $E$ and that $\left|\mathinner{B}\right|\leqslant\left|\mathinner{H}\right|\cdot\left|\mathinner{W}\right|$ .

Let us consider all very long maximal $\delta$ -periodic factors $w$ of $\sigma(W)$ which have a maximal occurrence with at least one visible position. (By maximal occurrence we mean that $w$ is not a factor of a longer $\delta$ -periodic word at that occurrence.) We assume that at least one such occurrence exists, otherwise we skip the main body of the $\delta$ -periodic compression and proceed directly to the end: Sect. 10.6.

We write $w=up^{e}rv$ with $\left|\mathinner{u}\right|=\left|\mathinner{v}\right|=3\delta$ , $p$ is primitive of length at most $\delta$ and $r$ is a nonempty prefix of $p$ . (Recall very long means $|w|\geqslant 10\delta$ so $|p^{e}r|\geqslant 4\delta$ .) By Lem. 6.4, we can encode the factor $up^{e}rv$ uniquely by writing the triple $(p,r,e)$ . Let us call $u$ and $v$ the borders of the very long maximal $\delta$ -periodic factor $up^{e}rv$ . Consider different occurrences $up^{e}rv$ and $u^{\prime}p^{\prime e^{\prime}}r^{\prime}v^{\prime}$ of very long maximal $\delta$ -periodic factors in $\sigma(W)$ . If the occurrences overlap, then this overlap takes place in the borders only, because otherwise the occurrence of the factor was not maximal.

It follows that the number of occurrences of very long maximal $\delta$ -periodic factors with at least one visible position is less than $\left|\mathinner{W}\right|$ . Thus we find some minimal index set $\Lambda$ of size $\left|\mathinner{\Lambda}\right|<\left|\mathinner{W}\right|$ such that

[TABLE]

is exactly the set of very long maximal $\delta$ -periodic factors of $\sigma(W)$ which have a maximal occurrence with at least one visible position.

The idea is that at the end we arrive at a state with a solution where all these occurrences are replaced by $u[r,s,\lambda]v$ where $[r,s,\lambda]$ is the notation for a fresh letter such that $p=rs$ , $r\neq 1$ , and $\lambda\in\Lambda$ . We also color certain positions in $W$ and $\sigma(W)$ . At the end of the process a position will be green if and only if it is labeled by some new letter $[r,s,\lambda]$ .

Note that $\lambda$ is just a formal symbol: we need at most $\mathcal{O}(|W|)$ bits to encode it. We also define a set of primitive words

[TABLE]

We have

[TABLE]

Next we consider fresh variables which are denoted as $[X,f(sr)]$ where $X\in\mathcal{X}\subseteq\mathcal{X}$ , $f\in H$ and for certain $p_{\lambda}\in P_{\Lambda}$ and then for all $rs=p_{\lambda}$ . These new variables will later be typed. We define the action of $H$ by $g\cdot[X,p]=[X,g(p)]$ and the involution by $\overline{[X,p]}=[\overline{X},\overline{p}]$ . The idea is $\sigma([X,p])\in p^{*}$ ; and thus, $\sigma(\overline{[X,p]})\in{\overline{p}}^{*}$ and $\sigma((f,[X,p]))\in f(p)^{*}$ . Note that $(f,[X,p])=(g,[X,p])$ if and only if ${g}^{-1}f(p)=p$ and hence ${g}^{-1}f\in H_{p}$ is in the known stabilizer of $p$ .

The following routine introduces these new variables using substitution transitions. Recall that defining $\tau(X)=w$ substitutes $(f,X)$ by $f(w)$ and simultaneously $(f,\overline{X})$ by $f(\overline{w})=\overline{f(w)}$ for all $f\in H$ .

begin procedure (insert new variables)

Initialize a set of fresh variables by $\mathcal{X}_{\mathrm{new}}=\emptyset$ and put $E=(W,B,\mathcal{X}\cup\mathcal{X}_{\mathrm{new}},\emptyset,\mu)$ .

forall $X\in\mathcal{X}$ do

(Note this means we do the process once for $X$ and once for $\overline{X}$ .)

(1) Apply all meta rules whenever possible; in particular, $\left|\mathinner{\sigma(Y)}\right|\geqslant{30}\delta$ for all variables.

(2) Let $q^{d}q^{\prime}$ be the longest suffix of $\sigma(X)$ such that $q$ is primitive, $\left|\mathinner{q}\right|\leqslant\delta$ , and $q^{\prime}$ is a prefix of $q$ . If $\left|\mathinner{q^{d}q^{\prime}}\right|\leqslant 3\delta$ , then do nothing.

(3) If $\left|\mathinner{q^{d}q^{\prime}}\right|>3\delta$ , then define words $p$ , $p^{\prime}$ , and $e\geqslant 0$ by $q^{d}q^{\prime}=up^{e}p^{\prime}$ with $\left|\mathinner{u}\right|=3\delta$ , $\left|\mathinner{p}\right|=\left|\mathinner{q}\right|$ , and $1\neq p^{\prime}\leqslant p$ . (Note that $p$ is primitive: we have $p=q_{2}q_{1}$ for some factorization $q=q_{1}q_{2}$ .) We enlarge $\mathcal{X}_{\mathrm{new}}$ by fresh variables $[X,sr]$ for all factorizations $p=rs$ . Moreover, if we enlarge $\mathcal{X}_{\mathrm{new}}$ by some $[X,p]$ , then we also include $[X,f(p)]$ and $[X,f(\overline{p})]$ for all $f\in H$ .

We can write $\sigma(X)=xup^{e}p^{\prime}$ with $\left|\mathinner{xu}\right|\geqslant 3\delta$ . Follow a substitution transition $E\overset{\varepsilon}{\longrightarrow}E^{\prime}=(\tau(W),B,\mathcal{X}\cup\mathcal{X}_{\mathrm{new}},\emptyset,\mu^{\prime})$ which is defined by $\tau(X)=X[X,p]p^{\prime}$ and define an entire solution at $E^{\prime}$ by $(\alpha,\sigma^{\prime})$ where $\sigma^{\prime}(X)=xu$ and $\sigma^{\prime}[X,p]=p^{e}$ . The transition satisfies the forward property. (Due to the meta rules it can happen that $\mathcal{X}$ becomes smaller and/or that $\mathcal{X}_{\mathrm{new}}$ is not enlarged at all.)

(4) Rename $E^{\prime},\tau(W),\mu^{\prime},\sigma^{\prime}$ as $E,W,\mu,\sigma$ .

endforall

endprocedure
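Steps (2) and (3) of the procedure amount to a computation on the word $\sigma(X)$ alone, which can be sketched as follows. The sketch works on plain strings and ignores the group action and the involution; the function names are hypothetical. The first function finds the longest suffix with a period of length at most $\delta$ , together with its smallest such period $|q|$ ; the second one splits that suffix as $up^{e}p^{\prime}$ .

```python
def longest_periodic_suffix(w, delta):
    """Return (length, k): the length of the longest suffix of w that has a
    period k <= delta, and the smallest such period k for that suffix."""
    n = len(w)
    best_len, best_period = 0, 0
    for k in range(1, min(delta, n) + 1):
        i = n - k                      # the suffix w[i:] trivially has period k
        while i > 0 and w[i - 1] == w[i - 1 + k]:
            i -= 1                     # extend to the left while period k persists
        if n - i > best_len:           # strict '>' keeps the smallest period on ties
            best_len, best_period = n - i, k
    return best_len, best_period


def split_suffix(s, k, delta):
    """Write s = u p^e p' with |u| = 3*delta, |p| = k, and p' a nonempty
    prefix of p. Assumes that s has period k and |s| > 3*delta."""
    assert len(s) > 3 * delta
    u, rest = s[:3 * delta], s[3 * delta:]
    shift = (3 * delta) % k
    q = s[:k]
    p = q[shift:] + q[:shift]          # p is a cyclic rotation of q
    e = (len(rest) - 1) // k           # leaves between 1 and k letters for p'
    p_prime = rest[e * k:]
    return u, p, e, p_prime


w = "xyz" + "ab" * 5 + "a"             # hypothetical value of sigma(X)
length, k = longest_periodic_suffix(w, delta=2)
u, p, e, p_prime = split_suffix(w[len(w) - length:], k, delta=2)
print(length, k, u, p, e, p_prime)
```

In this toy instance the substitution $\tau(X)=X[X,p]p^{\prime}$ would then be paired with $\sigma^{\prime}(X)=xu$ and $\sigma^{\prime}[X,p]=p^{e}$ , where $x$ is the part of $\sigma(X)$ in front of the periodic suffix.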

The number of new variables $[X,p]\in\mathcal{X}_{\mathrm{new}}$ is bounded by $\mathcal{O}(\left|\mathinner{H}\right|\delta{n})$ because $\left|\mathinner{\mathcal{X}}\right|\leqslant n$ . The factor $\delta$ comes in because we consider all cyclic permutations $sr$ of $p=rs$ . The factor $\left|\mathinner{H}\right|$ comes in because we close under $H$ -action. On the other hand we don’t need to list $[X,f(p)]$ in $\mathcal{X}_{\mathrm{new}}$ for $1\neq f\in H$ if $[X,p]$ is already listed. Thus, a list of $\mathcal{O}(\delta{n})$ new variables suffices to specify the full set $\mathcal{X}_{\mathrm{new}}$ (which is closed under the action of $H$ and involution). Note that $\overline{[X,p]}=[\overline{X},\overline{p}]\neq[X,p]$ . So we keep the invariant that variables are not self-involuting.

From now on until the end of $\delta$ -periodic compression we only remove variables. So, if $\mathcal{Y}^{\prime}=H\cdot\mathcal{X}^{\prime}$ is the full set of variables we meet during the whole procedure, then we have:

[TABLE]

Before the procedure we had $\mathcal{Y}=H\times\mathcal{X}$ and $\mathcal{X}\subseteq\mathcal{X}$ , and $\sum_{Y\in\mathcal{Y}}|W|_{Y}\leqslant n$ as required for a standard state. The corresponding set after the procedure is $\mathcal{Y}_{\text{new}}=\mathcal{X}_{\mathrm{new}}\cup H\times\mathcal{X}$ . We have to check that $\sum_{Y\in\mathcal{Y}_{\text{new}}}|W|_{Y}\leqslant 3n$ (because otherwise it is not an extended equation as per Def. 5.4). This bound is immediate.

In the case $\mathcal{X}_{\mathrm{new}}=\emptyset$ we are still at a standard state and $\sum_{Y\in\mathcal{Y}_{\text{new}}}|W|_{Y}\leqslant n.$ For $\mathcal{X}_{\mathrm{new}}\neq\emptyset$ we are not at a standard state because $\mathcal{Y}_{\text{new}}$ is not contained in $H\times\mathcal{X}$ .

Since for $(W,B,\mathcal{X}\cup\mathcal{X}_{\mathrm{new}},\emptyset,\mu)$ the type $\theta$ is empty, we map the positions of $\sigma(W)$ to the positions of $W$ as explained above in Sect. 10.3. Consider any occurrence of a very long $\delta$ -periodic factor $w=up^{e}p^{\prime}v$ in $\sigma(W)$ which is maximal and where at least one position is visible and where $|u|=|v|=3\delta$ . Consider the occurrences of all very long maximal $\delta$ -periodic factors $w^{\prime}$ where $w^{\prime}\in\left\{g\cdot w,g\cdot\overline{w}\mid g\in H\right\}$ . Each factor $w^{\prime}$ can be written as $w^{\prime}=u^{\prime}w^{\prime\prime}v^{\prime}$ where $|u^{\prime}|=|v^{\prime}|=3\delta$ .

For all maximal occurrences of these factors $w^{\prime}$ let us color the inner positions belonging to $w^{\prime\prime}$ green. Then only green positions are mapped to a variable $[X,q]\in\mathcal{X}_{\mathrm{new}}$ . It is also clear that we can write $q=sr$ for some factorization $p=rs$ . Let us transport the green color to the corresponding positions in $W$ . Then for all positions in $W$ which are labeled by a variable it holds that the position is green if and only if it is labeled by a new variable. Note that green positions in $\sigma(W)$ are separated by words of length at least $3\delta$ .

In the next procedure we will introduce a type $\theta$ which consists of defining relations of the form $[X,aq]a=a[X,qa]$ , but it will be enough to apply such a rule where both positions in $W$ are green. Hence, the color of the positions will not be altered under this restriction. In order to define $\theta$ we use $\mathcal{X}_{\mathrm{new}}\neq\emptyset$ , otherwise we skip the next procedure.
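The defining relations $[X,aq]a=a[X,qa]$ are compatible with solutions that send typed variables to powers of their period. The check below is a minimal sketch under the assumption that $\sigma[X,aq]=(aq)^{e}$ and $\sigma[X,qa]=(qa)^{e}$ share the same exponent $e$ ; the function name is hypothetical and words are plain strings.

```python
def relation_holds(a: str, q: str, e: int) -> bool:
    """Check sigma([X,aq]) a == a sigma([X,qa]) when sigma([X,w]) = w^e."""
    left = (a + q) * e + a     # (aq)^e a
    right = a + (q + a) * e    # a (qa)^e
    return left == right
```

Both sides spell the same word $a(qa)^{e}$ , which is why moving a typed variable across a letter only rotates its period.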

begin procedure (introduce a type $\theta$ )

(1)

Define the type $\theta$ by

[TABLE]

Note that $[X,p]p=p[X,p]$ is actually a consequence of the other relations in $\theta$ . We include it in order to satisfy the definition of type in Sect. 3.3, that if a variable (in this case $[X,p]$ ) appears in a type then there is a unique primitive word $p$ with which it commutes.

(2)

Choose any $[X,p]\in\mathcal{X}_{\mathrm{new}}$ and write $\sigma([X,p])=p^{e}$ . (Note that we have $e\geqslant 10$ in this case, since a meta rule would remove the variable if it has a solution shorter than $30\delta$ .) Define a morphism $\tau\colon M(B,\mathcal{X}\cup\mathcal{X}_{\mathrm{new}},\emptyset,\mu)\to M(B,\mathcal{X}\cup\mathcal{X}_{\mathrm{new}},\theta,\mu)$ by $\tau[X,p]=[X,p]p^{5}$ . The morphism is well-defined since $[X,q]q=q[X,q]$ in $M(B,\mathcal{X}\cup\mathcal{X}_{\mathrm{new}},\theta,\mu)$ for all $[X,q]\in\mathcal{X}_{\mathrm{new}}$ .

(3)

Follow the corresponding substitution transition

[TABLE]

The transition satisfies the forward property with the entire solution $(\alpha,\sigma^{\prime})$ where $\sigma^{\prime}[X,p]=p^{e-5}$ . Apply all meta rules. (After that we may have $\theta=\emptyset$ again.)

(4)

Rename $(\tau(W),\mu^{\prime},\sigma^{\prime})$ as $(W,\mu,\sigma)$ .

endprocedure

Using the relations from $\theta$ we can move the $\mathcal{X}_{\mathrm{new}}$ variables around over green positions. Thus, we can choose a word representation for $W\in M(B,\mathcal{X}\cup\mathcal{X}_{\mathrm{new}},\theta,\mu)$ as $W\in(B\cup\mathcal{Y}\cup\mathcal{X}_{\mathrm{new}})^{*}$ such that every maximal green interval $[i,j]$ of positions in $W$ is labeled by a word of the form

[TABLE]

In the following we simply say that $W[i,j]$ is a maximal green factor when we actually refer to the label $W[i,j]$ of a maximal green interval $[i,j]$ of positions. We can choose $0\leqslant d\leqslant 2$ because in a standard state all local equations are triangular, but this is not essential. Without restriction we have $r\neq 1$ and that $(rs)\cdots(rs)(rsr)=(rs)^{e}r$ satisfies $e\geqslant 1$ . This is clear if $d=0$ . For $d\geqslant 1$ it is enough to substitute $[X_{1},rs]$ by $[X_{1},rs]rs$ . For each $1\leqslant i\leqslant d$ there is $e_{i}\in\mathbb{N}$ such that $\sigma[X_{i},rs]=(rs)^{e_{i}}$ and therefore:

[TABLE]

At this point we will change $\theta$ to $\theta^{\prime}$ (defined below) since the defining relations $[X,aq]a=a[X,qa]$ with $a\in B$ will be of no use anymore. The idea is to replace the factors $rs$ , $sr$ , and $rsr$ by fresh letters denoted by $[rs]$ , $[sr]$ , and $[r,s,\lambda]$ . The $\lambda$ is used to encode the sum $e+e_{1}+\cdots+e_{d}$ . We will make the assumption $r\neq 1$ , then $[rs]$ , $[sr]$ , and $[r,s,\lambda]$ are three different letters for $s\neq 1$ , and there is no letter $[1,p_{\lambda},\lambda]$ , only $[p_{\lambda},1,\lambda]$ . For the maximal green factor $W[i,j]$ we intend to define a word $W^{\prime}[i,j]$ and a type $\theta^{\prime}$ such that

[TABLE]

More precisely, for each $u_{\lambda}p_{\lambda}^{e_{\lambda}}\,r_{\lambda}v_{\lambda}\in F_{\lambda}$ we associate a new letter $[r_{\lambda},s_{\lambda},\lambda]$ with $\mu^{\prime}([r_{\lambda},s_{\lambda},\lambda])=\mu(r_{\lambda},s_{\lambda}r_{\lambda})$ , and $[q]$ for every typed variable $[X,q]$ with $\mu^{\prime}[q]=\mu(q)$ . Recall our notation that $u_{\lambda}p_{\lambda}^{e_{\lambda}}\,r_{\lambda}v_{\lambda}$ is a very long $\delta$ -periodic word, $|u_{\lambda}|=|v_{\lambda}|=3\delta$ , $p$ is primitive, and $r\neq 1$ . It is important that $[r,s,\lambda]$ is visible whenever at least one of the green positions is visible. This is why in the different word representations (26)–(28) for the same $W^{\prime}[i,j]$ the $[r,s,\lambda]$ always sits between the variables.

By introducing (if necessary) more fresh letters we close the set of fresh letters under involution and $H$ -action. We let:

[TABLE]

The set of these new letters is denoted by $B_{\mathrm{new}}$ . The number of new letters can be bounded by:

[TABLE]

Let $B^{\prime}=B\cup B_{\mathrm{new}}$ . Next, we define the new type $\theta^{\prime}$ . For each typed variable $[X,q]$ there is exactly one commutation rule: $[X,q]q=q[X,q]$ . The other defining relations say that $[r_{\lambda}s_{\lambda}]$ and $[s_{\lambda}r_{\lambda}]$ are “conjugate” due to the letter $[r_{\lambda},s_{\lambda},\lambda]$ . Making this formal we specify $\theta^{\prime}$ by:

[TABLE]

Note that the defining relations in $\theta^{\prime}$ are designed so that (26)–(28) hold.

We can now define the rest of the $\delta$ -periodic compression procedure. It is the analogue to Jeż’s “block compression” as described in [11]. During the process sets of positions for $W$ and $\sigma(W)$ change, but our process makes clear that we can always transport the green color: no change involves an interval which has both colored and uncolored positions. We perform the following steps.

begin procedure (remove very long $\delta$ -periodic factors with a visible position)

(1)

Define the element $W^{\prime}\in M(B^{\prime},\mathcal{X}\cup\mathcal{X}_{\mathrm{new}},\theta^{\prime},\mu^{\prime})$ by replacing maximal green factors in $W$ just as we have done for $W[i,j]$ in (21) in order to produce $W^{\prime}[i,j]$ in (26). Doing this everywhere defines $W^{\prime}$ in a word representation as $W^{\prime}\in(B^{\prime}\cup\mathcal{Y}\cup\mathcal{X}_{\mathrm{new}})^{*}$ .

(2)

Define a $B\cup\mathcal{X}\cup\mathcal{X}_{\mathrm{new}}$ -morphism

[TABLE]

by $h_{1}[r,s,\lambda]=rsr$ and $h_{1}[q]=q$ . We have $W=h_{1}(W^{\prime})$ and we obtain a compression transition satisfying the forward property

[TABLE]

Note that $\left\|\mathinner{E^{\prime}}\right\|<\left\|\mathinner{E}\right\|$ since $\left\|\mathinner{W^{\prime}}\right\|<\left\|\mathinner{W}\right\|$ due to the fact that (by our assumption) at least one green interval with a visible position exists; and therefore some new letter $[r,s,\lambda]$ is visible in $W^{\prime}$ (which represents the word $rsr$ of length at least $2$ ). The new entire solution at $E^{\prime}$ is $(\alpha^{\prime},\sigma^{\prime})=(\alpha h_{1},\sigma^{\prime})$ where $\sigma^{\prime}(X)=X$ for $X\in\mathcal{X}$ and $\sigma^{\prime}[X,p]=[p]^{e}$ if $\sigma([X,p])=p^{e}$ . We apply the meta rules and then we rename $E^{\prime},W^{\prime},\alpha^{\prime},\sigma^{\prime}$ as $E,W,\alpha,\sigma$ but we keep the notation for $B^{\prime}$ , $\mathcal{X}$ , $\mathcal{X}_{\mathrm{new}}$ , $\theta^{\prime}$ , and $\mu^{\prime}$ (although $B^{\prime}$ , $\mathcal{X}_{\mathrm{new}}$ , and $\theta^{\prime}$ may become smaller by the meta rules and $\mu^{\prime}$ changes).

(3)

while there is a letter $[p]\in B_{\mathrm{new}}$ do

(a)

If $\mathcal{X}_{\mathrm{new}}\neq\emptyset$ , then choose some $[X,p]\in\mathcal{X}_{\mathrm{new}}$ . Use a substitution transition defined by $\tau[X,p]=[X,p][p]^{2}$ to make sure that $\sigma([X,p])$ is shorter than at the beginning of the loop and that we don’t run out of letters $[p]$ as long as there are typed variables. The invariant is that as long as $\mathcal{X}_{\mathrm{new}}\neq\emptyset$ there is some letter $[p]$ visible.

(b)

Use transitions of the form $[X,p]\mapsto[X,p][p]$ in order to keep the invariant that $\sigma([X,p])=[p]^{e}$ where $e$ is even. Moreover, due to the meta rules we maintain $|\sigma([X,p])|\geqslant{30}\delta$ . At some point $|\sigma([X,p])|$ might be too short; then we remove $[X,p]$ from $\mathcal{X}_{\mathrm{new}}$ . We also maintain the invariant that $|\sigma([X,p])|=|\sigma([X,q])|=|\sigma([\overline{X},\overline{q}])|$ for all $p,q$ and $X\in\mathcal{X}$ . Thus, if we remove one $[X,p]$ , then all other typed variables using the symbol $X$ are removed simultaneously and $\theta^{\prime}$ becomes smaller, too.

(c)

If there is a maximal green factor

[TABLE]

where $d\geqslant 0$ and $e$ is odd, then define an endomorphism

[TABLE]

by $h_{\lambda}([r,s,\lambda])=[p][r,s,\lambda]$ . Thus, we can write

[TABLE]

This defines a new equation $W^{\prime}$ and a $B^{\prime}$ -morphism $\sigma^{\prime}$ such that $h(W^{\prime})=W$ and $\sigma h_{\lambda}(W^{\prime})=\sigma^{\prime}(W)$ . Hence, there is a compression transition satisfying the forward property

[TABLE]

As above, $\left\|\mathinner{E^{\prime}}\right\|<\left\|\mathinner{E}\right\|$ since $\left\|\mathinner{W^{\prime}}\right\|<\left\|\mathinner{W}\right\|$ . The new entire solution at $E^{\prime}$ is $(\alpha^{\prime},\sigma^{\prime})=(\alpha h_{2},\sigma^{\prime})$ . We apply the meta rules and then we rename $E^{\prime},W^{\prime},\alpha^{\prime},\sigma^{\prime}$ as $E,W,\alpha,\sigma$ .

(d)

Due to the previous steps: whenever we see a maximal green factor

[TABLE]

then $\sigma[X_{i},p]\in([p][p])^{*}$ for $1\leqslant i\leqslant d$ and $e$ is even. Define an endomorphism

[TABLE]

by $h_{3}([p])=[p]^{2}$ for all $[p]$ which appear in $B_{\mathrm{new}}$ . Thus, we can write

[TABLE]

This defines a new equation $W^{\prime}$ and a $B^{\prime}$ -morphism $\sigma^{\prime}$ such that $h(W^{\prime})=W$ and $\sigma h(W^{\prime})=\sigma^{\prime}(W)$ . Hence, there is a compression transition satisfying the forward property

[TABLE]

As above, $\left\|\mathinner{E^{\prime}}\right\|<\left\|\mathinner{E}\right\|$ since $\left\|\mathinner{W^{\prime}}\right\|<\left\|\mathinner{W}\right\|$ . The new entire solution at $E^{\prime}$ is $(\alpha^{\prime},\sigma^{\prime})=(\alpha h,\sigma^{\prime})$ . We apply the meta rules and then we rename $E^{\prime},W^{\prime},\alpha^{\prime},\sigma^{\prime}$ as $E,W,\alpha,\sigma$ .

endwhile

endprocedure
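The while loop above follows the usual block compression pattern: an odd exponent is made even by merging one letter $[p]$ into $[r,s,\lambda]$ (step (c)), and an even exponent is halved by replacing every pair $[p][p]$ with a single $[p]$ (step (d)). The following Python sketch tracks only the exponent $e$ of one block $[p]^{e}$ ; it deliberately ignores typed variables, visibility and the meta rules, and the function name is an assumption of this illustration.

```python
def compress_block(e: int) -> list:
    """Reduce a block [p]^e to the empty block and record the steps taken.

    'absorb' models step (c): one [p] is merged into the letter [r,s,lambda].
    'halve'  models step (d): every pair [p][p] is replaced by a single [p].
    """
    steps = []
    while e > 0:
        if e % 2 == 1:
            steps.append("absorb")
            e -= 1
        else:
            steps.append("halve")
            e //= 2
    return steps
```

In this simplified model a block with exponent $e\geqslant 1$ disappears after $\lfloor\log_{2}e\rfloor$ halving steps, which is the reason the number of iterations stays logarithmic in the length of the periodic factor.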

It is clear that the procedure terminates in some standard state. Let us denote that state and its entire solution as:

[TABLE]

We began the routine at $E_{r}$ . During the procedure we see symbols $[X,p],[rs]$ and $[r,s,\lambda]$ , and the length of the equation $W$ grows as we pop out letters in the suffix and prefix of each variable. At the end all the new variables disappeared, either by the meta rules or when maximal green factors are compressed into a single letter $[r,s,\lambda]$ . The only new letters in $B_{s}$ are of the form $[r_{\lambda},s_{\lambda},\lambda]$ and there are not more than $|H|\cdot|W_{r}|$ of them.

The following proposition summarizes all the changes that happen in the procedure.

Proposition 10.3.

Let $E_{r}=(W_{r},B_{r},\mathcal{X}_{r},\emptyset,\mu_{r})$ be the state where we started $\delta$ -periodic-compression with an entire solution $(\alpha_{r},\sigma_{r})$ . Let $E_{s}=(W_{s},B_{s},\mathcal{X}_{s},\emptyset,\mu_{s})$ denote the standard state with the entire solution $(\alpha_{s},\sigma_{s})$ where we finish $\delta$ -periodic compression and let $(W,B^{\prime},\mathcal{X}^{\prime},\theta,\mu)$ be any state which we have seen (with the full set of variables $\mathcal{Y}^{\prime}=H\cdot\mathcal{X}^{\prime}$ ) on the path from $E_{r}$ to $E_{s}$ during the procedure.

Then we have the following.

(1)

$B_{s}=B\cup\left\{[r_{\lambda},s_{\lambda},\lambda]\mid\lambda\in\Lambda\right\}$ for some $B\subseteq B_{r}$ .

(2)

$|B_{s}|\leqslant|H|\cdot(|W_{r}|+|W_{s}|)$ .

(3)

$|B^{\prime}|\in|H|\cdot(|W_{r}|+\mathcal{O}(\delta n))$ .

(4)

$\left\|\mathinner{W_{s}}\right\|\leqslant\left\|\mathinner{W_{r}}\right\|+14\delta n\leqslant\left\|\mathinner{W_{r}}\right\|+20\delta n$ .

(5)

$\left\|\mathinner{W}\right\|\in\left\|\mathinner{W_{r}}\right\|+\mathcal{O}(\delta n)\subseteq\left|\mathinner{W_{r}}\right|+\mathcal{O}(\delta n)$ .

(6)

$\mathcal{X}_{s}\subseteq\mathcal{X}_{r}\subseteq\mathcal{X}$ .

(7)

$\left|\mathinner{\mathcal{X}^{\prime}}\right|\in\mathcal{O}(\delta n)$ .

(8)

For each $X\in\mathcal{X}_{s}$ the word $\sigma_{s}(X)$ does not start or end with a very long $\delta$ -periodic word.

Proof.

We justify each item as follows.

(1)

Meta rules may remove some (useless) letters from the initial alphabet $B_{r}$ , so we have $B\subseteq B_{r}$ , and the only new constants that survive at the end are of the form $\left\{[r_{\lambda},s_{\lambda},\lambda]\mid\lambda\in\Lambda\right\}$ .

(2)

$B_{s}$ consists of letters in $H\cdot B_{r}$ and those from $H\cdot\left\{[r_{\lambda},s_{\lambda},\lambda]\mid\lambda\in\Lambda\right\}$ . Since we applied alphabet reduction at the beginning, $|B_{r}|\leqslant|H|\cdot|W_{r}|$ , and since the letters $[r_{\lambda},s_{\lambda},\lambda]$ cannot be eliminated by any compression during the procedure, their number is bounded by $|H|\cdot|W_{s}|$ . The only other new constants added during the procedure are of the form $[p]$ , but these are all eliminated by compression, so they do not appear in $B_{s}$ by the meta rule.

(3)

Again we have $|B_{r}|\leqslant|H|\cdot|W_{r}|$ . During the procedure we add letters $[rs]$ and $[r,s,\lambda]$ . For each variable $X\in\mathcal{X}$ we add some $[rs]$ and $[r,s,\lambda]$ , then we need to multiply by $\delta$ since we have all cyclic permutations of these and $|rs|\leqslant\delta$ . Since there are at most $n$ variables this gives $\mathcal{O}(\delta n)$ new constants, so applying the action we get the bound of $|H|\cdot\mathcal{O}(\delta n)$ new constants.

(4)

We first pop out $\tau(X)=X[X,p]p^{\prime}$ because we had $\sigma(X)=xq^{d}q^{\prime}=xup^{e}p^{\prime}$ with $|q^{d}q^{\prime}|\geqslant 3\delta$ . (If $|q^{d}q^{\prime}|<3\delta$ we do nothing.) After applying $\tau$ we have $\sigma(X)=xu$ and $\sigma([X,p])=p^{e}$ . If it is the case that $|p^{e}p^{\prime}|<7\delta$ then later we do not apply any compression to $[X,p]$ since it is not part of a very long factor, instead we simply pop it out. This contributes at most $14\delta n$ in the worst case that this happens in the suffix of every $X\in\mathcal{X}$ .

If $|p^{e}p^{\prime}|\geqslant 7\delta$ then together with $u$ this gives a factor of length at least $10\delta$ , so $[X,p]$ is part of a very long $\delta$ -periodic word and is therefore compressed down to a single letter. Thus $7\delta$ is the most added at either end of any variable by the procedure. This gives $14\delta n$ . We give the larger bound $20\delta n$ to simplify later calculations only.

(5)

Since $\left\|\mathinner{W}\right\|\leqslant\left|\mathinner{W}\right|+90\delta n\in\left|\mathinner{W}\right|+\mathcal{O}(\delta n)$ at every state in $\mathcal{T}$ , it is enough to show $\left\|\mathinner{W}\right\|\in\left\|\mathinner{W_{r}}\right\|+\mathcal{O}(\delta n)$ . However this only requires the estimation in Sect. 10.1. By that estimation we content ourselves to define a function $s\colon\mathbb{N}\to\mathbb{N}$ with $s(0)\leqslant c\delta n$ and which satisfies for all $t$ a bound

[TABLE]

for some $q<1$ and $c\geqslant 1$ . To see where the $q$ comes from in our application, choose $s(0)$ to be the number of letters $[p]$ at the state where they first appear. Each time we pass through a transition defined by $h([p])=[p][p]$ we halve the number of these letters; and this shows that we can define $q=1/2$ . Between these steps where we halve the number of $[p]$ ’s we create at most $c\delta n$ new ones with $c\in\mathcal{O}(1)$ .

(6)

During the procedure we add new variables $[X,p]$ but these are eliminated by the compression. Since we apply meta rules we may also remove variables $X\in\mathcal{X}_{r}$ . Thus $\mathcal{X}_{s}\subseteq\mathcal{X}_{r}\subseteq\mathcal{X}$ .

(7)

This is justified above at (20).

(8)

Consider any $\sigma_{s}(X)$ with $X\in\mathcal{X}_{s}$ . If for that $X$ , the word $\sigma(X)$ at the beginning of the procedure “insert new variables” had a $\delta$ -periodic suffix of length more than $3\delta$ , then, due to the splitting of variables, the length of the maximal $\delta$ -periodic suffix in $\sigma_{s}(X)$ is exactly $3\delta$ . Hence, there is no very long $\delta$ -periodic suffix. In the other case the suffixes of length $3\delta$ in $\sigma(X)$ and $\sigma_{s}(X)$ coincide. Thus, in this case the length of the maximal $\delta$ -periodic suffix in $\sigma_{s}(X)$ is at most $3\delta$ . Since $X\in\mathcal{X}$ implies $\overline{X}\in\mathcal{X}$ the same is true for the prefix of each variable.

The proposition is therefore shown. ∎
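The estimate used for item (5) rests on a geometric argument: a quantity that is halved at each step and then increased by a bounded amount stays bounded. The sketch below assumes the recurrence has the shape $s(t+1)\leqslant q\,s(t)+c\delta n$ (this shape is an assumption of the illustration, with the additive term passed as `extra`) and iterates it with exact rational arithmetic.

```python
from fractions import Fraction


def iterate(s0, q, extra, steps):
    """Iterate s(t+1) = q * s(t) + extra exactly and return all values s(0..steps)."""
    values = [Fraction(s0)]
    for _ in range(steps):
        values.append(q * values[-1] + extra)
    return values
```

With $q=1/2$ and $s(0)\leqslant$ `extra`, every value stays below `extra` $/(1-q)=2\cdot$ `extra`, so the number of letters $[p]$ present at any time remains in $\mathcal{O}(\delta n)$ under this assumption.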

10.6. The end of the $\delta$ -periodic compression

Recall that we started a compression round at a standard state $E_{r}$ with an equation of length $\left|\mathinner{W_{r}}\right|$ , and we end at a standard state $E_{s}$ with an equation of length $\left|\mathinner{W_{s}}\right|$ . (Possibly $r=s$ .) If, due to the meta rules, $E_{s}$ is final, we are done. Hence, we continue under the assumption that $E_{s}$ is not a final state.

11. Pair compression

We enter the second phase of the compression round with “pair compression” directly after the end of $\delta$ -periodic compression. We enter the pair compression procedure at a standard state $E_{s}$ which is not final and with an entire solution $(\alpha_{s},\sigma_{s})$ . If $E_{r}$ and $(\alpha_{r},\sigma_{r})$ denote the situation where we began the current compression round (where we began $\delta$ -periodic compression), then Prop. 10.3 tells us:

[TABLE]

During the pair compression all states are standard states. No type is needed. Phrased differently, we have $\theta=\emptyset$ , there are no typed variables and hence, all variables belong to $\mathcal{X}$ . Thus, the number of positions labeled with twisted variables is at most $n$ , as it is required for standard states.

Our goal in this section is to compress pairs $ab\leqslant W_{s}$ of constants into single letters without causing any conflict due to overlap with other pairs or variables that are connected via twisted equations. In particular, compressing a pair linked to a $\delta$ -periodic factor might cause problems, so we wish to avoid compressing those pairs. This leads us to define the following.

Definition 11.1.

Let $E=(W,B,\mathcal{X},\emptyset,\mu)$ be a standard state with a solution $\sigma$ . We say that $(E,\sigma)$ satisfies the shrinking pair condition if there is no $X\in\mathcal{X}$ such that the word $\sigma(X)$ starts with a very long $\delta$ -periodic word. (Recall that a word $w$ is very long $\delta$ -periodic if and only if $\left|\mathinner{w}\right|\geqslant 10\delta$ and $w$ is a prefix of some word $p^{|w|}$ where $\left|\mathinner{p}\right|\leqslant\delta$ , see Def. 6.3.)

Note this is the situation we find ourselves in at the conclusion of the $\delta$ -periodic compression procedure: $E_{s}=(W_{s},B_{s},\mathcal{X}_{s},\emptyset,\mu_{s})$ with its solution $\sigma_{s}$ satisfies Def. 11.1. The shrinking pair condition is a necessary condition when proving Lem. 11.3 below. That technical lemma is one of the key steps.
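The condition in Def. 11.1 can be tested directly from the recalled characterization of very long $\delta$ -periodic words. The Python sketch below is illustrative: the function names are hypothetical, words are strings, and $\delta\geqslant 1$ is assumed. It uses the observation that a word starts with a very long $\delta$ -periodic word exactly when its prefix of length $10\delta$ has a period of length at most $\delta$ .

```python
def has_period_at_most(w: str, delta: int) -> bool:
    """True iff w is a prefix of p^{|w|} for some nonempty word p with |p| <= delta."""
    return any(
        all(w[i] == w[i + k] for i in range(len(w) - k))
        for k in range(1, delta + 1)
    )


def is_very_long_periodic(w: str, delta: int) -> bool:
    """|w| >= 10*delta and w has a period of length at most delta."""
    return len(w) >= 10 * delta and has_period_at_most(w, delta)


def starts_with_very_long_periodic(w: str, delta: int) -> bool:
    """True iff some prefix of w is very long delta-periodic."""
    return len(w) >= 10 * delta and has_period_at_most(w[:10 * delta], delta)


def shrinking_pair_condition(solution: dict, delta: int) -> bool:
    """solution maps variable names to the words sigma(X)."""
    return not any(starts_with_very_long_periodic(w, delta) for w in solution.values())
```

Only the prefix of length $10\delta$ needs to be inspected, because every prefix of a $\delta$ -periodic word is again $\delta$ -periodic.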

11.1. Positions revisited

Consider any standard state $E=(W,B,\mathcal{X},\emptyset,\mu)$ together with an entire solution $(\alpha,\sigma)$ . We now introduce a precise notion of equivalence $\equiv$ between positions (and intervals) of $\sigma(W)$ . The idea is that whenever we modify a solution $\sigma$ at a position $i$ , then we must modify $\sigma$ at all equivalent positions $j\equiv i$ in order to keep the property of being a solution. Moreover, $\equiv$ should be the finest equivalence relation with that property. For example, when we compress a factor $ab$ where $a$ , $b$ are letters, then we want to compress only certain occurrences of $f(ab),f(\overline{b}\overline{a})$ (where $f\in H$ ) and not all of them.

As explained in Sect. 10.3 there is a canonical mapping from the set $[1,\left|\mathinner{\sigma(W)}\right|]$ to the set of positions in $W$ . By $\lambda(i)$ we denote the label of a position $i$ in $\sigma(W)$ . Recall also the notion of duality: if $[\ell,r]$ is an interval in $[1,\left|\mathinner{\sigma(W)}\right|]$ , then $\overline{[\ell,r]}=[\overline{r},\,\overline{\ell}]$ denotes the dual interval, where $\overline{i}=|\sigma(W)|+1-i$ for all $1\leqslant i\leqslant|\sigma(W)|$ . According to the definition of standard states, the set of twisted variables which appear in $W$ can be written as $\mathcal{Y}=(H\times\mathcal{X})$ and we have $\mathcal{X}\subseteq\mathcal{X}$ . It is convenient to fix a subinterval $I(X)$ of $[1,\left|\mathinner{\sigma(W)}\right|]$ for each $X\in\mathcal{X}$ such that $\overline{I(X)}=I(\overline{X})$ as follows. Consider $\left\{\mathinner{X,\overline{X}}\right\}\subseteq\mathcal{X}$ , then without restriction we have $X=X_{i}$ for some unique $1\leqslant i\leqslant|\mathcal{X}|/2$ (and hence $\overline{X}=\overline{X_{i}}$ ). Choose for $I(X)$ the left-most maximal interval $[\ell_{X},r_{X}]$ in $\sigma(W)$ which is mapped to a position labeled by $X_{i}$ . In particular, $r_{X}-\ell_{X}+1=\left|\mathinner{\sigma(X)}\right|$ . We let $\overline{I(X)}=I(\overline{X})$ . By the specific structure of $W$ being an extended equation, we see that $\overline{I(X)}$ is the right-most maximal interval in $\sigma(W)$ which is mapped to a position labeled by $\overline{X_{i}}$ . To fix notation we let $2m=\left|\mathinner{\sigma(W)}\right|$ and

[TABLE]

Note that $I(\mathcal{X})$ and $I(B)$ are disjoint sets of positions: if $i\in I(\mathcal{X})$ , then there is some $X$ such that $i$ is mapped to a position labeled by $X$ and $\overline{i}$ is mapped to a position labeled by $\overline{X}$ . The next idea is to identify positions in $\sigma(W)$ based on the fact that we can write $W$ in the form $W=\#U\#\,\#\overline{V}\#$ such that $\sigma(W)=\sigma(\overline{W})\iff\sigma(U)=\sigma(V)$ . In pictures we intend to place the positions of $\sigma(U)$ and $\sigma(V)$ on top of each other. The intuition is clear: we have $[1,2m]=\left\{\mathinner{1,\ldots,m,\,\overline{m},\ldots,\overline{1}}\right\}$ . The positions of $\#\overline{V}\#$ cover $\overline{m},\ldots,\overline{1}$ . Hence, we can think that $\#V\#=\overline{\#\overline{V}\#}$ uses the same set of positions as $\#U\#$ , namely the set $\left\{\mathinner{1,\ldots,m}\right\}$ . Thus, every $i\in[1,2m]$ has always two interpretations: for $i\leqslant m$ as a position either in $\#U\#$ or in $\#V\#$ , for $m<i$ the situation is dual. Let us make this intuition formal.

The mapping from $[1,2m]$ to the positions of $W$ induces a relation

[TABLE]

We define ${\leadsto}$ inductively. If $i$ is mapped to a position in $W$ which is labeled by a constant, then we have $i\in I(B)$ and we let $i\leadsto i$ and $\overline{i}\leadsto\overline{i}$ . In the other case $i$ is mapped to a position labeled by $Y=(f,X)$ for some $f\in H$ and $X\in\mathcal{X}$ . Let $\ell$ be the leftmost position in $[1,2m]$ which is mapped to the same position as $i$ . Then we can write $i=\ell+k$ and we find $j=\ell_{X}+k$ . In this case we let $i\leadsto j$ and $\overline{i}\leadsto\overline{j}$ . Note that this implies $\lambda(i)=f(\lambda(j))$ . The position $\overline{j}$ belongs to $I(\overline{X})$ .

Up to duality, there are three cases:

(1)

$i\in I(B)$ and $\overline{i}\in I(B)$ . Then there is only one $j$ such that $i\leadsto j$ .

(2)

$i\in I(B)$ and $\overline{i}\notin I(B)$ . Then we have $i\leadsto i\in I(B)$ and $i\leadsto j\in I(\mathcal{X})$ . Since $I(\mathcal{X})\cap I(B)=\emptyset$ , we have $i\neq j$ .

(3)

$i\notin I(B)$ and $\overline{i}\notin I(B)$ . Then $i\leadsto j$ and $\overline{i}\leadsto\overline{k}$ with $\left\{\mathinner{j,\overline{j},k,\overline{k}}\right\}\subseteq I(\mathcal{X})$ . Hence, there are at most two $j,k\in I(\mathcal{X})$ such that $i\leadsto j$ and $i\mathrel{\reflectbox{$\leadsto$}}k$ .

Let us explain the meaning of this “ $\leadsto$ ” relation by considering a local equation $u(f,X)w(g,\overline{Y})v=uZv$ . In $W$ this local equation corresponds to some factorization

[TABLE]

Let $\ell=\left|\mathinner{\sigma(u_{1}\#)}\right|+1$ and $r=\left|\mathinner{\sigma(XwYv)}\right|-1$ . Then the interval $\sigma(W)[\ell,r]$ is labeled by $u\sigma(f,X)w(g,\overline{Y})v$ ; and, since $\sigma$ is a solution, we have $u\sigma(f,X)w(g,\overline{Y})v=u\sigma(Z)v$ . For each $i\in[\ell,r]$ we have $i\leadsto j$ where either $i=j$ or $j\in I(X)\cup I(\overline{Y})$ . If $j\in I(X)\cup I(\overline{Y})$ , then $\overline{i}\leadsto\overline{k}\in I(\overline{Z})$ and therefore $i\leadsto k\in I(Z)$ , too. The positions of $u,w,v$ are visible in $W$ , but with respect to $\overline{W}$ this is true only for the positions of $u$ and $v$ . For the positions of $u$ and $v$ the relation $\leadsto$ is the identity. The relations are depicted in Fig. 5: the relation ${\leadsto}$ is given by the positions in the middle row to the top and bottom row. Let $j\mathrel{\reflectbox{$\leadsto$}}i$ denote $i\leadsto j$ . If $j\mathrel{\reflectbox{$\leadsto$}}i\leadsto k$ , then we write $j\sim k$ .

Consider any $i\in[1,2m]$ and $j,k\in I(\mathcal{X})\cup I(B)$ such that $j\mathrel{\reflectbox{$\leadsto$}}i\leadsto k$ . (The interesting case is $j\neq k$ .) Hence, we have $j\sim k$ and in the pictures we put the positions $j$ and $k$ on top of each other. Fig. 5 gives an example, where $32\sim 10,\ldots,35\sim 13$ , $98\sim 14,99\sim 15$ , and $\overline{45}\sim 16,\ldots,\overline{42}\sim 19$ .

Since $i\leadsto j\iff\overline{i}\leadsto\overline{j}$ we have

[TABLE]

Moreover, $j\sim k$ implies that there are $f,g\in H$ such that $f(\lambda(j))=\lambda(i)=g(\lambda(k))$ . Thus, $j\sim k$ ensures $\lambda(j)\in H\cdot\lambda(k).$

We have $I(\mathcal{X})\cup I(B)\subseteq[1,2m]$ and ${\sim}\subseteq[1,2m]\times[1,2m]$ is a symmetric relation. It is also reflexive on $I(\mathcal{X})\cup I(B)$ .

By $\approx$ we denote the reflexive and transitive closure of $\sim$ . However, the relation $\approx$ is too fine, in general. Since $i\sim j$ implies $\lambda(i)\in H\cdot\lambda(j)$ , we cannot expect that $i\approx\overline{i}$ , because $\lambda(\overline{i})=\overline{\lambda(i)}$ is typically not in $H\cdot\lambda(j)$ . Clearly, if we intend to change the label at position $i$ from, say, $a$ to $c$ , then we must change the label at position $\overline{i}$ from $\overline{a}$ to $\overline{c}$ .

In the following, we write $i\leftrightarrow j$ if $j=\overline{i}$ and we define $\equiv$ to be the equivalence relation over $[1,2m]$ which is generated by ${\sim}\cup{\leftrightarrow}$ . We have $\approx\subseteq\equiv$ , but we have just seen that these relations are different, in general. Since $i\sim j\iff\overline{i}\sim\overline{j}$ by (40), we have

[TABLE]

Hence,

[TABLE]

We extend the notation above to intervals. Let $i$ , $j$ be positions in $\sigma(W)$ such that $i\leadsto j$ , and let ${p}\in\mathbb{N}$ . Assume (by symmetry) that we have $i\leadsto j\in I(X)$ due to mapping the position $i$ of $\sigma(W)$ to a position $q$ in $W$ which is labeled by a twisted variable $(f,X)$ . If the position $i+p$ is also mapped to the same position $q$ , then all positions in the interval $[i,i+{p}]$ are mapped to $q$ . We then write

[TABLE]

As above we now define the relation $\sim$ on intervals. Again, we let $\approx$ be the generated equivalence relation, now on intervals. Finally, we relate an interval $[l,r]$ with $1\leqslant l\leqslant r\leqslant 2m$ to the interval $[\overline{r},\,\overline{l}]$ via

[TABLE]

Thus, we also extend $\equiv$ to the equivalence relation on intervals which is generated by $\sim$ and $\leftrightarrow$ . Having this, the general form of (41) becomes

[TABLE]

In the following we say that two positions (or intervals) are equivalent if they are related by $\equiv$ .
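Since $\equiv$ is generated by ${\sim}\cup{\leftrightarrow}$ , its classes on positions can be computed with a union-find structure once the pairs $j\sim k$ are known. The sketch below is a minimal illustration: the function name and the input format (a list of pairs of positions in $[1,2m]$ ) are assumptions, and computing the pairs from $W$ and $\sigma$ is not shown.

```python
def equivalence_classes(m: int, sim_pairs):
    """Classes of the equivalence on [1, 2m] generated by sim_pairs and i <-> 2m+1-i."""
    n = 2 * m
    parent = list(range(n + 1))            # index 0 is unused

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for j, k in sim_pairs:                 # the relation ~
        union(j, k)
    for i in range(1, n + 1):              # duality: i <-> overline{i}
        union(i, n + 1 - i)

    classes = {}
    for i in range(1, n + 1):
        classes.setdefault(find(i), set()).add(i)
    return sorted(sorted(c) for c in classes.values())
```

For $m=3$ and the single pair $1\sim 2$ , the classes are $\{1,2,5,6\}$ and $\{3,4\}$ : the pair forces $\overline{1}=6$ and $\overline{2}=5$ into the same class, in line with $i\sim j\iff\overline{i}\sim\overline{j}$ .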

Lemma 11.2.

Let ${t}\geqslant 1$ and $\sigma$ be a solution at a standard state $E=(W,B,\mathcal{X},\emptyset,\mu)$ such that $|\sigma(X)|\geqslant 2{t}$ for all $X\in\mathcal{X}$ . For each $X\in\mathcal{X}$ write $\sigma(X)=uxv$ with $|u|=|v|={t}$ . Let $E^{\prime}=(W^{\prime},B,\mathcal{X}^{\prime},\emptyset,\mu^{\prime})$ and $\sigma^{\prime}$ denote the state and solution which we obtain by following a substitution transition defined by $\tau(X)=uXv$ for all $X$ . Let $I^{\prime}(\mathcal{X}^{\prime})$ be the set of positions according to (38) and $\approx^{\prime}$ the equivalence relation on intervals with respect to $\sigma^{\prime}$ . Let $i$ and $j$ be positions in $I^{\prime}(\mathcal{X}^{\prime})$ such that $i\approx^{\prime}j$ . Then we have $[i-{t},i+{t}]\approx[j-{t},j+{t}]$ with respect to $E$ and $\sigma$ .

Proof.

Let $X\in\mathcal{X}^{\prime}$ . Then $X\in\mathcal{X}$ and if $I(X)=[\ell_{X},r_{X}]$ , then the corresponding interval $I^{\prime}(X)$ with respect to $\sigma^{\prime}$ is $I^{\prime}(X)=[\ell_{X}+{t},r_{X}-{t}]$ . Thus, if $i\approx^{\prime}j$ with $i\in I^{\prime}(X)$ and $j\in I^{\prime}(Y)$ , then we obtain a “domino tower” as depicted in Fig. 6. The intervals with respect to $I^{\prime}(\mathcal{X}^{\prime})$ are the white blocks and $i$ and $j$ are arranged such that $i$ sits along a vertical column above $j$ . We obtain the corresponding tower with respect to $I(\mathcal{X})$ by adding the grey borders of length ${t}$ at each side in order to switch from $I^{\prime}(Z)$ to $I(Z)$ . Thus, $i\approx^{\prime}j$ implies $[i-{t},i+{t}]\approx[j-{t},j+{t}]$ . ∎

The distance $d(i,j)$ between positions of $\sigma(W)$ is denoted as usual: $d(i,j)=\left|\mathinner{j-i}\right|$ . We continue with the same notation as in Lem. 11.2. We apply the lemma with ${t}=10\left|\mathinner{H}\right|\varepsilon$ . Recall that we fixed the parameters such that $\varepsilon=30n$ and $\delta=\left|\mathinner{H}\right|\varepsilon$ . Hence, ${t}=10\delta$ .

Lemma 11.3.

Let $t=10\delta$ and suppose that $E=(W,B,\mathcal{X},\emptyset,\mu)$ satisfies the shrinking pair condition (Def. 11.1) with respect to $\sigma$ . Let ${Z}\in\mathcal{X}^{\prime}$ be a variable with $I^{\prime}({Z})=[l,r]$ (and hence, $I({Z})=[l-10\delta,\,r+10\delta]$ ); and let $l\leqslant i<j\leqslant r$ such that $i\approx^{\prime}j$ with respect to $\sigma^{\prime}$ . Then we have $d(i,j)>\varepsilon$ .

Proof.

According to Lem. 11.2 we have $[i-{t},i+{t}]\approx[j-{t},j+{t}]$ with respect to $\sigma$ . Next, we choose ${k}\in\mathbb{N}$ as large as possible so that $[i-{k},i+{t}]$ is a subinterval of $I({Z})$ and $[i-{k},i+{t}]\approx[j-{k},j+{t}]$ with respect to $\sigma$ . By induction on the number of steps using the relation $\sim$ , this implies that there is some interval $[{\ell},{\ell}+{t}+{k}]\subseteq I(X^{\prime})$ such that both, first

[TABLE]

and second, ${\ell}$ is the first position in $I(X^{\prime})$ , see Fig. 7. Assume, by contradiction, $d(i,j)\leqslant\varepsilon$ . Then, $\sigma(W)[i-{k},i+{t}]$ and $\sigma(W)[j-{k},j+{t}]$ are twisted conjugate with a positive offset $d(i,j)$ which is at most $\varepsilon$ . This implies that $\sigma(W)[i-{k},i+{t}]$ is a very long $\delta$ -periodic word by Cor. 6.2. Therefore, the prefix $\sigma(W)[\ell,\ell+{t}+{k}]$ of $\sigma(X^{\prime})$ is a very long $\delta$ -periodic word. This contradicts the hypothesis that $E$ satisfies the shrinking pair condition. ∎

11.2. Red positions

We use the notation of Sect. 11.1. Let $[l,r]$ be a maximal interval in $\sigma(W)$ which is mapped to a position in $W$ which is labeled by some $(f,X)$ where $f\in H$ and $X\in\mathcal{X}$ . Thus, the factor $\sigma(W)[l,r]$ is equal to the word $\sigma(X)$ . The positions $l$ and $r$ at the borders of $[l,r]$ play a special role because if there is a factor $ab=\sigma(W)[l-1,l]$ , then we cannot compress $ab$ because $ab$ is “crossing”: compressing $ab$ might be “dangerous”. In order to signal “danger” we color the first and the last position in each interval $I(X)$ red. Moreover, whenever $i$ is red and $i\equiv j$ holds in $\sigma(W)$ , we color $j$ red, too. For example, if we have a situation as depicted in Fig. 8, then the red color at the last position of $I(X)$ and the red color at the first position of $I(Z)$ yields two red columns. Note that the first red position in $I(X)$ and the last red position in $I(\overline{X})$ are equivalent: these are dual positions. For convenience, we also color all positions in $\sigma(W)$ red which are labeled by the marker symbol $\#$ . If the label of $i$ is $\#$ , then $i\equiv j\iff j\in\left\{\mathinner{i,\overline{i}}\right\}$ .

Since $i\equiv\overline{i}$ for all positions, it follows that there are at most $n$ pairwise different equivalence classes of red positions. This counting will be used later.
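The coloring rule can be phrased as a closure under $\equiv$: seed the borders of every interval $I(X)$ and the marker positions, then color every class that contains a seed. A minimal sketch under these assumptions follows; the names `interval_borders`, `classes` and `marker_positions` are hypothetical inputs, where `classes` lists the $\equiv$-classes of positions.

```python
def red_positions(interval_borders, classes, marker_positions=()):
    """Return all positions that are colored red.

    interval_borders: iterable of (first, last) positions of the intervals I(X)
    classes: the classes of the position equivalence, as lists of positions
    marker_positions: positions labeled by the marker symbol #
    """
    seeds = set(marker_positions)
    for first, last in interval_borders:
        seeds.update((first, last))
    red = set(seeds)
    for cls in classes:
        # A class turns red as soon as one of its members is a seed.
        if seeds.intersection(cls):
            red.update(cls)
    return red
```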

Consider an interval of length two $I=[i,i+1]$ without red position. The idea is to compress $I$ into a single position. The problem is overlapping: we might have $[i-1,i]\equiv[i,i+1]$ or $[i,i+1]\equiv[i+1,i+2]$ . Note that $[i-1,i]\equiv[i,i+1]$ implies $i-1\equiv i$ or $i-1\equiv i+1$ . Similarly, $[i,i+1]\equiv[i+1,i+2]$ implies $i\equiv i+1$ or $i\equiv i+2$ . Therefore, we start with intervals of length $4$ where all four positions are inequivalent: This enables us to compress the middle interval of length $2$ . We shall use the following lemma.

Lemma 11.4.

Let $[i-1,i,i+1,i+2]$ be an interval of length $4$ without any red position and where the positions are pairwise inequivalent. Consider $[i,i+1]\equiv[j,j+1]\equiv[k,k+1]$ . Then there are two cases:

(1)

$[j,j+1]\cap[k,k+1]=\emptyset$ ,

(2)

$k=j$ and $[j,j+1]\not\approx[\overline{k+1},\overline{k}]$ .

Proof.

Notice that each of the intervals $[i-1,i,i+1,i+2],[j-1,j,j+1,j+2],[k-1,k,k+1,k+2]$ is without red positions. We may assume that $[j,j+1]\cap[k,k+1]\neq\emptyset$ because otherwise we are done.

First, let $j=k$ . By contradiction assume $[j,j+1]\approx[\overline{k+1},\overline{k}]$ . Then $j+1\approx\overline{k}\leftrightarrow k$ which implies $k=j\equiv j+1$ . Since $[i,i+1]\equiv[j,j+1]$ we obtain $i\equiv i+1$ . This was excluded.

In the second case we have $j\neq k$ . Let us show that $j\neq k$ and $[j,j+1]\cap[k,k+1]\neq\emptyset$ leads again to a contradiction. Since $j\neq k$ we cannot have $[j,j+1]=[k,k+1]$ . Hence $j+1=k$ or $k+1=j$ . By symmetry in $j$ and $k$ , we may assume $j+1=k$ .

We cannot have $[j,j+1]\approx[k,k+1]$ because then $j\equiv k$ , but $k=j+1$ , and hence, $j\equiv j+1$ . This is impossible. Thus, $[j,j+1]\approx[\overline{k+1},\overline{k}]$ and $j\equiv k+1=j+2$ . We remember $j\equiv j+2$ . If $[i,i+1]\approx[j,j+1]$ , then (as no position is red) $[i,i+1,i+2]\approx[j,j+1,j+2]$ implies $i\approx i+2$ . This is impossible. Hence, the last option is $[\overline{j}-1,\overline{j}]\approx[i,i+1]$ and $[\overline{j}-2,\overline{j}-1,\overline{j}]\approx[i-1,i,i+1]$ . However, $j\equiv j+2$ implies $\overline{j}\equiv\overline{j}-2$ . We have again a contradiction as $i-1\not\equiv i+1$ . ∎

Ex. 11.5 indicates why the assertion in Lem. 11.4 only holds in the middle interval $[i,i+1]$ of $[i-1,i,i+1,i+2]$ , in general.

Example 11.5.

We don’t exclude that $H$ acts with involution. Thus, there might be an $a\in B$ and $f\in H$ such that $f(a)=\overline{a}$ . Consider the equation $\overline{X}=(f,X)$ with the solution $\sigma(X)=dcba\overline{b}\overline{c}\overline{d}$ and where $f(x)=x$ for $x=b,c,d$ . Then we have

[TABLE]

The positions of $\sigma(X)$ can be identified with $\left\{\mathinner{1,\ldots,7}\right\}$ with $i\equiv 8-i$ for all positions $1\leqslant i\leqslant 7$ . Since positions $3$ and $5$ are equivalent, the interval $[2,5]$ contains equivalent positions. The four positions in the interval $[1,4]$ are pairwise inequivalent. However, $[3,4]$ intersects with $[4,5]=[4,\overline{3}]$ . Thus, later on we cannot compress the interval $[3,4]$ corresponding to the pair $ba$ . On the other hand, there is no obstacle to compress the interval $[2,3]$ which is labeled by $cb$ .
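The claims of Ex. 11.5 can be checked mechanically. The sketch below uses an encoding that is not in the paper: an upper-case letter stands for the barred version of the lower-case letter, so $\sigma(X)$ becomes the string `dcbaBCD`; the helper names are hypothetical.

```python
def bar(word):
    """Involution on words: reverse the word and bar every letter (swap case)."""
    return word[::-1].swapcase()

def f(word):
    """The automorphism of the example: it maps a to its bar and fixes b, c, d."""
    return "".join(ch.swapcase() if ch in "aA" else ch for ch in word)

sigma_X = "dcbaBCD"
equation_holds = bar(sigma_X) == f(sigma_X)  # the equation bar(X) = (f, X)

def equivalent(i, j):
    """Position equivalence of the example: i is related to 8 - i."""
    return i == j or i + j == 8

def pairwise_inequivalent(lo, hi):
    """True if no two distinct positions in [lo, hi] are equivalent."""
    return all(not equivalent(i, j)
               for i in range(lo, hi + 1) for j in range(i + 1, hi + 1))

def dual_interval(l, r):
    """The interval related to [l, r] by duality."""
    return (8 - r, 8 - l)

def overlaps(p, q):
    """True if the intervals p and q share a position."""
    return max(p[0], q[0]) <= min(p[1], q[1])
```

With these definitions, `overlaps((3, 4), dual_interval(3, 4))` is true, which is the obstacle to compressing $ba$, whereas `overlaps((2, 3), dual_interval(2, 3))` is false.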

Lemma 11.6.

Let $\sigma$ be a solution at a standard state $E=(W,B,\mathcal{X},\emptyset,\mu)$ and $I=[p,p+9]$ be an interval of length $10$ in $\sigma(W)$ without any red position such that $i\approx j$ implies $i=j$ for all $i,j\in I$ . Then $I$ contains a subinterval $Q$ of length $4$ where all positions are pairwise inequivalent.

Proof.

For simplicity of notation let $I=[1,10]$ . If all positions in $[4,5,6,7]$ are pairwise inequivalent, we are done. Otherwise, there are $1\leqslant i<j\leqslant 10$ such that $i\equiv j$ . Since $i\not\approx j$ , this implies $i\approx\overline{j}$ by (41). This in turn means that we cannot have $i\equiv j\equiv k$ where $1\leqslant i<j<k\leqslant 10$ because this would lead to $i\approx k$ via $i\approx\overline{j}\approx k$ . We say that $j$ is the partner of $i$ if $i\neq j$ but $i\equiv j$ . We conclude that every $i\in I$ has at most one partner.

We know $4+i\equiv 4+j$ for some $0\leqslant i<j\leqslant 3$ . Hence $4+i\approx\overline{4+j}$ . Let $1\leqslant k\leqslant 3$ . Since $I$ is without red positions, this implies

[TABLE]

This means that every position $q\in Q=\left\{\mathinner{4+i-3,4+i-2,4+i-1,4+i}\right\}$ has one partner in $P=\left\{\mathinner{4+j+3,4+j+2,4+j+1,4+j}\right\}$ . Since $Q\cap P=\emptyset$ and $Q\cup P\subseteq I$ , we are done: in $Q$ all four positions are pairwise inequivalent. ∎
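For intuition, the statement of Lem. 11.6 can be tested on small instances by a direct search. The sketch below is only an illustration: the predicate `equivalent` is a hypothetical stand-in for $\equiv$, and the hypotheses of the lemma (no red position, $\approx$ trivial on $I$) are not checked by the code.

```python
def find_inequivalent_window(lo, hi, equivalent, width=4):
    """Return the leftmost (l, r) inside [lo, hi] with r - l + 1 == width whose
    positions are pairwise inequivalent, or None if no such window exists."""
    for l in range(lo, hi - width + 2):
        window = range(l, l + width)
        if all(not equivalent(i, j) for i in window for j in window if i < j):
            return (l, l + width - 1)
    return None
```

For the toy relation in which $i$ and $11-i$ are partners on $[1,10]$, the middle window $[4,7]$ fails because $4$ and $7$ are partners, but the search returns the window $[1,4]$.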

Definition 11.7.

We say that an interval $I=[i,i+1]$ in $\sigma(W)$ is good if the following conditions hold.

•

Neither $i$ nor $i+1$ is red.

•

The positions $i$ and $i+1$ are visible.

•

Whenever $[i,i+1]\equiv J\equiv K$ , then either $J=K$ or $J\cap K=\emptyset$ .

Remark 11.8.

The definition of a good interval $I=[i,i+1]$ excludes $i\approx i+1$ . Indeed, since neither $i$ nor $i+1$ is red, $i\approx i+1$ implies $[i-1,i,i+1]\approx[i,i+1,i+2]$ , hence an overlap $[i,i+1]\approx[i+1,i+2]$ . However, $I\approx[\overline{i+1},\overline{i}]$ is allowed. Hence, it may happen that $i\equiv i+1$ . If such an $I$ is labeled with $ab$ , then there is some $f\in H$ with $f(ab)=\overline{b}\overline{a}$ . Hence, $f(a)=\overline{b}$ and $f(b)=\overline{a}$ . We deduce that we obtain a consistent labeling for

[TABLE]

Thus, we may compress $[i,i+1]$ into a single position with label $c$ and therefore, due to $I\approx[\overline{i+1},\overline{i}]$ , we must also compress $[\overline{i+1},\overline{i}]$ into a single position with label $f(c)$ ; and due to $[\overline{i+1},\overline{i}]\leftrightarrow[i,i+1]$ , we have to compress $[\overline{i+1},\overline{i}]$ into a single position with label $\overline{c}$ . But this is fine, we have $h(c)=ab$ , $h(\overline{c})=\overline{b}\,\overline{a}$ , and $f(c)=\overline{c}$ .

11.3. The procedure

Recall that we have fixed $\delta=\left|\mathinner{H}\right|\varepsilon$ and $\varepsilon=30n$ . Thus, $\delta\in\mathcal{O}(\left|\mathinner{H}\right|n)$ and $\varepsilon\in\mathcal{O}(n)$ . We start at a standard state $E=(W,B,\mathcal{X},\emptyset,\mu)$ together with an entire solution $(\alpha,\sigma)$ where none of the meta rules apply. In particular, $\left|\mathinner{\sigma(X)}\right|\geqslant 30\delta$ and $\mathcal{X}\subseteq\mathcal{X}$ with $\mathcal{Y}=H\times\mathcal{X}$ . Hence, by definition:

[TABLE]

All local equations have the form $u(f,X)w(g,Y)v=uZv$ . (As before dummy variables are allowed.) We define the equivalence relations $\approx$ and $\equiv$ over the set of positions of $\sigma(W)$ as defined in Sect. 11.1. Let $E$ be a standard state with equation $W$ and entire solution $(\alpha,\sigma)$ . Once we have found a good interval $I$ in $\sigma(W)$ , we may call the following procedure for that interval.

begin procedure (compress a good interval $I$ )

(1)

Let $a,b\in B$ and let $ab$ be the label of the good interval $I=[i,i+1]$ . Choose a fresh letter $c$ with stabilizer $H_{c}=H_{a}\cap H_{b}$ ; and define a $B$ -morphism from $B^{\prime}=B\cup\left\{f(c),f(\overline{c})\mathrel{\left|\vphantom{f(c),f(\overline{c})}\vphantom{f\in H}\right.}f\in H\right\}$ to $B^{2}$ by $h(c)=ab$ . Whenever $[i,i+1]\approx[j,j+1]$ , then the label of $[j,j+1]$ is $f(ab)$ for some $f\in H$ . Replace each of the intervals $[j,j+1]$ (resp. $[\overline{j+1},\overline{j}]$ ) by a single new position and label this position with $f(c)$ (resp. $f(\overline{c})$ ). (There is no conflict in this relabeling, see Rem. 11.8.) Since there is no red position in $[j,j+1]$ and $[\overline{j+1},\overline{j}]$ , none of the intervals $[j,j+1]$ or $[\overline{j+1},\overline{j}]$ is “crossing”. So, this gives a new but shorter equation $W^{\prime}$ . We have $h(W^{\prime})=W$ and a new solution $\sigma^{\prime}$ such that $h\sigma^{\prime}(W^{\prime})=\sigma(W)$ .

(2)

Follow the corresponding compression transition

[TABLE]

We have a new state $E^{\prime}$ with an entire solution $(\alpha^{\prime},\sigma^{\prime})=(\alpha h,\sigma^{\prime})$ . There is also new numbering for the positions, but the red positions can still be identified.

endprocedure
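The effect of compressing a pair can be illustrated on plain words. The sketch below is a strong simplification and not the procedure of the paper: it assumes that $H$ is trivial (no twisting), it encodes a barred letter by the upper-case version of the letter, and it replaces factors greedily from left to right instead of using the equivalence classes of a good interval. The names `compress_pair` and `expand` are hypothetical.

```python
def bar(word):
    """Involution on words: reverse the word and bar every letter (swap case)."""
    return word[::-1].swapcase()

def compress_pair(word, a, b, c):
    """Replace each factor ab by c and each factor bar(ab) by bar(c)."""
    pair, dual = a + b, bar(a + b)
    out, i = [], 0
    while i < len(word):
        if word.startswith(pair, i):
            out.append(c)
            i += 2
        elif word.startswith(dual, i):
            out.append(c.swapcase())
            i += 2
        else:
            out.append(word[i])
            i += 1
    return "".join(out)

def expand(word, a, b, c):
    """The morphism h with h(c) = ab and h(bar(c)) = bar(ab); identity elsewhere."""
    images = {c: a + b, c.swapcase(): bar(a + b)}
    return "".join(images.get(ch, ch) for ch in word)
```

In this toy setting, applying `expand` to the compressed word returns the original word, which mirrors $h\sigma^{\prime}(W^{\prime})=\sigma(W)$, and a self-dual word stays self-dual after compression.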

We are now ready to define the procedure “pair compression” which uses “compress a good interval” as a subroutine.

begin procedure (pair compression)

(1)

For every $X\in\mathcal{X}$ write $\sigma(X)=uxv$ with $\left|\mathinner{u}\right|=\left|\mathinner{v}\right|=10\delta$ . Follow a substitution transition

[TABLE]

defined by the substitution $\tau(X)=uXv$ and $W^{\prime}=\tau(W)$ . This transition satisfies the forward property with the new entire solution $(\alpha,\sigma^{\prime})$ where $\sigma^{\prime}(X)=x$ for all $X\in\mathcal{X}$ . Recall that $\sigma(W)=\sigma^{\prime}(W^{\prime})$ .

After the preceding step, define the intervals $I^{\prime}(X)$ with respect to $\sigma^{\prime}$ as done in Sect. 11.1. Use the red color for the first and the last position in each $I^{\prime}(X)$ . Color in red all equivalent positions in $\sigma^{\prime}(W^{\prime})$ of red positions with respect to $\equiv^{\prime}$ , too. See Sect. 11.2.

(2)

Rename $E^{\prime},W^{\prime},\mu^{\prime},\sigma^{\prime},\approx^{\prime},\equiv^{\prime}$ as $E,W,\mu,\sigma,\approx,\equiv$ .

(3)

Define the alphabet $B_{\mathrm{old}}=B$ . During the following loop we keep the invariant $B_{\mathrm{old}}\subseteq B$ .

(4)

while $\sigma(W)$ contains a good interval $I=[i,i+1]$ with a label in $B_{\mathrm{old}}^{2}$

do

(a)

Choose any such good interval $I$ in $\sigma(W)$ .

(b)

Run the procedure “compress $I$ ”.

(c)

Rename $E^{\prime},B^{\prime},W^{\prime},\mu^{\prime},\alpha^{\prime},\sigma^{\prime},\approx^{\prime},\equiv^{\prime}$ as $E,B,W,\mu,\alpha,\sigma,\approx,\equiv$ and transfer the induced coloring of red positions.

endwhile

(5)

Perform an alphabet reduction at the standard state $E$ .

(6)

Rename $E,B,\mathcal{X},W,\mu,\alpha,\sigma$ as $E_{r^{\prime}},B_{r^{\prime}},\mathcal{X}_{r^{\prime}},W_{r^{\prime}},\mu_{r^{\prime}},\alpha_{r^{\prime}},\sigma_{r^{\prime}}$ .

endprocedure

Remark 11.9.

The procedure “pair compression” may not actually succeed in compressing any pair. Its first step always “pops out” letters to make the equation longer (by $20\delta$ ). After that, if no pair is compressed, the procedure leaves the equation longer than before it was called. This is intentional: if the equation becomes long enough, then one of $\delta$ -periodic compression or pair compression is guaranteed to reduce the equation size by a positive fraction.

11.4. The end of pair compression ends the compression round

We began the compression round at a standard state $E_{r}=(W_{r},B_{r},\mathcal{X}_{r},\emptyset,\mu_{r})$ with an entire solution $(\alpha_{r},\sigma_{r})$ . We ended the $\delta$ -periodic compression either by entering a final state or, in the other case, at a standard state $E_{s}=(W_{s},B_{s},\mathcal{X}_{s},\emptyset,\mu_{s})$ with an entire solution $(\alpha_{s},\sigma_{s})$ such that

[TABLE]

We started the pair compression at the standard state $E_{s}=(W_{s},B_{s},\mathcal{X}_{s},\emptyset,\mu_{s})$ with the entire solution $(\alpha_{s},\sigma_{s})$ . Compression took place only for good intervals which were labeled by words $ab$ with $a,b\in B_{\mathrm{old}}=B_{s}$ . Each compression reduced the length of the equation because a good interval consists of two visible positions. Thus, at most $\left|\mathinner{W_{s}}\right|$ compressions were possible; and this shows that we did not introduce more than $\left|\mathinner{H}\right|\cdot\left|\mathinner{W_{s}}\right|\in\left|\mathinner{H}\right|\cdot\left|\mathinner{W_{r}}\right|+\mathcal{O}(\delta n)$ fresh letters. Thus, every alphabet $B$ of constants we met during the entire round satisfied

[TABLE]

Now, let $E_{r^{\prime}}=(W_{r^{\prime}},B_{r^{\prime}},\mathcal{X}_{r^{\prime}},\emptyset,\mu_{r^{\prime}})$ denote the standard state with the entire solution $(\alpha_{r^{\prime}},\sigma_{r^{\prime}})$ where we end the procedure “pair compression”. In the very first step of the procedure we followed a substitution transition. This is enough to infer

[TABLE]

Proposition 11.10.

Let $E_{r}=(W_{r},B_{r},\mathcal{X}_{r},\emptyset,\mu_{r})$ be a standard state with an entire solution $(\alpha_{r},\sigma_{r})$ at the start of a compression round and $E_{r^{\prime}}=(W_{r^{\prime}},B_{r^{\prime}},\mathcal{X}_{r^{\prime}},\emptyset,\mu_{r^{\prime}})$ be the standard state where we end the round with the entire solution $(\alpha_{r^{\prime}},\sigma_{r^{\prime}})$ . Then we have

[TABLE]

Moreover, if $W$ is any equation which we see on the path from $E_{s}$ to $E_{r^{\prime}}$ , then we have $\left\|\mathinner{W}\right\|\leqslant\left\|\mathinner{W_{r}}\right\|+\mathcal{O}(\delta n)$ .

Proof.

Each compression round has two phases. The $\delta$ -periodic compression stops at a standard state $E_{s}=(W_{s},B_{s},\mathcal{X}_{s},\emptyset,\mu_{s})$ with an entire solution $(\alpha_{s},\sigma_{s})$ . By Prop. 10.3 $\left\|\mathinner{W_{s}}\right\|\leqslant\left\|\mathinner{W_{r}}\right\|+20\delta n$ and all intermediate equations $W$ satisfy $\left\|\mathinner{W}\right\|\leqslant\left\|\mathinner{W_{r}}\right\|+\mathcal{O}(\delta n)$ .

Now, let $W$ be any equation lying on the path from $E_{s}$ to $E_{r^{\prime}}$ . The additional length for $W$ is due to the first step in pair compression when we substitute each variable $X$ by $u_{X}Xv_{X}$ with $\left|\mathinner{u_{X}}\right|=\left|\mathinner{v_{X}}\right|=10\delta$ . This shows $\left\|\mathinner{W}\right\|\leqslant\left\|\mathinner{W_{r}}\right\|+40\delta n$ . Moreover, $\left\|\mathinner{W_{r^{\prime}}}\right\|\in\left|\mathinner{W_{r}}\right|+\mathcal{O}(\delta n)$ and $\left|\mathinner{W_{s}}\right|\leqslant\left\|\mathinner{W_{s}}\right\|$ .

Thus, by Lem. 10.1 it suffices to prove

[TABLE]

Let us have a closer look at a local equation $u(f,X)w(g,Y)v=uZv$ in $W_{s}$ . (We allow dummy variables.) In particular, we can think that $W_{s}$ begins with a prefix $U\#u_{1}Z_{1}v_{1}\#$ and ends with a suffix $\#\overline{v_{1}}\overline{Z_{1}}\overline{u_{1}}\#\overline{U}$ . Having this, the word $W_{s}$ is covered by factors $\#u(f,X)w(g,Y)v\#$ and $\#\overline{v}\overline{Z}\overline{u}\#$ . In the first steps of pair compression we follow substitution transitions and a factor $\#u(f,X)w(g,Y)v\#$ becomes $\#uf(u_{X})(f,X)f(v_{X})wg(u_{Y})(g,Y)g(v_{Y})v\#$ and $\#\overline{v}\overline{Z}\overline{u}\#$ becomes $\#\overline{v}\,\overline{v_{Z}}\overline{Z}\overline{u_{Z}}\,\overline{u}\#$ .

Pair compression compresses all factors $uf(u_{X})$ , $g(v_{Y})v$ , $\overline{v}\,\overline{v_{Z}}$ and $\overline{u_{Z}}\,\overline{u}$ into single letters. This bounds the total increase by the first substitution transitions by $2n$ .

We don’t have such a simple bound for the factors $f(v_{X})wg(u_{Y})$ because the corresponding positions in $\sigma(W_{s})$ interact with the positions in $I(Z)$ . Let $[\ell,r]$ be the interval in $[1,\,|\sigma(W_{s})|-1]$ corresponding to $f(v_{X})wg(u_{Y})$ . Let us cut the interval $[\ell,r]$ into a disjoint union of intervals, each of them having length exactly $\varepsilon=30n$ . If a position belongs to any of these intervals of length $\varepsilon=30n$ , then we mark the position. Thus, at least $\left|\mathinner{w}\right|-\varepsilon$ positions in the interval $[\ell,r]$ belonging to $f(v_{X})wg(u_{Y})$ are marked by these intervals. (We have no better bound since $X$ and $Y$ might be dummy variables.) Removing if necessary at most $n$ of these intervals we may assume that their total number is $\ell n$ with $\ell\geqslant 0$ . The crucial observation is that we have

[TABLE]

Each interval of length $\varepsilon$ is split into $3n$ intervals of length $10$ . By Lem. 11.3 an interval of length $\varepsilon$ can have at most $2n$ red positions. Thus, in each interval of length $\varepsilon$ there are at least $n$ intervals of length $10$ without any red position. By Lem. 11.6 each such interval $[i,i+9]$ contains an interval of length $4$ where all positions are pairwise inequivalent. By Lem. 11.4 we can compress at least one interval in that interval of length $10$ . (Note that Lem. 11.4 provides us with a compression inside $[i+1,i+8]$ . This means that the compression is guaranteed even if we previously compressed an interval $[i-1,i]$ or $[i+9,i+10]$ .) This means that each interval of length $\varepsilon=30n$ is reduced to length at most $29n$ during compression. Hence,

[TABLE]

Due to (47) we conclude $\left|\mathinner{W_{r^{\prime}}}\right|\in\left|\mathinner{W_{s}}\right|+\mathcal{O}(\delta n)$ . This shows (46) and hence, the assertion of the proposition. ∎

Remark 11.11.

Prop. 11.10 tells us that there is a constant $\kappa_{1}\in\mathbb{N}$ such that $\left\|\mathinner{W_{r^{\prime}}}\right\|\leqslant\frac{29\left\|\mathinner{W_{r}}\right\|}{30}+\kappa_{1}\delta n.$ We content ourselves with a generous bound by letting $\kappa_{1}=97$ . This bound suffices and it is an overestimation, as can be seen by the preceding proof and by reversing the $\mathcal{O}$ -notation into concrete constants. The value $\kappa_{1}=97$ was chosen such that the later constant $\kappa$ in Cor. 12.1 is divisible by $100$ . Thus, we can conclude the following upper bounds for $W_{r^{\prime}}$ and for every equation $W$ we see on the path from $E_{r}$ to $E_{r^{\prime}}$ .

(1)

If $\left\|\mathinner{W_{r}}\right\|>30\kappa_{1}\delta n$ , then $\left\|\mathinner{W_{r^{\prime}}}\right\|<\left\|\mathinner{W_{r}}\right\|$ and $\left\|\mathinner{W}\right\|<\left\|\mathinner{W_{r}}\right\|+40\delta n$ .

(2)

If $\left\|\mathinner{W_{r}}\right\|\leqslant 30\kappa_{1}\delta n$ , then $\left\|\mathinner{W}\right\|\leqslant\left\|\mathinner{W_{r}}\right\|+40\delta n\leqslant 2960\delta n$ .

These estimations are used in the next section.
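The arithmetic behind the two cases of Rem. 11.11 can be replayed with exact rational numbers. In the sketch below the parameter `delta_n` stands for the product $\delta n$; the function names are hypothetical and the code only evaluates the stated recurrence, it does not simulate the compression.

```python
from fractions import Fraction

KAPPA_1 = 97  # the constant chosen in Rem. 11.11

def next_bound(weight, delta_n):
    """Upper bound on the weight after one round: 29/30 * weight + kappa_1 * delta * n."""
    return Fraction(29, 30) * weight + KAPPA_1 * delta_n

def threshold(delta_n):
    """The value 30 * kappa_1 * delta * n, which is the fixed point of next_bound."""
    return 30 * KAPPA_1 * delta_n
```

Above the threshold the bound strictly decreases, and at or below the threshold it stays there; adding the intermediate slack of $40\delta n$ gives $2910\delta n+40\delta n=2950\delta n\leqslant 2960\delta n$.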

12. Putting it all together: the overall compression method

Now we explain what we do if we start the first compression round at the initial state $E_{\mathrm{init}}$ with a given initial entire solution $(\mathrm{id}_{A^{*}},\sigma_{\mathrm{init}})$ . We begin a first compression round $r$ with $r=0$ and $E_{0}=E_{\mathrm{init}}$ with a given initial entire solution $(\alpha_{0},\sigma_{0})=(\mathrm{id}_{A^{*}},\sigma_{\mathrm{init}})$ . We end the round after one phase each of $\delta$ -periodic compression and pair compression with a standard state $E_{1}$ and an entire solution $(\alpha_{1},\sigma_{1})$ such that $\left\|\mathinner{E_{1},\alpha_{1},\sigma_{1}}\right\|<\left\|\mathinner{E_{0},\alpha_{0},\sigma_{0}}\right\|$ . We repeat this process by starting the next round $r+1$ with $E_{r}$ and $(\alpha_{r},\sigma_{r})$ and ending the round in $E_{r^{\prime}}$ and $(\alpha_{r^{\prime}},\sigma_{r^{\prime}})$ . For simplicity of notation we write $r+1=r^{\prime}$ . Thus,

[TABLE]

We conclude

[TABLE]

By (49) the process terminates: there exists some round $t\geqslant 0$ and during that round we reach a final state $E_{\mathrm{fin}}$ without variables and with an entire solution $(\alpha_{\mathrm{fin}},\mathrm{id}_{C})$ . Hence, the entire process defines a path in $\mathcal{F}$ which is labeled by some $h_{1}\cdots h_{t}\in\operatorname{End}(C^{*})$ such that $h_{1}\cdots h_{t}(W_{\mathrm{fin}})=W_{\mathrm{init}}.$ We have $\left|\mathinner{W_{\mathrm{init}}}\right|=n$ and therefore $\left\|\mathinner{W_{\mathrm{init}}}\right\|\leqslant 30\kappa_{1}\delta n$ . (Note that for $n\to\infty$ the ratio $\frac{\left\|\mathinner{W_{\mathrm{init}}}\right\|}{30\kappa_{1}\delta n}$ tends to $0$ : for large $n$ the initial size $\left\|\mathinner{W_{\mathrm{init}}}\right\|$ is much, much smaller than $30\kappa_{1}\delta n$ .) By Rem. 11.11, for all rounds $r$ with $0\leqslant r\leqslant t$ we can state:

[TABLE]

We also need an estimation for the maximal weight of an equation in the middle of each round. Prop. 11.10 says we have to add at most $40\delta n$ with respect to the starting point of a round. Thus, the conclusion of (50) is: whenever we see an equation $E=(W,B,\mathcal{X},\theta,\mu)$ on the path from $E_{\mathrm{init}}$ to $E_{\mathrm{fin}}$ we have

[TABLE]

Corollary 12.1.

Let $\kappa=3000$ and let $\mathcal{B}$ be the subautomaton of $\mathcal{F}$ which is defined as follows. The states of $\mathcal{B}$ are the extended equations $(W,B,\mathcal{X},\theta,\mu)$ where

[TABLE]

Then $\mathcal{B}$ is a finite and complete subautomaton of $\mathcal{F}$ . Let $\mathcal{A}_{\mathcal{S}}$ be the trimmed subautomaton of $\mathcal{B}$ , then the NFA $\mathcal{A}_{\mathcal{S}}$ accepts a rational set of $A$ -morphisms $L(\mathcal{A}_{\mathcal{S}})\subseteq\operatorname{End}(C^{*})$ satisfying the following conditions from Thm. 4.3

[TABLE]

Moreover, $\mathrm{Sol}(\mathcal{S})=\emptyset$ if and only if $L(\mathcal{A}_{\mathcal{S}})=\emptyset$ ; and $\left|\mathinner{\mathrm{Sol}(\mathcal{S})}\right|<\infty$ if and only if $\mathcal{A}_{\mathcal{S}}$ doesn’t contain any directed cycle.

Proof.

The automaton $\mathcal{B}$ is finite because first, the number of states is finite and second, if $E$ is any state in $\mathcal{B}$ , then there are only finitely many $E\overset{h}{\longrightarrow}E^{\prime}$ transitions in $\mathcal{F}$ where $E^{\prime}\in\mathcal{B}$ . Thus, the out-degree is finite for every state in $\mathcal{B}$ . Since $\mathcal{B}$ is finite, $\mathcal{A}_{\mathcal{S}}$ is finite, too. The NFAs $\mathcal{A}_{\mathcal{S}}$ and $\mathcal{B}$ are both sound by Prop. 7.5. They are complete, this follows from Prop. 7.5, since $\kappa=3000$ is large enough by (51). This shows

[TABLE]

Finally, Prop. 8.3 implies that $\mathrm{Sol}(\mathcal{S})=\emptyset$ if and only if $L(\mathcal{A}_{\mathcal{S}})=\emptyset$ and that $\left|\mathinner{\mathrm{Sol}(\mathcal{S})}\right|<\infty$ if and only if $\mathcal{A}_{\mathcal{S}}$ doesn’t contain any directed cycle. ∎

12.1. The $\mathsf{NSPACE}$ algorithm to compute the trim NFA $\mathcal{A}_{\mathcal{S}}$ .

The method is standard and is essentially the same as in [26, 11, 5]. Therefore we give a rough sketch only. The key is the upper bound in Cor. 12.1: it is enough to consider states $(W,B,\mathcal{X},\theta,\mu)$ where $\left\|\mathinner{W}\right\|\leqslant 3000\delta n\in\mathcal{O}(\left|\mathinner{H}\right|\cdot n^{2})$ . This implies that the maximal length of an equation and the maximal number of $H$ -visible letters is in $\mathcal{O}(\left|\mathinner{H}\right|\cdot n^{2})\subseteq\mathcal{O}(\left|\mathinner{H}\right|{\left\|\mathinner{\mathcal{S}}\right\|}^{2})$ . This in turn gives the upper bound $\mathcal{O}({\left|\mathinner{H}\right|}^{2}{\left\|\mathinner{\mathcal{S}}\right\|}^{2})$ on the size of the alphabet $C$ . It is also clear that we need at most $\mathcal{O}({\left\|\mathinner{\mathcal{S}}\right\|})$ variables. To each symbol we have to attach its $\mu$ -value in the finite monoid $N$ .

By Sect. 4.2, storing a $\mu$ -value costs $m(\mathcal{S})$ bits, see (4). As a consequence we can specify a state $E$ (and therefore a transition $E\overset{h}{\longrightarrow}E^{\prime}$ ) in $\mathcal{A}_{\mathcal{S}}$ with $\mathcal{O}(|H|\cdot\left\|\mathinner{\mathcal{S}}\right\|^{2}\cdot\log|A|\cdot m(\mathcal{S})\cdot\log\left\|\mathinner{\mathcal{S}}\right\|)$ bits.

Our algorithm must output all transitions $E\overset{h}{\longrightarrow}E^{\prime}$ which belong to $\mathcal{A}_{\mathcal{S}}$ . Hence, we consider all candidates $E\overset{h}{\longrightarrow}E^{\prime}$ based on the upper bound of bits for their specification one after another in some order, say in some lexicographical order. The algorithm has to decide if it outputs the transition or whether it moves to the next candidate. Thus, when considering whether or not $E\overset{h}{\longrightarrow}E^{\prime}$ belongs to $\mathcal{A}_{\mathcal{S}}$ , then the algorithm guesses a path of transitions from an initial state to the state $E$ and a path of transitions from $E^{\prime}$ to a final state. If the guess is successful, then it outputs $E\overset{h}{\longrightarrow}E^{\prime}$ and it moves to the next candidate. If unsuccessful, then we apply again the theorem of Immerman-Szelepcsényi: $\mathsf{NSPACE}(|H|\left\|\mathinner{\mathcal{S}}\right\|^{2}\log|A|\,m(\mathcal{S})\log\left\|\mathinner{\mathcal{S}}\right\|)$ is closed under complementation. Hence, the algorithm “knows” whether or not $E\overset{h}{\longrightarrow}E^{\prime}$ belongs to $\mathcal{A}_{\mathcal{S}}$ before moving to the next candidate.
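The output criterion (a transition is kept if its source is reachable from an initial state and its target reaches a final state) and the finiteness test of Cor. 12.1 can be illustrated on an explicitly given transition list. The sketch below is deterministic and stores the whole graph, so it does not reflect the nondeterministic space-bounded algorithm described above; all names are hypothetical.

```python
from collections import deque

def reachable(starts, edges):
    """All states reachable from `starts` along the adjacency map `edges`."""
    seen, queue = set(starts), deque(starts)
    while queue:
        state = queue.popleft()
        for nxt in edges.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def trim(transitions, initial, final):
    """Keep (p, h, q) if p is reachable from an initial state and q reaches a final state."""
    forward, backward = {}, {}
    for p, _, q in transitions:
        forward.setdefault(p, []).append(q)
        backward.setdefault(q, []).append(p)
    accessible = reachable(initial, forward)
    coaccessible = reachable(final, backward)
    return [(p, h, q) for p, h, q in transitions
            if p in accessible and q in coaccessible]

def has_cycle(transitions):
    """Detect a directed cycle by repeatedly removing states of in-degree zero."""
    successors, indegree = {}, {}
    for p, _, q in transitions:
        successors.setdefault(p, []).append(q)
        indegree[q] = indegree.get(q, 0) + 1
        indegree.setdefault(p, 0)
    queue = deque(s for s, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        state = queue.popleft()
        removed += 1
        for nxt in successors.get(state, ()):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return removed < len(indegree)
```

On the trimmed list, an empty result corresponds to an empty solution set and `has_cycle` returning `False` corresponds to a finite one, in the sense of Cor. 12.1.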

The proof of Thm. 4.3 is complete, and the first part of the paper is finished.

13. Part 2: The existential theory with rational constraints for virtually free groups

It was shown in [7, 35] that the existential theory with rational constraints in f.g. virtually free groups is decidable. Our main result (Thm. 14.2) provides an effective EDT0L description for the full set of satisfying assignments to a Boolean formula in free variables over equations and rational constraints. In order to make our statement precise we need some preparation.

13.1. NFAs revisited

Let $M$ and $M^{\prime}$ be finitely generated monoids. In the application $M$ and $M^{\prime}$ are fixed and not part of the input. Therefore we can define the size of an NFA over $M$ (resp. $M^{\prime}$ ) which is, up to a constant, independent of the generating set.

We begin by choosing any finite generating set $\Sigma\subseteq M$ . Then we specify an NFA for $M$ as a tuple $\mathcal{A}=(Q,\Sigma,\delta,\mathscr{I},\mathscr{F})$ where the set of transitions $\delta$ is finite and satisfies $\delta\subseteq Q\times\Sigma^{*}\times Q$ . Having this, a natural definition for the input size of $\mathcal{A}$ is

[TABLE]

The transitions in $\varphi(\mathcal{A})$ might be labeled by words of length greater than $1$ . However, this can be “repaired” easily by replacing a transition $(p,b_{0}\cdots b_{k},q)$ with $b_{i}\in\Gamma$ and $k>0$ by a sequence of transitions

[TABLE]

where $p_{1},\ldots,p_{k-1}$ are fresh states. The input size of the new automaton is at most twice as large as before. The exact size of an NFA $\mathcal{A}$ is not important. We let
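The repair step for long labels is easy to make concrete. In the following sketch a transition is a triple `(p, label, q)` with `label` a string over the generators, and fresh states are represented as tuples `("fresh", k)`; this representation and the function name are assumptions for illustration, and the fresh names are assumed not to clash with existing states.

```python
from itertools import count

def split_long_transitions(transitions):
    """Replace every transition whose label has more than one letter by a chain
    of single-letter transitions through fresh intermediate states."""
    fresh = count(1)
    result = []
    for p, label, q in transitions:
        if len(label) <= 1:
            result.append((p, label, q))
            continue
        current = p
        for letter in label[:-1]:
            nxt = ("fresh", next(fresh))  # hypothetical naming of fresh states
            result.append((current, letter, nxt))
            current = nxt
        result.append((current, label[-1], q))
    return result
```

A label with $\ell$ letters yields exactly $\ell$ transitions, so the total number of letters on transitions is unchanged and only the intermediate states are new.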

[TABLE]

This is well-defined, since if we move to another finite generating set $\Sigma^{\prime}$ for $M$ , then we see $\left\|\mathinner{\mathcal{A}}\right\|_{\text{in},\Sigma^{\prime}}\in\mathcal{O}(\left\|\mathinner{\mathcal{A}}\right\|_{\text{in},\Sigma})$ . Thus, it is convenient to denote $\mathcal{A}$ simply as $\mathcal{A}=(Q,M,\delta,\mathscr{I},\mathscr{F})$ because then the interpretation $L(\mathcal{A})\subseteq M$ is encoded in the syntax. Still, we can use $\left\|\mathinner{\mathcal{A}}\right\|_{\text{in}}$ up to multiplicative constants.

Let $\varphi\colon M\to M^{\prime}$ be a homomorphism to a monoid with a finite generating set $\Sigma^{\prime}$ , then the NFA $\varphi(\mathcal{A})$ is defined as $(Q,M^{\prime},\varphi(\delta),\mathscr{I},\mathscr{F})$ where $\varphi(p,a,q)=(p,\varphi(a),q)$ . For $s,t\in Q$ let $L(\mathcal{A},s,t)=L(Q,M,\delta,\left\{\mathinner{s}\right\},\left\{\mathinner{t}\right\})$ . If $\left|\mathinner{\varphi(a)}\right|\in\mathcal{O}(1)$ for all $a\in\Sigma$ , then there is a result which is again independent of the choice of $\Sigma$ and $\Sigma^{\prime}$ : we have

[TABLE]

13.2. Exponential expressions

The ideas and results in this section are not new. The notion of exponential expression was proposed, for example, by Plandowski in [44]. For the application to $\mathrm{SL}(2,\mathbb{Z})$ exponential expressions are crucial to show a complexity within $\mathsf{PSPACE}$ . Intuitively it is more natural to represent strings by allowing exponents. For example, if $u$ is a word, then it is more natural to write $u^{100}$ rather than in plain form by repeating $u$ a hundred times $uuuuuuuuuuuuuuuuuuuuu\cdots$ .

Exponential expressions (and plain exponential expressions) over an alphabet $\Sigma$ and their sizes are defined inductively as follows.

(1) Every word $w\in\Sigma^{*}$ is a plain exponential expression of size $\left\|\mathinner{w}\right\|=\left|\mathinner{w}\right|$.

(2) Every plain exponential expression is an exponential expression.

(3) If $E,E^{\prime}$ are exponential expressions, then the concatenation $EE^{\prime}$ is an exponential expression of size $\left\|\mathinner{EE^{\prime}}\right\|=\left\|\mathinner{E}\right\|+\left\|\mathinner{E^{\prime}}\right\|$. If $E,E^{\prime}$ are plain, then $EE^{\prime}$ is plain, too.

(4) If $E$ is an exponential expression and $k\in\mathbb{N}$, then $E^{k}$ is an exponential expression of size $\left\|\mathinner{E^{k}}\right\|=1+\left\|\mathinner{E}\right\|+\log k$.

Since $\Sigma$ is equipped with an involution, we define for all $k\in\mathbb{Z}$ the expression $E^{-k}$ as a synonym for ${\overline{E}}^{k}$ ; and we let $E^{0}$ denote the empty word $1$ . The size of the expression $E^{0}$ is still $\left\|\mathinner{E}\right\|+1$ .
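The inductive definition can be mirrored by a small data structure. The sketch below is only an illustration under stated assumptions: the class names `Word`, `Concat`, `Power` are hypothetical, the logarithm is taken to base 2, the $\log k$ term is dropped for $k=0$ (matching the convention that $E^{0}$ has size $\left\|\mathinner{E}\right\|+1$), and negative exponents (which require the involution) are omitted.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Word:
    letters: str  # a plain word over the alphabet


@dataclass(frozen=True)
class Concat:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Power:
    base: "Expr"
    exponent: int  # k >= 0 in this sketch


Expr = Union[Word, Concat, Power]


def size(e):
    """Size of an exponential expression, following rules (1), (3), (4)."""
    if isinstance(e, Word):
        return len(e.letters)
    if isinstance(e, Concat):
        return size(e.left) + size(e.right)
    # Assumption: log base 2; no log term when the exponent is 0.
    log_k = math.log2(e.exponent) if e.exponent >= 1 else 0.0
    return 1 + size(e.base) + log_k


def expand(e):
    """The plain word denoted by the expression (may be exponentially long)."""
    if isinstance(e, Word):
        return e.letters
    if isinstance(e, Concat):
        return expand(e.left) + expand(e.right)
    return expand(e.base) * e.exponent
```

For instance, `Power(Word("ab"), 100)` denotes a word of length 200, while its size is only $1+2+\log 100$.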

In the following we allow that an equation appearing in a Boolean formula $\Phi$ is written as $E=E^{\prime}$ where $E$ and $E^{\prime}$ are exponential expressions. We view $E$ and $E^{\prime}$ as words in $\Sigma^{*}$ which have a special encoding in a compact form.

13.3. The existential theory with constraints and expressions

As above $M$ denotes a finitely generated monoid with involution. We let $\Sigma\subseteq M$ be any finite symmetric set of generators: that is, $a\in\Sigma\implies\overline{a}\in\Sigma$ . Let $\pi\colon\Sigma^{*}\to M$ be the canonical morphism which is induced by the inclusion $\Sigma\subseteq M$ . By $\Omega$ we denote a countable set of variables such that $M\cap\Omega=\emptyset$ . Without restriction we assume that $\Omega$ is a set with involution and $X\neq\overline{X}$ for all $X\in\Omega$ . As usual, we let $\overline{g}={g}^{-1}$ for group elements.

The existential theory of $M$ with rational constraints and exponential expressions is defined with the help of Boolean formulae in free variables from $\Omega$. As we did in Sect. 4.2, we obtain more accurate (and therefore better) complexity results if we define the size of a Boolean formula $\Phi$ as a pair $(\left\|\mathinner{\Phi}\right\|_{\text{eq}},\left\|\mathinner{\Phi}\right\|_{\text{rat}})$. The parameter $\left\|\mathinner{\Phi}\right\|_{\text{eq}}$ behaves as if all NFAs defining the rational constraints were of constant size. Thus, essentially, it adds up the sizes of the exponential expressions defining the equations. This is reflected by the index “eq”. The parameter $\left\|\mathinner{\Phi}\right\|_{\text{rat}}$ adds up the input sizes for the NFAs which define the rational constraints. This is reflected by the index “rat”.

The formal definitions are as follows. Here we assume that every constraint $X\in L$ with $L\in\mathrm{Rat}(M)$ is given as $X\in\pi L(\mathcal{A})$ (resp. $X\in L(\mathcal{A})$ ) where $\mathcal{A}$ is an NFA as in Sect. 13.1. Exponential expressions were defined in Sect. 13.2.

(1) Every atomic formula is a Boolean formula. The atomic formulae are:

• The constant $\bot$ (meaning “false”), with $\left\|\mathinner{\bot}\right\|_{\text{eq}}=\left\|\mathinner{\bot}\right\|_{\text{rat}}=1$.

• Equations $E=E^{\prime}$ of exponential expressions over $(\Sigma\cup\Omega)^{*}$, with $\left\|\mathinner{E=E^{\prime}}\right\|_{\text{eq}} =1+\left\|\mathinner{E}\right\|_{\text{eq}}+\left\|\mathinner{E^{\prime}}\right\|_{\text{eq}}$ and $\left\|\mathinner{E=E^{\prime}}\right\|_{\text{rat}} =0$.

• Constraints $X\in L(\mathcal{A})$, with $\left\|\mathinner{X\in L(\mathcal{A})}\right\|_{\text{eq}}=1$ and $\left\|\mathinner{X\in L(\mathcal{A})}\right\|_{\text{rat}}=\left\|\mathinner{\mathcal{A}}\right\|_{\text{in}}$.

(2) If $\Phi,\Psi$ are Boolean formulae, then so are $(\Phi\vee\Psi)$, $(\Phi\wedge\Psi)$, and $(\neg\Phi)$, but we omit brackets when possible. The sizes are $\left\|\mathinner{\Phi\vee\Psi}\right\|_{\star}=\left\|\mathinner{\Phi\wedge\Psi}\right\|_{\star}=\left\|\mathinner{\Phi}\right\|_{\star}+\left\|\mathinner{\Psi}\right\|_{\star}$, and $\left\|\mathinner{\neg\Phi}\right\|_{\star}=\left\|\mathinner{\Phi}\right\|_{\star}$ for $\star\in\left\{\mathinner{\text{eq},\text{rat}}\right\}$.
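The two size parameters can be computed by one recursive pass over the formula. The following sketch assumes a hypothetical syntax tree (`Bottom`, `Equation`, `Constraint`, `Not`, `BinOp`) in which an equation already carries the sizes of its two sides and a constraint carries the input size of its NFA; none of these names come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bottom:
    pass


@dataclass(frozen=True)
class Equation:
    lhs_size: int  # size of the exponential expression E
    rhs_size: int  # size of the exponential expression E'


@dataclass(frozen=True)
class Constraint:
    nfa_input_size: int  # input size of the NFA defining the constraint


@dataclass(frozen=True)
class Not:
    arg: object


@dataclass(frozen=True)
class BinOp:
    op: str  # "and" or "or"; both are measured the same way
    left: object
    right: object


def sizes(phi):
    """Return the pair (eq-size, rat-size) of a Boolean formula."""
    if isinstance(phi, Bottom):
        return (1, 1)
    if isinstance(phi, Equation):
        return (1 + phi.lhs_size + phi.rhs_size, 0)
    if isinstance(phi, Constraint):
        return (1, phi.nfa_input_size)
    if isinstance(phi, Not):
        return sizes(phi.arg)  # negation does not change either size
    left_eq, left_rat = sizes(phi.left)
    right_eq, right_rat = sizes(phi.right)
    return (left_eq + right_eq, left_rat + right_rat)
```

For example, the conjunction of an equation with sides of sizes 3 and 2 and a negated constraint whose NFA has input size 7 has size pair $(7,7)$.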

Let $\Phi$ be a Boolean formula and $\sigma\colon\Omega\to{M}$ be a morphism (that is, a mapping respecting the involution). Then the truth value $\sigma(\Phi)$ is defined in the obvious way. If there exists some $\sigma$ with $\sigma(\Phi)=\text{ true}$ , then we say that $\Phi$ is satisfiable. We also say that $\sigma$ is a solution if $\sigma(\Phi)=\text{ true}$ because it solves the satisfiability problem. So, we do not distinguish between satisfying assignments and solutions. The existential (first-order) theory with rational constraints refers to the set of satisfiable Boolean formulae

[TABLE]

We are not only interested in deciding $\exists\mathrm{FOTh}({M},\mathrm{Rat})$; what we aim for is an algorithm which, on input a Boolean formula $\Phi$, produces an effective description of the full solution set $\mathrm{Sol}(\Phi,M)$. To define it properly we let $\mathcal{X}_{\Phi}$ be the set of variables $X$ such that $X$ or $\overline{X}$ appears in $\Phi$. We let

[TABLE]

Note that $\sigma\colon\Omega\to M$ satisfies $\Phi$ if and only if its restriction to $\mathcal{X}_{\Phi}$ satisfies $\Phi$ . It is also clear that every morphism $\sigma\colon\mathcal{X}_{\Phi}\to M$ satisfying $\Phi$ can be extended to a morphism $\sigma\colon\Omega\to M$ satisfying $\Phi$ . If the context of $M$ is clear, we abbreviate $\mathrm{Sol}(\Phi)=\mathrm{Sol}(\Phi,M)$ . Once we have chosen a presentation $\pi\colon S\to M$ where $S$ is finite and $\pi$ is onto, then we typically represent elements of $M$ by words over $S$ and a morphism $\sigma\colon\Omega\to M$ is defined via a mapping $\sigma\colon\Omega\to S^{*}$ . Moreover, without restriction $\Omega$ comes with a linear order. If $\left\{\mathinner{X_{1},\ldots,X_{k}}\right\}$ is the subset of the first $k$ variables with $X_{i}\leqslant X_{j}$ for all $i\leqslant j$ , then we let

[TABLE]

Clearly, to decide $\exists\mathrm{FOTh}({M},\mathrm{Rat})$ is the same as to decide on input $\Phi$ whether or not $\mathrm{Sol}(\Phi)$ is empty. Moreover, $\mathrm{Sol}(\Phi)=\emptyset\iff\mathrm{Sol}_{S,0}(\Phi)=\emptyset$ . Note that either $\mathrm{Sol}_{S,0}=\emptyset$ or $\mathrm{Sol}_{S,0}=\left\{\mathinner{\emptyset}\right\}$ . We will see that $\mathrm{Sol}_{S,k}(\Phi)$ is an effective EDT0L relation for every $k$ if $M$ is a f.g. virtually free group.

When proving this result for virtually free groups we make various transformations on NFAs (which up to a constant factor don’t change ${\left\|\mathinner{\mathcal{A}}\right\|}_{\text{in}}$ ) before, eventually, we switch to Boolean matrices.

13.4. Removing exponential expressions in $\Phi$

Exponential expressions in Boolean formulae as in (56) are used because they may reduce the size of $\Phi$ significantly. On the other hand, with the help of more variables we can transform $\Phi$ into a new formula $\Psi$ where all equations are written in plain form as $U=V$ . The transformation is not expensive; and it doesn’t change the full solution set:

Proposition 13.1.

There is a deterministic algorithm working in linear space which takes as input a Boolean formula $\Phi$ using exponential expressions.

The output is a formula $\Psi$ having the following properties.

(1) Equations in $\Psi$ appear in plain form as $U=V$. Hence, $\left\|\mathinner{UV}\right\|=\left|\mathinner{UV}\right|$.

(2) $\left\|\mathinner{\Psi}\right\|_{\text{eq}}\in\mathcal{O}(\left\|\mathinner{\Phi}\right\|_{\text{eq}})$.

(3) $\left\|\mathinner{\Psi}\right\|_{\text{rat}}=\left\|\mathinner{\Phi}\right\|_{\text{rat}}$.

(4) $\mathcal{X}_{\Phi}\subseteq\mathcal{X}_{\Psi}$.

(5) The restriction $(\sigma\colon\mathcal{X}_{\Psi}\to M)\mapsto(\sigma\colon\mathcal{X}_{\Phi}\to M)$ induces a bijection

[TABLE]

Proof.

The method is standard: replace all exponential expressions by straight-line programs (SLPs), see for example [33, 34]. More precisely, as soon as an exponential expression $E=T^{e}$ with $e=0$ appears, replace the expression $E$ by the empty word $1$. If an exponential expression $T^{1}$ appears in $\Phi$, then replace every occurrence of $T^{1}$ simply by $T$. If an exponential expression $E=T^{e}$ with $e\geqslant 2$ appears in $\Phi$, then define a fresh variable $[T,e]$. (This implicitly means to introduce $\overline{[T,e]}=[\overline{T},e]$, too. We don’t repeat this anymore.) Whenever a variable $[T,\ell]$ is introduced where $\ell\geqslant 2$, then we introduce another fresh variable $[T,\left\lfloor\mathinner{\ell/2}\right\rfloor]$, too. In particular, $[T,e]$ and $[T,1]$ are introduced (but the condition $\ell\geqslant 2$ makes sure that $[T,0]$ is never introduced). The total number of fresh variables $[T,\ell]$ introduced that way is bounded by $2(1+\log e)\in\mathcal{O}(\log e)$.

After that step, replace all occurrences of $E$ by $[T,e]$ , if $E$ was defined by $E=T^{e}$ in $\Phi$ ; and for each fresh $[T,\ell]$ with $\ell\geqslant 2$ introduce a new plain equation

[TABLE]

Moreover, introduce a single equation $[T,1]=T.$ The effect is that each occurrence of $E=T^{e}$ , having size $\left\|\mathinner{E}\right\|+\left\|\mathinner{T}\right\|+2+\log(1+e)$ , is removed. The gain of $\left\|\mathinner{T}\right\|+\log e$ is mitigated by $\mathcal{O}(\log e)$ new equations of constant size and one more equation $[T,1]=T$ of size $\left\|\mathinner{T}\right\|+2$ .

After that step replace $\Phi$ by the conjunction of $\Phi$ with the conjunction of the new equations. Continue until all equations are written in plain form. This defines the formula $\Psi$. Note that it is not necessary to add any constraint on the fresh variables $[T,\ell]$. Therefore, $\left\|\mathinner{\Psi}\right\|_{\text{rat}}=\left\|\mathinner{\Phi}\right\|_{\text{rat}}$. The proposition follows. ∎
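The halving scheme in the proof can be illustrated by a short sketch. It generates the chain of indices $e,\lfloor e/2\rfloor,\ldots,1$ and one constant-size equation per index $\ell\geqslant 2$. The exact shape of each equation is an assumption made here: $[T,\ell]$ is defined as two copies of $[T,\lfloor\ell/2\rfloor]$, followed by one copy of $[T,1]$ when $\ell$ is odd. The function names are hypothetical.

```python
def slp_equations(e):
    """Return {ell: rhs} for the chain e, e // 2, ..., down to 2.

    rhs is a list of indices m standing for the variables [T, m].
    Assumed equation shape: [T, ell] = [T, ell//2] [T, ell//2] ([T, 1] if ell is odd).
    """
    assert e >= 2
    equations = {}
    ell = e
    while ell >= 2:
        half = ell // 2
        equations[ell] = [half, half] + ([1] if ell % 2 else [])
        ell = half
    return equations


def expand(ell, equations, T):
    """Expand the variable [T, ell] to a plain word, using [T, 1] = T."""
    if ell == 1:
        return T
    return "".join(expand(m, equations, T) for m in equations[ell])
```

For $e=13$ the chain is $13, 6, 3, 1$, so three equations of constant size plus the single equation $[T,1]=T$ suffice, and expanding $[T,13]$ yields $T$ repeated thirteen times.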

14. Virtually free groups

We restrict ourselves to the non-uniform complexity where the given virtually free group is not part of the input. The restriction allows us to ignore the way a virtually free group is given to us. For example whether the group is given by a context-free grammar for the word problem or whether it is given as a fundamental group of a finite graph of finite groups may result in uniform complexities which differ exponentially. We refer the interested reader to the arXiv version of [57] for more details. See also Rem. 14.8.

In the following $G$ denotes a fixed finitely generated virtually free group. Thus, there is a finitely generated free subgroup $F$ such that $H=G/F$ is finite. Replacing $F$ by the normal subgroup $\bigcap\left\{gF{g}^{-1}\mathrel{\left|\vphantom{gF{g}^{-1}}\vphantom{g\in G}\right.}g\in G\right\}$ (which is of finite index in $G$ ) we can assume without restriction that $F$ is normal and that $H$ is a finite group. That is, we start with some surjective homomorphism $\gamma\colon G\to H$ where $H$ is finite and the kernel $\ker(\gamma)$ is a f.g. free group. This yields a short exact sequence:

[TABLE]

Choosing a generating set for $F$ and a set of coset representatives from $H$ , we obtain a generating set for $G$ . We need generating sets which are closed under involution, so we are more specific. We use the following definition.

Definition 14.1.

Let $G$ be given as in (58). We say that a subset $S$ of $G$ is a standard generating set for $G$ if the following conditions are satisfied.

• $S$ can be written as a union $A_{+}\cup A_{-}\cup H_{+}\cup H_{-}\subseteq G$.

• $A_{+}$ is a basis for $F$, that is $F=F(A_{+})$.

• $a\in A_{+}\iff{a}^{-1}\in A_{-}$ for all $a\in A=A_{+}\cup A_{-}$.

• $\gamma$ induces a bijection between $H^{\prime}=H_{+}\cup\left\{\mathinner{1}\right\}$ and $H$.

• $H_{-}=\left\{h\in G\mathrel{\left|\vphantom{h\in G}\vphantom{{h}^{-1}\in H_{+}}\right.}{h}^{-1}\in H_{+}\right\}$.

Every standard generating set is closed under the involution with $\overline{b}={b}^{-1}\in G$. The three sets $A_{+}$, $A_{-}$, and $H^{\prime}$ are pairwise disjoint subsets of $G$. There is a bijection between $H_{+}$ and $H_{-}$, but perhaps $H_{+}\cap H_{-}\neq\emptyset$.

Let $\pi_{S}\colon S^{*}\to G$ denote the canonical projection. We say that $\widehat{w}\in S^{*}$ is in standard normal form if we can write $\widehat{w}=uh$ where $u\in A^{*}$ is a freely reduced word (that is, without factors $a\overline{a}$) and $h\in H^{\prime}$. By $\mathrm{snf}_{S}(G)$ we denote the set of standard normal forms. For every $w\in S^{*}$ there is a unique $\mathrm{snf}_{S}(w)\in\mathrm{snf}_{S}(G)$ such that $w=\mathrm{snf}_{S}(w)$ in $G$. The set of freely reduced words over $A$ becomes $A^{*}\cap\mathrm{snf}_{S}(G)$; and we let $\mathrm{snf}_{A}(G)=A^{*}\cap\mathrm{snf}_{S}(G)$. Hence,

[TABLE]
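Membership in the set of standard normal forms is a purely syntactic test, which the following sketch makes explicit. The encoding is an assumption for illustration only: letters of $A$ are single characters whose inverse is the same character in the other case, and letters of $H_{+}$ are characters outside $A$; the element $1\in H^{\prime}$ is the empty suffix. Computing $\mathrm{snf}_{S}(w)$ for an arbitrary $w\in S^{*}$ needs the multiplication of $G$ and is not attempted here.

```python
def bar(letter):
    """Involution on A in the assumed encoding: bar('a') == 'A'."""
    return letter.swapcase()


def free_reduce(word):
    """Cancel factors x bar(x) until none is left (stack-based)."""
    stack = []
    for x in word:
        if stack and stack[-1] == bar(x):
            stack.pop()
        else:
            stack.append(x)
    return "".join(stack)


def is_standard_normal_form(word, A, H_plus):
    """Check word = u h with u freely reduced over A and h in H_plus or h = 1."""
    if word and word[-1] in H_plus:
        u = word[:-1]
    else:
        u = word
    return all(x in A for x in u) and free_reduce(u) == u
```

With `A = set("aAbB")` and `H_plus = {"h"}`, the word `"abAh"` passes the test, whereas `"aAb"` (not freely reduced) and `"hab"` (coset representative not at the end) do not.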

Theorem 14.2.

Let $G$ be a finitely generated virtually free group. Then with respect to any short exact sequence as in (58) there is a standard generating set $S$ and an $\mathsf{NSPACE}(\left\|\mathinner{\Phi}\right\|_{\text{eq}}^{2}(\left\|\mathinner{\Phi}\right\|_{\text{rat}}^{2}+\log\left\|\mathinner{\Phi}\right\|_{\text{eq}}))$ algorithm which performs the following task. It takes as input a Boolean formula $\Phi$ (according to Sect. 13.3) with $\mathcal{X}_{\Phi}=\left\{\mathinner{X_{1},\overline{X}_{1},\ldots,X_{k},\overline{X}_{k}}\right\}$ such that $X_{i}$ is the $i$ th variable in some fixed chosen linear order on $\Omega$ . The output is an extended alphabet $C$ of size $\mathcal{O}(\left\|\mathinner{\Phi}\right\|_{\text{eq}}^{2})$ , letters $d_{i}\in C$ for all $1\leqslant i\leqslant k$ , and a trim NFA ${\mathcal{A}}_{\Phi}$ accepting a rational set of $S$ -morphisms over $C^{*}$ such that the EDT0L relation

[TABLE]

is equal to the full solution set in standard normal forms

[TABLE]

Moreover, ${\mathrm{Sol}}(\Phi)=\emptyset$ if and only if $L({\mathcal{A}}_{\Phi})=\emptyset$ ; and $\left|\mathinner{{\mathrm{Sol}}(\Phi)}\right|<\infty$ if and only if ${\mathcal{A}}_{\Phi}$ doesn’t contain any directed cycle.

Remark 14.3.

In a simplified analysis using a single parameter, the natural choice is to define $\left\|\mathinner{\Phi}\right\|=\left\|\mathinner{\Phi}\right\|_{\text{eq}}+\left\|\mathinner{\Phi}\right\|_{\text{rat}}$ . This yields

[TABLE]

If $\left\|\mathinner{\Phi}\right\|_{\text{rat}}\in\mathcal{O}(\sqrt{\log\left\|\mathinner{\Phi}\right\|_{\text{eq}}})$ , then we have

[TABLE]
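The arithmetic behind these two specializations can be checked directly from the space bound in Thm. 14.2. The following is only a sketch; the abbreviations $n=\left\|\mathinner{\Phi}\right\|$, $e=\left\|\mathinner{\Phi}\right\|_{\text{eq}}$, and $r=\left\|\mathinner{\Phi}\right\|_{\text{rat}}$ are introduced here for readability and are not notation of the paper.

```latex
\begin{align*}
e^{2}\,\bigl(r^{2}+\log e\bigr) &\leqslant n^{2}\,\bigl(n^{2}+\log n\bigr)\in\mathcal{O}(n^{4})
  && \text{since } e\leqslant n \text{ and } r\leqslant n,\\
e^{2}\,\bigl(r^{2}+\log e\bigr) &\in\mathcal{O}\bigl(e^{2}\log e\bigr)\subseteq\mathcal{O}\bigl(n^{2}\log n\bigr)
  && \text{if } r\in\mathcal{O}\bigl(\sqrt{\log e}\bigr).
\end{align*}
```

The second line is the regime in which the rational constraints are small compared to the equations, which is where a bound of the form $\mathsf{NSPACE}(n^{2}\log n)$ appears.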

Remark 14.4.

Let us comment why we consider only a subset of the first $k$ variables of $\mathcal{X}_{+}$ rather than all variables. The reason is that during the proof we manipulate $\Phi$ in various ways including some which introduce fresh variables. But these new variables are just auxiliary symbols, and we make sure that they don’t enlarge the full solution set. If we introduce fresh variables, then we put them in the linear order behind the first $k$ variables. Therefore, there is no risk to denote ${\mathrm{Sol}}(\Phi)$ as ${\mathrm{Sol}}_{S,k}(\Phi)$ .

14.1. Proof of Thm. 14.2, Phase 1

Using Prop. 13.1 we may assume that all equations in $\Phi$ are written in plain form as $U=V$ where $U,V\in(S\cup\mathcal{X})^{*}$ . In the following we introduce many fresh variables into $\mathcal{X}$ . The enlarged set is still called $\mathcal{X}$ . Moreover, we choose a subset of positive variables $\mathcal{X}_{+}$ such that $\left\{\mathinner{X_{1},\ldots,X_{k}}\right\}\subseteq\mathcal{X}_{+}$ , $\mathcal{X}=\mathcal{X}_{+}\cup\left\{\overline{X}\mathrel{\left|\vphantom{\overline{X}}\vphantom{X\in\mathcal{X}_{+}}\right.}X\in\mathcal{X}_{+}\right\}$ , and $X\in\mathcal{X}_{+}\iff\overline{X}\notin\mathcal{X}_{+}$ .

Having this, we push all negations to the atomic formulae using De Morgan’s law. This increases the size at most by the number of atomic formulae. For each inequality $\neg(U=V)$ we introduce a fresh variable $X$ and then we replace $\neg(U=V)$ by the conjunction $U=VX\wedge\neg(X\in\left\{\mathinner{1}\right\})$. This increases the size by the number of inequalities since the singleton $\left\{\mathinner{1}\right\}$ is accepted by a two-state NFA. Thus, without restriction $\Phi$ doesn’t contain any negation and only three types of atomic formulae:

[TABLE]

Here, $\mathcal{A}$ denotes an NFA of the form $\mathcal{A}=(Q,S,\delta,\mathscr{I},\mathscr{F})$ with $\delta\subseteq Q\times S\times Q$. We may assume because for every NFA $\mathcal{A}$ there is another NFA $\overline{\mathcal{A}}$ of the same size such that $L(\overline{\mathcal{A}})=\overline{L(\mathcal{A})}$ (the complement of $L(\mathcal{A})$ ). We also write $X\in L(\mathcal{A})$ or $X\notin L(\mathcal{A})$ because $S\subseteq G$ and therefore we can view $L(\mathcal{A})$ directly as a rational subset of $G$.
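The negation elimination described above (De Morgan’s law, then one fresh variable per inequality) can be sketched on a toy formula representation. Everything about the encoding is an assumption: formulae are nested tuples tagged `"and"`, `"or"`, `"not"`, `"eq"`, `"in"`, `"notin"`, words are tuples of symbols, fresh variables are named `Y1, Y2, ...`, and the string `"ONE"` stands for a two-state NFA accepting $\left\{\mathinner{1}\right\}$.

```python
from itertools import count


def eliminate_negations(phi, negated=False, fresh=None):
    """Push negations to the atoms and replace each inequality not(U = V)
    by (U = V X) and (X notin {1}) with a fresh variable X."""
    if fresh is None:
        fresh = count(1)
    tag = phi[0]
    if tag == "not":
        return eliminate_negations(phi[1], not negated, fresh)
    if tag in ("and", "or"):
        if negated:  # De Morgan: swap the connective under a negation
            tag = "or" if tag == "and" else "and"
        left = eliminate_negations(phi[1], negated, fresh)
        right = eliminate_negations(phi[2], negated, fresh)
        return (tag, left, right)
    if tag == "eq":
        if not negated:
            return phi
        _, U, V = phi
        X = f"Y{next(fresh)}"  # hypothetical naming of fresh variables
        return ("and", ("eq", U, V + (X,)), ("notin", X, "ONE"))
    if tag == "in":
        return ("notin", phi[1], phi[2]) if negated else phi
    if tag == "notin":
        return ("in", phi[1], phi[2]) if negated else phi
    raise ValueError(f"unknown tag: {tag}")
```

The result contains no `"not"` node and only the three kinds of atoms `"eq"`, `"in"`, and `"notin"`.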

Lemma 14.5.

Let $\gamma\colon G\to H$ be as above. In particular, $F={\gamma}^{-1}(1)$ is free and $H$ is finite. It is enough to prove Thm. 14.2 under the following assumptions about the input formula $\Phi$.

• $\Phi$ implies $\bigwedge\left\{X\in F\mathrel{\left|\vphantom{X\in F}\vphantom{X\in\mathcal{X}_{+}}\right.}X\in\mathcal{X}_{+}\right\}.$ (Note that the syntax $X\in F$ makes sense since the f.g. free group $F$ is a rational subset in $G$.)

• If an NFA $\mathcal{A}$ appears in $\Phi$, then $L(\mathcal{A})\subseteq F$ where $\mathcal{A}$ is an NFA over $G$ and the transitions are labeled with an arbitrary, but fixed, finite set of generators of $G$.

• $\Phi$ is a conjunction where each atomic formula is either an equation in plain form $U=V$ with $UV\in(S\cup\mathcal{X})^{*}$ or $X\in L(\mathcal{A})$ or $X\notin L(\mathcal{A})$.

Remark 14.6.

Assume that $\Phi$ satisfies the assumptions of Lem. 14.5. Then Thm. 14.2 implies that there is a standard set of generators $S$ containing a basis $A_{+}$ of $F$ such that

[TABLE]

In particular, the full solution set ${\mathrm{Sol}}_{S,k}(\Phi)$ is an EDT0L relation over freely reduced words of $A$ .

The proof of Lem. 14.5 is based on two closure properties:

finite unions of EDT0L (resp. rational) languages in a monoid $M$ are EDT0L (resp. rational); and
if $L\subseteq M$ is an EDT0L (resp. rational) language and $m\in M$ , then $Lm$ is EDT0L (resp. rational). The analogous statements hold for EDT0L relations.

Proof of Lem. 14.5.

The difficult part is to show the first and third item because we have to respect the given space bounds. The second item is very easy to show, and we prove it “on the fly” when showing the first item.

Let $\Phi$ be any input formula for Thm. 14.2. We wish to add the constraint $X\in F$ for all variables. This requires the introduction of fresh variables. More precisely, for each $X\in\mathcal{X}_{+}$ and $g\in H^{\prime}$ we introduce a new variable $X_{g}$ with $\overline{X_{g}}=\overline{X}_{g}$; and we construct an NFA $\mathcal{A}_{g}$ such that $L(\mathcal{A}_{g})=L(\mathcal{A})\overline{g}$ for each $\mathcal{A}$. The NFA $\mathcal{A}_{g}$ is obtained by adding a single new final state and new transitions from the former final states to the new single one, all of them labeled by the letter $\overline{g}$. The size of $\mathcal{A}_{g}$ increases by a constant. Moreover, each function $\eta\colon\mathcal{X}_{+}\to H$ defines a new formula $\Phi^{\prime}_{\eta}$ over the variables $X_{\eta(X)}$ as follows: every occurrence of $X$ (resp. $\overline{X}$) inside an equation is replaced by $X_{g}\overline{g}$ (resp. $g\overline{X}_{g}$) where $g=\eta(X)$. The length of each equation is at most doubled. Every constraint $X\in L(\mathcal{A})$ (resp. $X\notin L(\mathcal{A})$) is replaced by $X_{g}\in L(\mathcal{A}_{g})$ (resp. $X_{g}\notin L(\mathcal{A}_{g})$). Recall that $L(\mathcal{A}_{g})=L(\mathcal{A})\overline{g}$. Let $\Phi^{\prime}_{\eta}$ denote the result of that transformation. Then we let

[TABLE]

Note that a constraint $X_{g}\in F$ is the same as $\gamma(X_{g})=1$. Therefore we can use $H$ as a recognizing finite monoid for all $X_{g}$. Since $H$ is of constant size, the size of $\Phi_{\eta}$ is in $\mathcal{O}(\left\|\mathinner{\Phi}\right\|_{\text{eq}},\left\|\mathinner{\Phi}\right\|_{\text{rat}})$. All variables in $\Phi_{\eta}$ are of the form $X_{\eta(X)}$ or $\overline{X}_{\eta(X)}$. The old variables $X\in\mathcal{X}$ are still present but not used in any $\Phi_{\eta}$. Therefore, inside each $\Phi_{\eta}$ we rename all $X_{\eta(X)}$ by $X$. After that the variables $X_{g}$ are superfluous: we remove them from $\mathcal{X}$. Thus, each $\Phi_{\eta}$ uses the same set $\mathcal{X}_{+}$ of positive variables as $\Phi$ did.
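The construction of $\mathcal{A}_{g}$ from $\mathcal{A}$ used above (one new final state, reached from the former final states by a transition labeled $\overline{g}$) can be sketched as follows. The tuple encoding `(states, transitions, initial, final)` and the function names are assumptions for illustration; the NFA here reads words letter by letter.

```python
def append_letter(nfa, letter):
    """Return an NFA accepting L(nfa) followed by `letter`.

    One new final state is added; each former final state gets one new
    transition labeled `letter` into it.
    """
    states, transitions, initial, final = nfa
    new_final = ("final", letter)
    assert new_final not in states  # the new state must be fresh
    new_transitions = set(transitions) | {(f, letter, new_final) for f in final}
    return (set(states) | {new_final}, new_transitions, set(initial), {new_final})


def accepts(nfa, word):
    """Standard subset simulation for an NFA with single-letter labels."""
    _, transitions, initial, final = nfa
    current = set(initial)
    for x in word:
        current = {q for (p, a, q) in transitions if p in current and a == x}
    return bool(current & set(final))
```

If `A` accepts the words $a(ba)^{*}$ and the character `"G"` stands for $\overline{g}$, then `append_letter(A, "G")` accepts exactly the words $a(ba)^{*}\overline{g}$, and the automaton grows by one state and one transition per former final state.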

Each formula $\Phi_{\eta}$ is written again in disjunctive normal form $\Phi_{\eta}=\bigvee\left\{\Phi_{\eta,j}\mathrel{\left|\vphantom{\Phi_{\eta,j}}\vphantom{j\in I_{\eta}}\right.}j\in I_{\eta}\right\}$ where each index set $I_{\eta}$ has again (at most) exponential size in the size of $\Phi$.

Having this, we see that $\Phi$ is equivalent to the following disjunction

[TABLE]

Note that $\Phi$ and $\widetilde{\Phi}$ use the same set $\mathcal{X}_{+}$ of positive variables. It is also clear how to transform a solution $\sigma$ for $\Phi$ into a solution $\widetilde{\sigma}$ for $\widetilde{\Phi}$ and vice versa: if $\sigma$ solves $\Phi$ , then $\widetilde{\sigma}(X_{g})=\sigma(X)g$ solves $\widetilde{\Phi}$ and if $\widetilde{\sigma}$ solves $\widetilde{\Phi}$ , then $\sigma(X)=\widetilde{\sigma}(X_{g})\overline{g}$ solves $\Phi$ .

Since $\left|\mathinner{H}\right|$ is a constant it is easy to see that the number of disjunctions in (63) is (at most) exponential in the size of $\Phi$ . But it can also happen that the size of $\widetilde{\Phi}$ is exponential in the size of $\Phi$ , so in general we have no way to store $\widetilde{\Phi}$ within the given space bound. What we do instead is to construct NFAs $\mathcal{A}_{\eta,j}$ for each $\Phi_{\eta,j}$ , one after another, such that $L(\mathcal{A}_{\eta,j})$ defines the EDT0L relation of the full solution set for $\Phi_{\eta,j}$ .
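The point that the disjuncts are handled one after another, rather than stored all at once, can be illustrated with a lazy enumeration of the functions $\eta$. This is only a sketch of the enumeration pattern: the function name is hypothetical, and the values are taken from a list of representatives playing the role of $H^{\prime}$.

```python
from itertools import product


def enumerate_eta(positive_vars, H_prime):
    """Yield every map eta from the positive variables to H_prime,
    one dictionary at a time, without materialising the whole family."""
    for values in product(H_prime, repeat=len(positive_vars)):
        yield dict(zip(positive_vars, values))
```

There are $\left|\mathinner{H^{\prime}}\right|^{\left|\mathinner{\mathcal{X}_{+}}\right|}$ such maps, which is exponential in the number of variables, but a loop over the generator only ever holds the current $\eta$.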

More precisely, suppose we have shown Thm. 14.2 for each $\Phi_{\eta,j}$ which is a conjunction of constraints and equations. Then indeed, for all $(\eta,j)$ , one after another, we can output some NFA $\mathcal{A}_{\eta,j}$ where the transitions are labeled by endomorphisms over (the same) extended alphabet $C$ such that

[TABLE]

We can also assume that all these NFAs use exactly the same set of distinguished letters $\left\{\mathinner{d_{1},\ldots,d_{k}}\right\}$ . As an output of the overall algorithm we obtain the disjoint union over all these NFAs $\mathcal{A}_{\eta,j}$ . Without restriction $H^{\prime}\cup H_{-}\subseteq C$ , but the elements of $H^{\prime}\cup H_{-}$ are not used in any NFA so far. Moreover, for each $d_{i},\overline{d_{i}}$ we may assume that there are letters $c_{i},\overline{c_{i}}$ , again still not used by any $\mathcal{A}_{\eta,j}$ . We add one more new state and connect this new state with all final states in $\mathcal{A}_{\eta,j}$ via a single transition labeled by the endomorphism $h\in\operatorname{End}(C^{*})$ which is defined by $h(c_{i})=d_{i}g$ where $d_{i}$ corresponds to the variable $X_{i}\in\mathcal{X}_{+}$ and $g=\eta(X_{i})\in H^{\prime}$ . The new state becomes the single final state of the “union” automaton.

We conclude that it is enough to show Thm. 14.2 for each $\Phi_{\eta,j}$. Since $\Phi_{\eta,j}$ satisfies properties as required by Lem. 14.5, the lemma follows. ∎

14.2. Proof of Thm. 14.2, Phase 2.

Embedding into a semi-direct product

Let $E$ be a finite set with involution. Then $\mathbb{F}(E)$ denotes the group $\mathbb{F}(E)=E^{*}/\left\{e\overline{e}=1\mathrel{\left|\vphantom{e\overline{e}=1}\vphantom{e\in E}\right.}e\in E\right\}$ . If the involution on $E$ is without fixed points, then we can write $E=E_{+}\cup E_{-}$ such that $e\in E_{+}\iff\overline{e}\in E_{-}$ ; and the inclusion $E_{+}$ into $E$ induces an isomorphism between the free group $F(E_{+})$ with basis $E_{+}$ and $\mathbb{F}(E)$ . The group $\mathbb{F}({E})$ is called specular in [3], which means it is the free product of a free group with groups of order two.

In the following we use that $G$ is the fundamental group of a finite graph of finite groups [29], which enables us to reduce questions about equations with rational constraints in $G$ to questions about twisted word equations with rational constraints.

Suppose that the group $H$ acts on $E$ via a morphism $H\to\operatorname{Aut}(E)$ . Thus, for $(f,e)\in H\times E$ we have $f(e)\in E$ and $f(\overline{e})=\overline{f(e)}$ . We have $\operatorname{Aut}(E)=\operatorname{Aut}(E^{*})\subseteq\operatorname{Aut}(\mathbb{F}(E))$ and the action of $H$ on $E^{*}$ and $\mathbb{F}(E)$ defines two different (but related) semi-direct products ${E}^{*}\rtimes H$ and $\mathbb{F}({E})\rtimes H$ . The elements of ${E}^{*}\rtimes H$ (resp. $\mathbb{F}({E})\rtimes H$ ) are the pairs $(u,f)\in{E}^{*}\times H$ (resp. $\mathbb{F}({E})\times H$ ) and the multiplication is defined by

[TABLE]

The semi-direct product ${E}^{*}\rtimes H$ is a monoid with involution by

[TABLE]

It is also clear that ${(u,f)}^{-1}=({f}^{-1}({u}^{-1}),{f}^{-1})$ in the group $\mathbb{F}({E})\rtimes H$ .
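A toy instance may help to see how the action enters the multiplication. The sketch below assumes the rule $(u,f)\cdot(v,g)=(u\,f(v),fg)$, which is the rule consistent with the inverse formula $(u,f)^{-1}=(f^{-1}(u^{-1}),f^{-1})$ stated above. The concrete data are hypothetical: $E=\{a,\overline{a},b,\overline{b}\}$ with uppercase letters as inverses, and $H=\mathbb{Z}/2\mathbb{Z}$ whose nontrivial element swaps $a$ and $b$.

```python
BAR = {"a": "A", "A": "a", "b": "B", "B": "b"}   # involution on E
SWAP = {"a": "b", "b": "a", "A": "B", "B": "A"}  # action of the nontrivial element


def reduce_word(word):
    """Free reduction in F(E): cancel factors e bar(e)."""
    out = []
    for x in word:
        if out and out[-1] == BAR[x]:
            out.pop()
        else:
            out.append(x)
    return "".join(out)


def act(f, word):
    """Action of f in Z/2Z on words, applied letter by letter."""
    return "".join(SWAP[x] for x in word) if f == 1 else word


def mul(p, q):
    """(u, f) * (v, g) = (u f(v), f g), with the word part freely reduced."""
    (u, f), (v, g) = p, q
    return (reduce_word(u + act(f, v)), (f + g) % 2)


def inv(p):
    """(u, f)^{-1} = (f^{-1}(u^{-1}), f^{-1})."""
    u, f = p
    f_inv = (-f) % 2
    u_inv = "".join(BAR[x] for x in reversed(u))
    return (act(f_inv, u_inv), f_inv)
```

In this instance `mul(p, inv(p))` is the identity `("", 0)` for every pair `p`, and conjugating a word by the nontrivial element of $H$ applies the swap to it.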

The free monoid ${E}^{*}$ embeds into ${E}^{*}\rtimes H$ via $e\mapsto(e,1)$ and the group $H$ embeds into ${E}^{*}\rtimes H$ via $f\mapsto(1,f)$ . Having this, we obtain:

[TABLE]

Since we identify $E$ with $E\times\left\{\mathinner{1}\right\}$ and $H$ with $\left\{\mathinner{1}\right\}\times H$ , we can write:

[TABLE]

Thus, $gx\overline{g}=g(x)$ for $g\in H$ , $x\in E^{*}$ , and $x\in\mathbb{F}(E)$ . Let ${\Gamma}=E\cup H$ and $H\cap E^{*}=\left\{\mathinner{1}\right\}$ . Thus, $1$ the identity element in $H$ is identified with the empty word in $E^{*}$ . It also appears as a letter in $\Gamma$ . The interpretation $e\in E$ as $(e,1)$ and $f\in H$ as $(1,f)$ yields canonical surjective morphisms

[TABLE]

Our proof of Thm. 14.2 relies on Prop. 14.7.

Proposition 14.7 ([13], Sec. 2.4.5).

Let $G$ be a finitely generated virtually free group and $\gamma\colon G\to H$ be a homomorphism onto a finite group $H$ such that the kernel $F=\ker(\gamma)$ is free. Then $G$ embeds into a semi-direct product of the form $\mathbb{F}({E})\rtimes H$; and we can construct an injective homomorphism $\varphi\colon G\to\mathbb{F}({E})\rtimes H$ and a partition $E=A\cup T$ into two subalphabets such that $F=\left\{x\in G\mathrel{\left|\vphantom{x\in G}\vphantom{\varphi(x)\in\mathbb{F}(E)}\right.}\varphi(x)\in\mathbb{F}(E)\right\}$ is isomorphic to $\mathbb{F}(A)$. Moreover, using that isomorphism, we can embed $\mathbb{F}(A)$ into $G$ such that, for all $a\in A$, we have $\varphi(a)\in T^{*}aT^{*}$ and $\varphi(a)$ is freely reduced in $E^{*}$. The embedding of $G$ into $\mathbb{F}({E})\rtimes H$ is also depicted in Fig. 9.

Proof.

The first assertion is the nontrivial direction in [13, Cor. 2.4.23]. The corollary says that $F=\left\{x\in G\mathrel{\left|\vphantom{x\in G}\vphantom{\varphi(x)\in\mathbb{F}(E)}\right.}\varphi(x)\in\mathbb{F}(E)\right\}$ is a free factor of $\mathbb{F}(E)$. This means that $\mathbb{F}(E)$ is a free product of $\mathbb{F}(A)$ with $\mathbb{F}(T)$. The additional property $\varphi(a)\in T^{*}aT^{*}$ for all $a\in A$ is a special case of [13, Prop. 2.4.22]. ∎

Remark 14.8.

As we deal here with non-uniform complexity, we content ourselves with knowing that the embedding $\varphi\colon G\to\mathbb{F}(E)\rtimes H$ can be effectively computed, and therefore we can treat $\left|\mathinner{E}\right|$ as a constant. But in fact the proof of Lemma 30 in the arXiv version of [57] shows

[TABLE]

Thus, if $G$ is given to us as the fundamental group of a finite graph of finite groups, then the interested reader could derive uniform complexity bounds from the material presented here.

Corollary 14.9.

We use the same notation as in Prop. 14.7. Define a morphism $\psi$ from $\mathbb{F}(E)$ onto $\mathbb{F}(A)$ by $\psi(a)=a$ and $\psi(t)=1$ for $a\in A$ and $t\in E\setminus A$ . Then $\psi$ maps freely reduced words $\widehat{w}\in\varphi(A^{*})$ to freely reduced words in $A^{*}$ .

Proof.

The subgroup $\varphi(\mathbb{F}(A))$ of $\mathbb{F}(E)$ is generated by words of the form $\varphi(v)$ where $v\in A^{*}$ is freely reduced over $A$ . Thus, we can write every element in $w\in\varphi(\mathbb{F}(A))$ as a word

[TABLE]

such that $a_{i}\neq\overline{a_{i+1}}$ for $1\leqslant i<m$ . Now, every freely reduced word $\widehat{w}$ can be obtained from some word $w$ as above by cancellation of factors $e\overline{e}$ . Since $a_{i}\neq\overline{a_{i+1}}$ for $1\leqslant i<m$ , we obtain

[TABLE]

In particular, $\psi(\widehat{w})=a_{1}\cdots a_{m}$ , and $a_{1}\cdots a_{m}$ is freely reduced by definition. ∎
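The statement of Cor. 14.9 can be tried out on a small hypothetical example. The sketch assumes $A=\{a,\overline{a},b,\overline{b}\}$, one extra pair of letters $s,\overline{s}$ playing the role of $T$ (uppercase letters denote inverses), and an embedding with $\varphi(a)=sa\overline{s}$ and $\varphi(b)=b$. This $\varphi$ is chosen only because it has the shape $T^{*}aT^{*}$ and respects the involution; it is not derived from any particular group $G$.

```python
BAR = {"a": "A", "A": "a", "b": "B", "B": "b", "s": "S", "S": "s"}
A_LETTERS = set("aAbB")
# Hypothetical embedding: phi(a) = s a S, phi(b) = b, extended to inverses.
PHI = {"a": "saS", "A": "sAS", "b": "b", "B": "B"}


def free_reduce(word):
    """Cancel factors e bar(e) in a word over E."""
    out = []
    for x in word:
        if out and out[-1] == BAR[x]:
            out.pop()
        else:
            out.append(x)
    return "".join(out)


def phi(v):
    """Image of a word over A under the hypothetical embedding."""
    return "".join(PHI[x] for x in v)


def psi(word):
    """Erase all letters outside A, that is, send every letter of T to 1."""
    return "".join(x for x in word if x in A_LETTERS)
```

For a freely reduced $v$ such as `"aa"`, the image `phi("aa") == "saSsaS"` reduces to `"saaS"`, and `psi` recovers `"aa"`: only letters of $T$ cancel between consecutive images, which is the mechanism used in the proof.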

Remark 14.10.

Let us give a few more comments on how Prop. 14.7 and Cor. 14.9 are shown in [13]. Since $G$ is a fundamental group of a finite graph of finite groups, it acts on its Bass-Serre tree $\mathcal{T}$ without edge inversion [58]. As the notation suggests, $\mathcal{T}$ is indeed a tree: a connected acyclic undirected graph. The same is true for the free subgroup $F=\ker(\gamma)$ of $G$: it acts on $\mathcal{T}$ by graph automorphisms without edge inversion. It follows that $F$ has trivial intersection with all vertex groups because the vertex groups are finite and embed into $G$, see again [58]. Indeed, if the intersection were not trivial, then $F$ would have a finite nontrivial subgroup, but free groups are torsion free. Thus, $F$ acts on $\mathcal{T}$ without vertex stabilizers and without edge inversion.

Now, let $\mathcal{G}$ be the quotient graph $\mathcal{G}=F\setminus\mathcal{T}$ . The finite group $H$ acts on $\mathcal{G}$ : it permutes the edges and vertices of $\mathcal{G}$ by respecting the incidence relation. Moreover, $F$ appears as the fundamental group of the finite and connected simplicial graph $\mathcal{G}$ . This can be viewed as the main structure theorem about groups acting on trees, [58, Thm. 13]. That is, we can write $F=\pi_{1}(\mathcal{G})$ . The point is that we always have two views on fundamental groups of a simplicial graph. The first view is to choose a base point $\star$ and we write $\pi_{1}(\mathcal{G})=\pi_{1}(\mathcal{G},\star)$ where $\pi_{1}(\mathcal{G},\star)$ is a set of paths from $\star$ to $\star$ . The second view is to choose a spanning tree $T$ of $\mathcal{G}$ and we realize $\pi_{1}(\mathcal{G})$ as $\pi_{1}(\mathcal{G})=\pi_{1}(\mathcal{G},T)$ . That is $\pi_{1}(\mathcal{G})=\pi_{1}(\mathcal{G},T)=\mathbb{F}(E)/\left\{t=1\mathrel{\left|\vphantom{t=1}\vphantom{t\in T}\right.}t\in T\right\}$ . The isomorphism between $\pi_{1}(\mathcal{G},\star)$ and $\pi_{1}(\mathcal{G},T)$ is induced by the inclusion of $\pi_{1}(\mathcal{G},\star)$ into $\mathbb{F}(E)$ followed by the projection $\mathbb{F}(E)$ onto the quotient $\pi_{1}(\mathcal{G},T)=\mathbb{F}(E)/\left\{t=1\mathrel{\left|\vphantom{t=1}\vphantom{t\in T}\right.}t\in T\right\}$ .

Let ${E}$ be the set of edges in $\mathcal{G}$ . It is (in the sense of [58]) a finite alphabet: a finite set with involution without fixed points. Thus, we can view $E$ as a disjoint union $E_{+}\cup E_{-}$ with $e\in E_{+}\iff\overline{e}\in E_{-}$ . As a set we identify $\mathbb{F}(E)=F({E_{+}})={E}^{*}/\left\{e\overline{e}=1\mathrel{\left|\vphantom{e\overline{e}=1}\vphantom{e\in{E}}\right.}e\in{E}\right\}$ with the regular set of reduced words in $E^{*}$ . Recall that a word is reduced if and only if no factor $e\overline{e}$ for $e\in E$ appears. Since $H$ acts on the graph $\mathcal{G}$ , each $g\in H$ acts on $E^{*}$ via a length preserving automorphism which respects the involution. Hence, $w$ is reduced if and only if $g(w)$ is reduced.

14.3. Proof of Thm. 14.2, Phase 3.

Transformation of $\Phi$ to $\Psi$

Let $\Phi$ be the input formula for showing Thm. 14.2. By Lem. 14.5 we may assume that $\Phi=\Phi_{\eta,j}$ where $\Phi_{\eta,j}$ appears in (63). Thus, $\Phi$ is given as a single conjunction of a special form where every variable is restricted by a constraint $X\in\mathbb{F}(A)$ . It follows that the choice of $H^{\prime}$ for the standard generating set does not affect ${\mathrm{Sol}}_{S,k}(\Phi)$ from this point on. Therefore, we can write

[TABLE]

Next, we use the embedding of $G$ into the semi-direct product $\mathbb{F}(E)\rtimes H$ as given by Prop. 14.7 and Fig. 9. We are going to transform $\Phi$ into a formula $\Psi$ over $\mathbb{F}(E)\rtimes H$ such that the inclusion of $G$ into the semi-direct product defines a bijection between $\mathrm{Sol}(\Phi)$ and $\mathrm{Sol}(\Psi)$ .

We construct $\Psi$ according to the following steps.

(1)

We extend the embedding $\varphi\colon G\to\mathbb{F}(E)\rtimes H$ to an embedding $\varphi\star\mathrm{id}_{\mathcal{X}}\colon G\star\mathcal{X}^{*}\to(\mathbb{F}(E)\rtimes H)\star\mathcal{X}^{*}$ and we replace every equation $U=V$ in $\Phi$ by $\varphi\star\mathrm{id}_{\mathcal{X}}(U)=\varphi\star\mathrm{id}_{\mathcal{X}}(V)$ . Identifying $E$ , $H$ , and $\mathcal{X}$ with subsets of $(\mathbb{F}(E)\rtimes H)\star\mathcal{X}^{*}$ , we see that $E\cup H\cup\mathcal{X}$ generates the group $(\mathbb{F}(E)\rtimes H)\star\mathcal{X}^{*}$ . Hence, every equation $\varphi\star\mathrm{id}_{\mathcal{X}}(U)=\varphi\star\mathrm{id}_{\mathcal{X}}(V)$ can be written as a plain equation over the alphabet $E\cup H\cup\mathcal{X}$ . As we have defined $\Gamma=E\times\left\{\mathinner{1}\right\}\cup\left\{\mathinner{1}\right\}\times H=E\cup H$ , we have $E\cup H\cup\mathcal{X}=\Gamma\cup\mathcal{X}$ .

(2)

We replace every $\mathcal{A}$ which appears in $\Phi$ by $\mathcal{A}_{1}=\varphi(\mathcal{A})$ . That is $L(\mathcal{A}_{1})=\varphi(L(\mathcal{A}))$ . By assumption we have $L(\mathcal{A}_{1})\subseteq\mathbb{F}(E)$ . Hence, $L(\mathcal{A}_{1})$ is a rational subset of the free group $\mathbb{F}(E)$ .

Note that $\left\|\mathinner{\mathcal{A}_{1}}\right\|\in\mathcal{O}(\left\|\mathinner{\mathcal{A}}\right\|)$ . Without restriction, we may assume that the transitions in $\mathcal{A}_{1}$ are labeled by elements from $\Gamma$ . The property $\left\|\mathinner{\mathcal{A}_{1}}\right\|\in\mathcal{O}(\left\|\mathinner{\mathcal{A}}\right\|)$ is not affected by that assumption. Let $\Psi_{1}$ be the intermediate formula. It is clear that $\varphi$ induces a bijection between $\mathrm{Sol}(\Phi)$ and $\mathrm{Sol}(\Psi_{1})$ .

(3)

We transform each $\mathcal{A}_{1}$ appearing in $\Psi_{1}$ into an NFA $\mathcal{B}$ such that first, $L(\mathcal{A}_{1})=L(\mathcal{B})\subseteq\mathbb{F}(E)$ , second, the transitions of $\mathcal{B}$ use labels from $E\cup\left\{\mathinner{1}\right\}$ , and third, $\left\|\mathinner{L(\mathcal{B})}\right\|\in\mathcal{O}(\left\|\mathinner{L(\mathcal{A}_{1})}\right\|)$ . This is well known by [56, 59], but not completely obvious. In Lem. 14.11 we give a slightly simplified proof for the special situation of semi-direct products.

Let $\Psi$ be the corresponding formula. Since $L(\mathcal{A}_{1})=L(\mathcal{B})$ , we have $\mathrm{Sol}(\Psi)=\mathrm{Sol}(\Psi_{1})$ .

The construction of $\Psi$ is finished. We have

[TABLE]

Lemma 14.11 ([56, 59]).

Let $\mathcal{A}=(Q,\Gamma,\delta,\mathscr{I},\mathscr{F})$ be an NFA with the property $L(\mathcal{A})\subseteq\mathbb{F}(E)$ . Then there is an NFA $\mathcal{B}$ such that $L(\mathcal{A})=L(\mathcal{B})$ where the transitions of $\mathcal{B}$ use labels from $E\cup\left\{\mathinner{1}\right\}$ , and $\left\|\mathinner{L(\mathcal{B})}\right\|_{\text{in}}\in\mathcal{O}(\left\|\mathinner{L(\mathcal{A})}\right\|_{\text{in}})$ .

Proof.

In the beginning we let $\delta$ be any finite subset of $Q\times\Gamma^{*}\times Q$ . By Sect. 13.1 and perhaps by doubling the size of $\mathcal{A}$ , we may assume that $\delta\subseteq Q\times(\Gamma\cup\left\{\mathinner{1}\right\})\times Q$ . Since $\Gamma=E\cup H$ and $1\in H$ we may assume that $\delta\subseteq Q\times H(E\cup\left\{\mathinner{1}\right\})\times Q$ . Thus, the label of every transition is either an element from $H$ or from $E\cup\left\{\mathinner{1}\right\}$ or a product $ha$ where $h\in H$ and $a\in E$ . Moreover, we may assume that $\mathcal{A}$ is trim. In particular, if we reach a state $p$ when reading a word $u$ from an initial state, then there is a word $v$ such that $uv\in L(\mathcal{A})$ . Now, $L(\mathcal{A})\subseteq\mathbb{F}(E)$ . We let $\gamma(p)=\gamma(u)$ . This is well-defined as $\gamma(u)\gamma(v)=1$ , that is, $\gamma(u)={\gamma(v)}^{-1}$ for every such $u$ .

For every state $p$ with $\gamma(p)=g$ we introduce exactly one more state $[p,g]$ and transitions $p\overset{\overline{g}}{\longrightarrow}[p,g]$ and $[p,g]\overset{g}{\longrightarrow}p.$ This does not change the language accepted, and the NFA is still trim with $\gamma([p,g])=1$ . For each outgoing transition $p\overset{ha}{\longrightarrow}q$ with $h\in H$ and $a\in E\cup\left\{\mathinner{1}\right\}$ we have $\gamma(q)=f=gh$ ; and there is some $b\in E\cup\left\{\mathinner{1}\right\}$ such that $bf=fa$ in $G$ and hence, we add a transition $[p,g]\overset{b}{\longrightarrow}[q,f]$ as depicted in Fig. 10.

This doesn’t change the language accepted as $\overline{g}bf=ha$ in $G$ . The larger NFA still accepts $L$ , but the crucial point is that for $h_{1}a_{1}\cdots h_{k}a_{k}\in L(A)$ we can accept the same element in $G$ by reading just labels from $E\cup\left\{\mathinner{1}\right\}$ . This is easy to see by induction on $k$ . Now, we remove all original states (they are no longer needed) and make $[p,1]$ initial (resp. final) if and only if $p$ was initial (resp. final), to obtain the NFA $\mathcal{B}$ . By construction, we have $\left\|\mathinner{\mathcal{B}}\right\|_{\text{in},\Gamma}\leqslant 2\left\|\mathinner{\mathcal{A}}\right\|_{\text{in},\Gamma}$ . This implies $\left\|\mathinner{\mathcal{B}}\right\|_{\text{in}}\in\mathcal{O}(\left\|\mathinner{\mathcal{A}}\right\|_{\text{in}})$ . Recall that $\left\|\mathinner{\mathcal{A}}\right\|_{\text{in}}$ is well-defined up to a multiplicative constant only by (54). This makes $\left\|\mathinner{\mathcal{A}}\right\|_{\text{in}}$ independent of the choice of a finite generating set. ∎

14.4. Proof of Thm. 14.2, Phase 4.

From $\Psi$ to $\Psi_{\text{Ben}}$ : applying the techniques of Benois

The transformation in this subsection does not affect the equations in $\Psi$ . We only change the NFAs $\mathcal{B}$ such that they accept with every word $w\in E^{*}$ also the word $\widehat{w}$ which is obtained by canceling all factors $e\overline{e}$ . Nevertheless the rational subset $\pi_{E}(L(\mathcal{B}))\subseteq\mathbb{F}(E)$ will not change. The technique for the transformation is well known from the work of Michèle Benois [2]. Therefore we call the new formula $\Psi_{\text{Ben}}$ . We will see that $\pi_{E}(\mathrm{Sol}_{E,k}(\Psi))=\pi_{E}(\mathrm{Sol}_{E,k}(\Psi_{\text{Ben}}))$ . For convenience of the reader, we explain the transformation in detail. We use notation from string rewriting.

For $u,v\in E^{*}$ we write $u\Rightarrow v$ if $u=pe\overline{e}q$ and $v=pq$ for some $p,q\in E^{*}$ and $e\in E$ . By $\overset{*}{\Rightarrow}$ we mean the reflexive and transitive closure of $\Rightarrow$ . Clearly, $u\overset{*}{\Rightarrow}v$ implies $\pi_{E}(u)=\pi_{E}(v)$ . Moreover, $\pi_{E}(u)=\pi_{E}(v)$ implies $u\overset{*}{\Rightarrow}w\overset{*}{\Leftarrow}v$ for some $w\in E^{*}$ .

By $\mathbb{F}$ we denote the set of freely reduced words over $E$ ; and we identify $\mathbb{F}(E)$ with the regular set $\mathbb{F}\subseteq E^{*}$ . These are the words without any factor $e\overline{e}$ where $e\in E$ or, equivalently, the words $u$ such that $u\overset{*}{\Rightarrow}v$ implies $u=v$ . The identification as sets is possible because $\pi_{E}$ yields a bijection from $\mathbb{F}$ onto $\mathbb{F}(E)$ .

The formula $\Psi$ uses NFAs $\mathcal{B}$ where the transitions are labeled by letters from $E$ or by the empty word $1$ , see Sect. 14.3. The interpretation so far is that $L(\mathcal{B})$ denotes a rational subset in $\mathbb{F}(E)$ . Now, we switch the viewpoint: $L(\mathcal{B})$ denotes a regular subset in $E^{*}$ ; and we replace all constraints $X\in\pi_{E}L(\mathcal{B})$ (resp. $X\notin\pi_{E}L(\mathcal{B})$ ) by $X\in L(\mathcal{B})$ (resp. $X\notin L(\mathcal{B})$ ). This is nothing but a change of notation, so we call the new formula still $\Psi$ . However, from now on we consider the full solution set $\mathrm{Sol}_{E,k}(\Psi)$ as a relation over $E^{*}$ . Thus, a solution $\sigma$ is given as a morphism $\sigma\colon\mathcal{X}\to E^{*}$ . Recall that the “actual” solution over the group $\mathbb{F}(E)\rtimes H$ is therefore given by $\pi_{E}\sigma\colon\mathcal{X}\to\mathbb{F}(E)$ .

Lemma 14.12 ([2]).

Let $\mathcal{B}=(Q,E,\delta;\mathscr{I},\mathscr{F})$ be an NFA which appears in $\Psi$ . Thus, $\delta\subseteq Q\times(E\cup\left\{\mathinner{1}\right\})\times Q$ and $L(\mathcal{B})\subseteq E^{*}$ . Then we can transform $\mathcal{B}$ into an NFA $\mathcal{B}^{\prime}=(Q^{\prime},E,\delta^{\prime};\mathscr{I}^{\prime},\mathscr{F}^{\prime})$ such that $\left|\mathinner{Q^{\prime}}\right|=\left|\mathinner{Q}\right|$ , $\delta^{\prime}\subseteq Q^{\prime}\times(E\cup\left\{\mathinner{1}\right\})\times Q^{\prime}$ , and

[TABLE]

Proof.

Let $\mathcal{B}=(Q,E,\delta,\mathscr{I},\mathscr{F})$ be an NFA over $E^{*}$ where $\delta\subseteq Q\times(E\cup\left\{\mathinner{1}\right\})\times Q$ . We run the following while-loop.

While there are a letter $e\in E$ and states $s,t\in Q$ such that $(s,1,t)\notin\delta$ but $e\overline{e}\in L(\mathcal{B},s,t)$ , enlarge $\delta$ by the $\varepsilon$ -transition $(s,1,t)$ .

The while-loop terminates after at most $\left|\mathinner{Q}\right|^{2}$ rounds with the desired NFA $\mathcal{B}^{\prime}$ . The number of states is the same as before.

The inclusion $L(\mathcal{B}^{\prime})\subseteq\left\{v\in E^{*}\mathrel{\left|\vphantom{v\in E^{*}}\vphantom{u\overset{*}{\underset{}{\Rightarrow}}v\wedge u\in L(\mathcal{B})}\right.}u\overset{*}{\underset{}{\Rightarrow}}v\wedge u\in L(\mathcal{B})\right\}$ is trivial. The converse follows by induction on the length of $u$ . Moreover, for each $u\in E^{*}$ there is a (unique) $\widehat{u}\in\mathbb{F}$ such that $u\overset{*}{\underset{}{\Rightarrow}}\widehat{u}\in\mathbb{F}$ . This shows (70) and hence the lemma. ∎
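The while-loop in the proof can be sketched as follows. This is only an illustration under stated assumptions: the NFA is a set of triples `(s, label, t)` with the empty string `""` standing for the label $1$, the involution swaps letter case, and all function names (`eps_closure`, `accepts_pair`, `benois_saturate`, `accepts`) are hypothetical.

```python
from itertools import product

def bar(e):
    return e.swapcase()           # assumed involution on letters

def eps_closure(states, delta):
    """reach[s] = states reachable from s using only epsilon-transitions."""
    reach = {s: {s} for s in states}
    changed = True
    while changed:
        changed = False
        for (p, a, q) in delta:
            if a == "":
                for s in states:
                    if p in reach[s] and q not in reach[s]:
                        reach[s].add(q)
                        changed = True
    return reach

def accepts_pair(s, t, e, delta, reach):
    """True iff the word e bar(e) labels some path from s to t."""
    for (p1, a1, q1) in delta:
        if a1 != e or p1 not in reach[s]:
            continue
        for (p2, a2, q2) in delta:
            if a2 == bar(e) and p2 in reach[q1] and t in reach[q2]:
                return True
    return False

def benois_saturate(states, alphabet, delta):
    """Add epsilon-transitions (s, "", t) while e bar(e) is in L(B, s, t)."""
    delta = set(delta)
    while True:
        reach = eps_closure(states, delta)
        new = next(((s, "", t) for s, t in product(states, repeat=2)
                    if (s, "", t) not in delta
                    and any(accepts_pair(s, t, e, delta, reach) for e in alphabet)),
                   None)
        if new is None:
            return delta          # at most |Q|^2 transitions were added
        delta.add(new)

def accepts(word, states, delta, initial, final):
    reach = eps_closure(states, delta)
    current = set().union(*(reach[s] for s in initial))
    for a in word:
        step = {q for (p, x, q) in delta if x == a and p in current}
        current = set().union(*(reach[q] for q in step))
    return bool(current & set(final))

# Example: an NFA accepting exactly the word a b B A c.
STATES = [0, 1, 2, 3, 4, 5]
DELTA = {(0, "a", 1), (1, "b", 2), (2, "B", 3), (3, "A", 4), (4, "c", 5)}
SATURATED = benois_saturate(STATES, "abAB", DELTA)
```

In the example the loop first adds the transition $(1,1,3)$ for the factor `bB`, and only then can it detect `aA` between the states $0$ and $4$; afterwards the reduced word `c` is accepted while the state set is unchanged.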

Let us define a new formula $\Psi_{\text{Ben}}$ in two steps:

(1)

Every constraint $X\in\pi_{E}(L(\mathcal{B}))$ (resp. $X\notin\pi_{E}(L(\mathcal{B}))$ ) is replaced by $X\in L(\mathcal{B}^{\prime})$ (resp. $X\notin L(\mathcal{B}^{\prime})$ ) where $\mathcal{B}^{\prime}$ is the NFA constructed in Lem. 14.12. Let $\Psi$ be the new formula.

(2)

Define $\Psi_{\text{Ben}}$ by

[TABLE]

Lemma 14.13.

*Let $\Phi$ satisfy the properties in Lem. 14.5. Then the embedding $\varphi\colon G\to\mathbb{F}(E)\rtimes H$ induces a bijection between $\pi_{A}({\mathrm{Sol}}_{A,k}(\Phi))\subseteq\mathbb{F}(A)^{k}$ and $\pi_{E}({\mathrm{Sol}}_{E,k}(\Psi_{\text{Ben}}))\subseteq\mathbb{F}(E)^{k}$ . Moreover, $(\left\|\mathinner{\Psi_{\text{Ben}}}\right\|_{\text{eq}},\left\|\mathinner{\Psi_{\text{Ben}}}\right\|_{\text{rat}})\in\mathcal{O}(\left\|\mathinner{\Phi}\right\|_{\text{eq}},\left\|\mathinner{\Phi}\right\|_{\text{rat}})$ .*

Proof.

The proof is immediate by (69), (70), and the construction of $\Psi_{\text{Ben}}$ which makes sure that all variables satisfy the constraint $X\in\mathbb{F}$ . Thus a constraint $X\in L(\mathcal{B}^{\prime})$ is equivalent to a constraint $X\in L(\mathcal{B}^{\prime})\cap\mathbb{F}$ and a constraint $X\notin L(\mathcal{B}^{\prime})$ is equivalent to a constraint $X\in\mathbb{F}\setminus L(\mathcal{B}^{\prime})$ . ∎

14.5. Proof of Thm. 14.2, Phase 5.

Switching from NFAs to finite monoids: From $\Psi_{\text{Ben}}$ to $\Psi_{\text{mon}}$

The goal is to reduce the proof of Thm. 14.2 to Thm. 4.3. This requires that we represent regular constraints by recognizing morphisms. In the following, a guess means to run deterministically over all possibilities. That is, there is a deterministic transducer which respects the space bound in Thm. 14.2 and produces all possible outputs one after another. The corresponding EDT0L relations are calculated separately and then everything is put together as we did when we split $\widetilde{\Phi}$ into formulae $\Phi_{\eta,j}$ in (63).

Let $\mathcal{B}_{1},\ldots,\mathcal{B}_{\ell}$ be the list of NFAs which appear in $\Psi_{\text{Ben}}$ . We have $\ell\geqslant 1$ and without restriction $L(\mathcal{B}_{\ell})=\mathbb{F}$ . According to Ex. 2.1 there is a morphism $\mu_{\ell}\colon E^{*}\to N_{\ell}$ to an $H$ -monoid $N_{\ell}$ of size $2+\left|\mathinner{E}\right|^{2}-\left|\mathinner{E}\right|$ such that $u\in\mathbb{F}\iff\mu_{\ell}(u)\in F_{\ell}$ where $F_{\ell}=\mu_{\ell}(\mathbb{F})$ . Since $\left|\mathinner{E}\right|\in\mathcal{O}(1)$ , the monoid $N_{\ell}$ is of constant size. For the other constraints we cannot expect such a small recognizing monoid, and we use Boolean matrices instead. For $1\leqslant i<\ell$ let $q_{i}$ be the number of states of the NFA $\mathcal{B}_{i}$ . According to Sect. 2.6 and Ex. 3.1 in Sect. 3 we find for each $1\leqslant i<\ell$ a morphism $\mu_{i}\colon E^{*}\to N_{i}$ to a monoid with involution $N_{i}$ of size $4^{q_{i}^{2}}$ such that $u\in L(\mathcal{B}_{i})\iff\mu_{i}(u)\in F_{i}$ where $F_{i}=\mu_{i}(\mathbb{F}\setminus L(\mathcal{B}_{i}))$ for a negative constraint and $F_{i}=\mu_{i}(L(\mathcal{B}_{i}))$ for a positive constraint. Recall that the monoid $N_{i}$ is a submonoid of $\mathbb{B}^{2q_{i}\times 2q_{i}}$ . Let $N_{0}$ be the direct product $N_{0}=N_{1}\times\cdots\times N_{\ell}$ . Let $\pi_{i}\colon N_{0}\to N_{i}$ be the canonical projection. Then we obtain a single morphism $\mu\colon E^{*}\to N_{0}$ such that $\mu_{i}=\pi_{i}\mu$ for all $1\leqslant i\leqslant\ell$ .
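How an NFA gives rise to a recognizing morphism into Boolean matrices can be sketched in a few lines. The sketch makes simplifying assumptions: the NFA has no $\varepsilon$-transitions, states are numbered $0,\ldots,q-1$, and only the plain $q\times q$ transition matrices are built, not the doubled matrices that carry the involution. All names are hypothetical.

```python
def identity(n):
    return tuple(tuple(i == j for j in range(n)) for i in range(n))

def bool_mul(A, B):
    """Product of Boolean matrices: (AB)[i][j] = OR_k (A[i][k] AND B[k][j])."""
    n = len(A)
    return tuple(tuple(any(A[i][k] and B[k][j] for k in range(n))
                       for j in range(n)) for i in range(n))

def letter_matrix(n, delta, a):
    """mu(a)[s][t] is True iff there is a transition s --a--> t."""
    return tuple(tuple((s, a, t) in delta for t in range(n)) for s in range(n))

def mu(word, n, delta):
    """Image of a word under the morphism E* -> Boolean n x n matrices."""
    M = identity(n)
    for a in word:
        M = bool_mul(M, letter_matrix(n, delta, a))
    return M

def accepted(M, initial, final):
    """Membership depends only on the matrix mu(w), not on w itself."""
    return any(M[s][t] for s in initial for t in final)

# Example NFA over {a, A}: accepts the words ending with the letter a.
Q = 2
DELTA = {(0, "a", 0), (0, "A", 0), (0, "a", 1)}
INITIAL, FINAL = {0}, {1}
```

Since acceptance is read off from $\mu(w)$ alone, a guessed matrix value $\nu(X)$ can be checked against a constraint without knowing the word $\sigma(X)$.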

Now, for each $X\in\mathcal{X}$ we guess a value $\nu(X)\in N_{0}$ . Each time we make a guess $\nu(X)\in N_{0}$ we check that it is consistent with the constraints. Thus, for each $1\leqslant i\leqslant\ell$ and each $X\in\mathcal{X}$ we do the following. If there is a positive constraint $X\in L(\mathcal{B}_{i})$ , then we check $\pi_{i}\nu(X)\in\mu_{i}(L(\mathcal{B}_{i}))$ . If there is a negative constraint $X\notin L(\mathcal{B}_{i})$ , then we check $\pi_{i}\nu(X)\notin\mu_{i}(L(\mathcal{B}_{i}))$ . If the guess is not consistent, then the guess is not successful and the corresponding output is empty.

For a consistent guess $\nu$ we define the following formula

[TABLE]

Here, $\left\{U_{j}=V_{j}\mathrel{\left|\vphantom{U_{j}=V_{j}}\vphantom{j\in J}\right.}j\in J\right\}$ is the set of equations which appear in the conjunction $\Psi_{\text{Ben}}$ . By a slight abuse of language we call a conjunction as in (72) still a Boolean formula. It is clear what we mean by a solution of $\Psi_{\text{mon}}$ : it is given by a morphism $\sigma\colon\mathcal{X}\to E^{*}$ such that

(1)

$\pi_{E}\sigma(U_{j})=\pi_{E}\sigma(V_{j})\in\mathbb{F}(E)$ for all $j\in J$ .

(2)

$\mu\sigma(X)=\nu(X)$ for all $X\in\mathcal{X}$ .

For an inconsistent guess we let $\Psi_{\text{mon},\nu}=\bot$ . Using this interpretation we have

[TABLE]

The size of the finite monoid $N_{0}$ is in $2^{\mathcal{O}(\left\|\mathinner{\Phi}\right\|_{\text{rat}}^{2})}$ . Thus, in general we cannot store the disjunction over all guesses in $\mathsf{PSPACE}$ . So, we produce the required NFAs for each $\Psi_{\text{mon},\nu}$ again one after another. We are approaching our goal to prove Thm. 14.2. For that we use the following proposition.

Proposition 14.14.

Let $\Phi$ satisfy all conditions in Lem. 14.5. Then there is an $\mathsf{NSPACE}(\left\|\mathinner{\Phi}\right\|_{\text{eq}}^{2}(\left\|\mathinner{\Phi}\right\|_{\text{rat}}^{2}+\log\left\|\mathinner{\Phi}\right\|_{\text{eq}}))$ algorithm which performs the following task. It takes as input a Boolean formula

[TABLE]

which appears as $\Psi_{\text{mon},\nu}$ in (72). The output is an extended alphabet $C$ of size $\mathcal{O}({\left\|\mathinner{\Phi}\right\|}^{2})$ , letters $d_{i}\in C$ for all $1\leqslant i\leqslant k$ , and a trim NFA ${\mathcal{A}}_{\Psi_{\text{mon}}}$ accepting a rational set of $S$ -morphisms over $C^{*}$ such that the EDT0L relation

[TABLE]

is equal to the full solution set in freely reduced words

[TABLE]

Moreover, ${\mathrm{Sol}}_{E,k}(\Psi_{\text{mon}})=\emptyset$ if and only if $L({\mathcal{A}}_{\Psi_{\text{mon}}})=\emptyset$ ; and $\left|\mathinner{{\mathrm{Sol}}_{E,k}(\Psi_{\text{mon}})}\right|<\infty$ if and only if ${\mathcal{A}}_{\Psi_{\text{mon}}}$ doesn’t contain any directed cycle.

14.6. Proof of Prop. 14.14

In the proof of Prop. 14.14 we wish to apply Thm. 4.3. In order to do so, we define another parameter $m(\Phi)$ by the following equation.

[TABLE]

Recall that $\log(m)\geqslant 1$ by the definition in Sect. 2.1. Therefore we have:

[TABLE]

Hence, for the proof we can use the space bound $\mathsf{NSPACE}(\left\|\mathinner{\Phi}\right\|_{\text{eq}}^{2}m(\Phi)\log\left\|\mathinner{\Phi}\right\|_{\text{eq}})$ which is in the form we need for Thm. 4.3.

14.6.1. From $\Psi_{\text{mon}}$ to the system $\mathcal{S}_{\Phi}$

Let $\Psi_{\text{mon}}$ be written as in (74). Then we define $\mathcal{S}^{\prime}$ to be the following system of equations (without any constraint)

[TABLE]

Recall that we have defined morphisms $\mu\colon E^{*}\to N_{0}$ and $\nu:\mathcal{X}\to N_{0}$ . We join $\mu$ and $\nu$ to a morphism $\mu_{0}\colon(E\cup\mathcal{X})^{*}\to N_{0}$ by letting $\mu_{0}(e)=\mu(e)$ for $e\in E$ and $\mu_{0}(X)=\nu(X)$ for $X\in\mathcal{X}$ .

The group $H$ acts on $E$ , but neither on $\mathcal{X}$ nor on $N_{0}$ . Therefore we perform two steps. First, we embed the set of variables $\mathcal{X}$ into a larger set of twisted variables

[TABLE]

In order to have an embedding we identify $Z\in\mathcal{X}$ with $(1,Z)\in\mathcal{Y}$ .

The group $H$ acts freely without fixed points on $\mathcal{Y}$ by $g\cdot(f,X)=(fg,X)$ and $\overline{(f,X)}=(f,\overline{X})$ . In this way every morphism $\sigma\colon\mathcal{X}\to E^{*}$ extends uniquely to an $H$ -compatible morphism $\sigma\colon\mathcal{Y}^{*}\to E^{*}$ .

Second, we embed $N_{0}$ into a larger $H$ -monoid $N$ as constructed in Sect. 3.2. Moreover, using the universal property of the $H$ -monoid $N$ , we extend $\mu_{0}:(E\cup\mathcal{X})^{*}\to N_{0}$ uniquely to a morphism $\mu_{N}:(E\cup\mathcal{Y})^{*}\to N$ of $H$ -monoids by $\mu_{N}(f,Z)=f(\mu_{0}(Z))$ . Twisted variables of the form $(f,Z)$ appear in $\mathcal{S}^{\prime}$ only if $f=1$ , but formally the set of variables is now $\mathcal{Y}=H\times\mathcal{X}$ and for each variable $Y\in\mathcal{Y}$ the value $\mu_{N}(Y)\in N$ is defined. The morphism $\mu_{N}$ respects the involution and the action of $H$ , and so does every solution $\sigma\colon\mathcal{Y}^{*}\to E^{*}$ .

We define the system $\mathcal{S}_{\Phi}$ by the system $\mathcal{S}^{\prime}$ with the set of variables $\mathcal{Y}$ and where for each $Y\in\mathcal{Y}$ there is a constraint $\mu_{N}(Y)\in N$ .

14.6.2. Triangulation: From $\mathcal{S}_{\Phi}$ to $\mathcal{S}_{\text{tri}}$

The “problem” with the system $\mathcal{S}_{\Phi}$ is that equations $U=V$ are written as words $U,V\in(\Gamma\cup\mathcal{Y})^{*}$ where $\Gamma=E\cup H\subseteq\mathbb{F}(E)\rtimes H$ . In a twisted word equation the $U$ and $V$ should be words over $E\cup\mathcal{Y}$ .

Now, let $W\in(\Gamma\cup\mathcal{Y})^{*}$ be any word. We intend to move letters $g\in H$ to the right. If $W$ contains a factor $fg$ with $f,g\in H$ , then we replace $fg$ by the letter $h$ if $fg=h$ in $H$ . Whenever we see a letter $h=1\in H$ , then we remove it. If $W$ contains a factor $ga$ where $a\in E$ and $g\in H$ , then we replace it by $bg$ where $b\in E$ corresponds to the letter $ga\overline{g}\in E$ according to (65). Since $b$ is a letter, $g$ moves to the right without increasing the length of $W$ . The last rule is that we replace every factor $g(f,Y)$ with $Y\in\mathcal{X}$ and $g\in H^{\prime}$ by $(gf,Y)g$ . Again, $g$ moves to the right without increasing the length. Thanks to this rule twisted variables other than $(1,Z)$ appear in the equations. Thus, every $U=V$ with $U,V\in(\Gamma\cup\mathcal{Y})^{*}$ can be written as $U^{\prime}f=V^{\prime}g$ such that $U^{\prime}V^{\prime}\in(E\cup\mathcal{Y})^{*}$ , $f,g\in H$ , and $|U^{\prime}V^{\prime}|\leqslant|UV|$ .
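The rewriting rules that move letters of $H$ to the right can be carried out in a single left-to-right pass. The following sketch is an illustration under assumptions that are not taken from the paper: $H=\mathbb{Z}/2\mathbb{Z}$ is written additively, the nontrivial element swaps the edges `a` and `b`, and a word is a list of tagged tokens `("H", g)`, `("E", e)`, `("Y", (f, X))`.

```python
# Illustrative setting (an assumption): H = Z/2Z, E = {a, b, A, B},
# and the nontrivial element of H swaps a <-> b (hence A <-> B).
SWAP = {"a": "b", "b": "a", "A": "B", "B": "A"}

def mul(f, g):
    return (f + g) % 2            # group law of H

def act(g, e):
    return SWAP[e] if g == 1 else e

def push_right(word):
    """Rewrite W as W' h with W' over E and twisted variables, and h in H.

    The accumulated element g collects all letters of H read so far;
    passing over a letter a yields g(a), passing over a twisted
    variable (f, X) yields (gf, X).
    """
    out = []
    g = 0                         # identity of H
    for kind, val in word:
        if kind == "H":
            g = mul(g, val)       # rule fg -> h (and removal of h = 1)
        elif kind == "E":
            out.append(("E", act(g, val)))        # rule g a -> b g
        elif kind == "Y":
            f, X = val
            out.append(("Y", (mul(g, f), X)))     # rule g (f,Y) -> (gf,Y) g
        else:
            raise ValueError(f"unknown token kind: {kind}")
    return out, g

# Example: W = 1 . a . (0, X) . 1 . b   (here "1" is the swap element)
W = [("H", 1), ("E", "a"), ("Y", (0, "X")), ("H", 1), ("E", "b")]
W_PRIME, H_REST = push_right(W)
```

The output never has more tokens than the input, in line with $|U^{\prime}V^{\prime}|\leqslant|UV|$. For an equation $U=V$ one would apply the pass to both sides and compare the two residual group elements.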

Moreover, if $\sigma\colon\mathcal{Y}\to E^{*}$ is any morphism, then $\sigma(U)=\sigma(U^{\prime})f$ and $\sigma(V)=\sigma(V^{\prime})g$ . Since $\sigma(U^{\prime})\in E^{*}$ and $\sigma(V^{\prime})\in E^{*}$ , we have $\pi_{E}\sigma(U)=\pi_{E}\sigma(V)$ only if $f=g$ . Thus, whenever we find $f\neq g$ , then we can stop: there is no solution. On the other hand, for $f=g$ we have $\pi_{E}\sigma(U)=\pi_{E}\sigma(V)\iff\pi_{E}\sigma(U^{\prime})=\pi_{E}\sigma(V^{\prime})$ . Hence, we can replace the equation $U=V$ by $U^{\prime}=V^{\prime}$ and $U^{\prime}=V^{\prime}$ is a twisted word equation over $E$ in twisted variables $\mathcal{Y}=H\times\mathcal{X}$ and with regular constraints defined by a compatible morphism $\mathcal{Y}\to N$ .

Using standard techniques as described in Sect. 5.4 we can assume without restriction that all equations are in triangular form. The number of fresh (twisted) variables and the increase of the length over all equations is thereby bounded by $\mathcal{O}(\left\|\mathinner{\Phi}\right\|_{\text{eq}})$ . The set of variables is still called $\mathcal{X}$ and the set of twisted variables is still called $\mathcal{Y}=H\times\mathcal{X}$ . Thus after the modifications above and triangulation we obtain the system of twisted word equations with regular constraints $\mathcal{S}_{\text{tri}}$ . We have $\varphi(\mathrm{Sol}_{A,k}(\Phi))=\mathrm{Sol}_{E,k}(\mathcal{S}_{\text{tri}})$ and all equations of $\mathcal{S}_{\text{tri}}$ have the form

[TABLE]

where $Z\in\mathcal{X}$ is a variable, $f,g\in H$ , and $x,y\in E\cup\mathcal{Y}$ . For example, we could have $x=e\in E$ and $y=(h,Y)\in\mathcal{Y}$ . Then the equation becomes $(1,Z)=f(e)g((h,Y))=e^{\prime}(gh,Y)$ . The triangular form is convenient to achieve the property that solutions are in freely reduced words.

14.6.3. From $\mathcal{S}_{\text{tri}}$ to $\mathcal{S}_{\text{fin}}$ : solutions in freely reduced words

This subsection mimics [10] in the context of virtually free groups. $\mathcal{S}_{\text{fin}}$ will be the “final” system in the sequence of transformations. Recall that $\mathbb{F}\subseteq E^{*}$ denotes the regular subset of freely reduced words. Clearly, if $puq\in\mathbb{F}$ and $f\in H$ , then $f(u)\in\mathbb{F}$ , too. Another crucial observation is that for all freely reduced words $x,y,z\in\mathbb{F}$ and $f,g\in H$ we have $z=f(x)g(y)$ in $\mathbb{F}(E)$ if and only if there are freely reduced words $p,q,r$ such that

[TABLE]
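The observation can be made concrete for two freely reduced words $u=f(x)$ and $v=g(y)$. The sketch below assumes the standard cancellation decomposition $u=pq$, $v=\overline{q}r$, $z=pr$, where $q$ is the maximal part of $u$ that cancels against a prefix of $v$; that this is the precise form of the displayed condition is an assumption. Letters are characters, the involution swaps case, and the name `split_product` is hypothetical.

```python
def bar(e):
    return e.swapcase()           # assumed involution on letters

def inverse(w):
    """Involution extended to words: reverse and bar every letter."""
    return "".join(bar(e) for e in reversed(w))

def free_reduce(w):
    stack = []
    for e in w:
        if stack and stack[-1] == bar(e):
            stack.pop()
        else:
            stack.append(e)
    return "".join(stack)

def split_product(u, v):
    """For reduced u, v return (p, q, r) with u = p q and v = inverse(q) r
    such that p r is the freely reduced form of u v."""
    k = 0
    while k < min(len(u), len(v)) and u[len(u) - 1 - k] == bar(v[k]):
        k += 1                    # one more letter of u cancels against v
    return u[:len(u) - k], u[len(u) - k:], v[k:]

# Example with reduced words u = a b A and v = a B a.
P, Q, R = split_product("abA", "aBa")
```

Each of $p$, $q$, $r$ is a factor of a reduced word and hence reduced, and no cancellation is possible between $p$ and $r$ because $q$ was chosen maximal.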

According to (77) every equation in the triangular system $\mathcal{S}_{\text{tri}}$ has the form $(1,Z)=f(x)g(y)$ . For each such equation and each $h\in H$ we introduce six fresh twisted variables

[TABLE]

After that we replace the equation $(1,Z)=f(x)g(y)$ by the conjunction of three new equations:

[TABLE]

For simplicity, the new set of twisted variables is still called $\mathcal{Y}=H\times\mathcal{X}$ .

We obtain a system $\mathcal{S}_{\text{fin}}$ , and this finishes the construction of the new formula $\mathcal{S}_{\text{fin}}$ . Let $\sigma\colon\mathcal{X}\to E^{*}$ be any compatible morphism such that $\pi_{E}\sigma(1,Z)=\pi_{E}\sigma(f(x)g(y))$ . Then there is some $\sigma^{\prime}:\mathcal{X}\to\mathbb{F}$ such that first, $\pi_{E}\sigma=\pi_{E}\sigma^{\prime}$ , and second, $\sigma^{\prime}$ solves the three equations in (78) in freely reduced words. That is, $\sigma^{\prime}$ solves the three equations under the constraint $(h,Y)\in\mathbb{F}$ for all $(h,Y)$ . For the other direction: if $\pi_{E}\sigma$ solves the three equations in $\mathbb{F}(E)$ without any constraint on twisted variables, then there is some $\sigma^{\prime}\colon\mathcal{X}\to\mathbb{F}$ such that $\pi_{E}\sigma^{\prime}$ solves the equation $(1,Z)=f(x)g(y)$ in $\mathbb{F}(E)$ . The remaining problem is that our formalism asks to define values $\mu_{N}$ for each new variable. (That is $2/3$ of all variables.) The only way to do so in the given space bound is to guess the correct value. We can write the equations appearing in $\mathcal{S}_{\text{fin}}$ as a system

[TABLE]

where $x_{j}\in E\cup\mathcal{X}$ and $\mu_{N}(x_{j})$ is already fixed. We can read this system as a system of equations over the finite monoid $N_{M}$ . Checking whether such systems have a solution is actually $\mathsf{PSPACE}$ -hard; however, we don’t need $\mathsf{PSPACE}$ -hardness. It is enough that within our space bound we can output by guessing-and-checking all possibilities to assign $\mu_{N}$ values to each of the fresh twisted variables. Such an assignment is again a tuple $\nu\in N_{M}^{\mathcal{X}}$ . Formally we can write $\mathcal{S}_{\text{fin}}$ as a disjunction

[TABLE]

Some of the systems $\mathcal{S}_{\text{fin},\nu}$ can be empty. If all are empty, then we can stop: $\mathcal{S}_{\text{fin}}$ is not solvable.

Lemma 14.15.

Let $\Phi$ satisfy all conditions in Lem. 14.5. Then there is an $\mathsf{NSPACE}(\left\|\mathinner{\Phi}\right\|_{\text{eq}}^{2}(\left\|\mathinner{\Phi}\right\|_{\text{rat}}^{2}+\log\left\|\mathinner{\Phi}\right\|_{\text{eq}}))$ algorithm which performs the following task. It takes as input a non-empty system $\mathcal{S}_{\text{fin},\nu}$ from the disjunction in (80).

The output is an extended alphabet $C$ of size $\mathcal{O}(\left\|\mathinner{\Phi}\right\|_{\text{eq}}^{2})$ with $E\subseteq C$ , letters $d_{i}\in C$ for all $1\leqslant i\leqslant k$ , and a trim NFA ${\mathcal{A}}_{\mathcal{S}_{\text{fin},\nu}}$ accepting a rational set of $E$ -morphisms over $C^{*}$ such that the EDT0L relation

[TABLE]

is equal to the full solution set in freely reduced words

[TABLE]

Moreover, ${\mathrm{Sol}}_{E,k}(\mathcal{S}_{\text{fin},\nu})=\emptyset$ if and only if $L({\mathcal{A}}_{\mathcal{S}_{\text{fin},\nu}})=\emptyset$ ; and $\left|\mathinner{{\mathrm{Sol}}_{E,k}(\mathcal{S}_{\text{fin},\nu})}\right|<\infty$ if and only if ${\mathcal{A}}_{\mathcal{S}_{\text{fin},\nu}}$ doesn’t contain any directed cycle.

Proof.

The existence of the NFA ${\mathcal{A}}_{\mathcal{S}_{\text{fin},\nu}}$ with the desired properties is a formal consequence of Thm. 4.3. For the complexity issues we need an estimation of $m_{\mathcal{A}_{\mathcal{S}_{\text{fin},\nu}}}(N_{M})$ . It is however clear from the construction that we have

[TABLE]

where $m_{\mathcal{A}_{\mathcal{S}_{\text{fin},\nu}}}(N_{M})$ was defined in (4) and $m(\Phi)$ was defined in (75).

The equality $\mathsf{NSPACE}(\left\|\mathinner{\Phi}\right\|_{\text{eq}}^{2}(\left\|\mathinner{\Phi}\right\|_{\text{rat}}^{2}+\log\left\|\mathinner{\Phi}\right\|_{\text{eq}}))=\mathsf{NSPACE}(\left\|\mathinner{\Phi}\right\|_{\text{eq}}^{2}m(\Phi)\log\left\|\mathinner{\Phi}\right\|_{\text{eq}})$ is due to (76). Thus the complexity follows again by Thm. 4.3. ∎

We made various modifications to the input formula $\Phi$ to arrive at a system $\mathcal{S}_{\text{fin},\nu}$ mentioned in Lem. 14.15. Each step on the way from $\Phi$ to $\mathcal{S}_{\text{fin},\nu}$ involved a splitting or guessing, which are realized by transducers respecting the space bound. In order to define the NFA ${\mathcal{A}}_{\Psi_{\text{mon}}}$ which is needed for Prop. 14.14, we put all the pieces together. Thus, Prop. 14.14 is shown.

Corollary 14.16.

Let $G$ be a finitely generated virtually free group given by a short exact sequence as in (58) and let $\varphi\colon G\to\mathbb{F}(E)\rtimes H$ be the embedding of $G$ into a semi-direct product as in Fig. 9.

Then there is an $\mathsf{NSPACE}(\left\|\mathinner{\Phi}\right\|_{\text{eq}}^{2}(\left\|\mathinner{\Phi}\right\|_{\text{rat}}^{2}+\log\left\|\mathinner{\Phi}\right\|_{\text{eq}}))$ algorithm which performs the following task. It takes as input a Boolean formula $\Phi$ . The output is an extended alphabet $C$ of size $\mathcal{O}(\left\|\mathinner{\Phi}\right\|_{\text{eq}}^{2})$ with $E\subseteq C$ , letters $d_{i}\in C$ for all $1\leqslant i\leqslant k$ , and a trim NFA ${\mathcal{A}}_{E,\Phi}$ accepting a rational set of $E$ -morphisms over $C^{*}$ . The corresponding EDT0L relation

[TABLE]

satisfies the following properties.

(1)

We have $\mathcal{R}({\mathcal{A}}_{E,\Phi})\subseteq\mathbb{F}^{k}$ . Thus, for each $h\in L({\mathcal{A}}_{E,\Phi})$ and $1\leqslant i\leqslant k$ the word $h(d_{i})$ is freely reduced.

(2)

We have $\varphi(\mathrm{Sol}_{A,k}(\Phi))=\mathcal{R}({\mathcal{A}}_{E,\Phi})$ .

Proof.

As above we can use the same techniques of splitting and guessing based on (69), (71), and (73). Hence it is possible to construct the NFA ${\mathcal{A}}_{E,\Phi}$ by putting together exponentially many NFAs of the form ${\mathcal{A}}_{\Psi_{\text{mon}}}$ provided by Prop. 14.14. Again we may use a transducer which satisfies the required space bound since all pieces can be constructed one after another. ∎

14.7. Proof of Thm. 14.2. From the NFA ${\mathcal{A}}_{E,\Phi}$ back to $\Phi$ .

We have $A\subseteq E$ and in Sect. 14.2 we defined an $A$ -morphism $\psi\colon E^{*}\to A^{*}$ by $\psi(t)=1$ for all $t\in E\setminus A$ . Since $\varphi(a)\in T^{*}aT^{*}$ we see $\psi\varphi(a)=a$ for all $a\in A$ . See the commutative diagram in Fig. 11. Therefore, the second statement in Cor. 14.16 yields

[TABLE]

The first statement says that $\mathcal{R}({\mathcal{A}}_{E,\Phi})$ is an EDT0L relation in freely reduced words over $E$ ; and Cor. 14.9 asserts that $\psi$ maps freely reduced words to freely reduced words over $A$ . Using one state more than the NFA ${\mathcal{A}}_{E,\Phi}$ (actually a new initial state) and a transition labeled by $\psi$ from the old initial state to the new one, we obtain the desired NFA $\mathcal{A}_{\Phi}$ . Hence, we can realize $\mathrm{Sol}_{A,k}(\Phi)$ as an effective EDT0L relation in freely reduced words over $A$ . Thus, the projection $\pi_{A}\colon A^{*}\to\mathbb{F}(A)$ yields a bijection between $\mathrm{Sol}_{A,k}(\Phi)$ and the full solution set $\pi_{A}(\mathrm{Sol}_{A,k}(\Phi))\subseteq\mathbb{F}(A)^{k}$ . This concludes the proof of Thm. 14.2.

15. $\mathrm{SL}(2,\mathbb{Z})$

In this section we apply our results to the perhaps most prominent example of a (non-free) virtually free group: the special linear group $\mathrm{SL}(2,\mathbb{Z})$ of $2\times 2$ matrices over $\mathbb{Z}$ . It is well known that $\mathrm{SL}(2,\mathbb{Z})$ is isomorphic to the amalgamated product $\mathbb{Z}/4\mathbb{Z}\star_{\mathbb{Z}/2\mathbb{Z}}\mathbb{Z}/6\mathbb{Z}$ . Possible generators to establish the isomorphism between $\mathrm{SL}(2,\mathbb{Z})$ and $\mathbb{Z}/6\mathbb{Z}\star_{\mathbb{Z}/2\mathbb{Z}}\mathbb{Z}/4\mathbb{Z}$ are the matrices $\rho=\left(\begin{smallmatrix}0&-1\\ 1&1\end{smallmatrix}\right)$ and $\tau=\left(\begin{smallmatrix}0&1\\ -1&0\end{smallmatrix}\right)$ of orders $6$ and $4$ respectively. We have $\rho^{3}=\tau^{2}=\left(\begin{smallmatrix}-1&0\\ 0&-1\end{smallmatrix}\right)$ . We also denote the matrices $\left(\begin{smallmatrix}-1&0\\ 0&-1\end{smallmatrix}\right)$ and $\left(\begin{smallmatrix}1&0\\ 0&1\end{smallmatrix}\right)$ as $-1$ and $1$ respectively. (Typical proofs for $\mathrm{SL}(2,\mathbb{Z})\cong\mathbb{Z}/4\mathbb{Z}\star_{\mathbb{Z}/2\mathbb{Z}}\mathbb{Z}/6\mathbb{Z}$ use a “ping-pong argument” for a faithful action of the projective linear group $\mathrm{PSL}(2,\mathbb{Z})=\mathrm{SL}(2,\mathbb{Z})/\left\{\mathinner{\pm 1}\right\}$ on $\mathbb{R}\setminus\mathbb{Q}$ .) When working with algebraic problems over $\mathrm{SL}(2,\mathbb{Z})$ , like solving equations, it is more natural that the constants are just matrices (with entries written as binary numbers) rather than words over a finite generating set. Moreover, there is no reason to see a sum or a product of two matrices $\left(\begin{smallmatrix}a&b\\ c&d\end{smallmatrix}\right)$ and $\left(\begin{smallmatrix}a^{\prime}&b^{\prime}\\ c^{\prime}&d^{\prime}\end{smallmatrix}\right)$ because we would simply add or multiply the matrices together. For a matrix $M=\left(\begin{smallmatrix}a&b\\ c&d\end{smallmatrix}\right)$ in $\mathrm{SL}(2,\mathbb{Z})$ we let $\left\|\mathinner{M}\right\|_{1}=\max\left\{\mathinner{|a|+|c|,\,|b|+|d|}\right\}$ ; and we define its binary size

[TABLE]

Note that $\left\|\mathinner{M}\right\|_{1}$ is the usual matrix one-norm of $M=\left(\begin{smallmatrix}a&b\\ c&d\end{smallmatrix}\right)$, that is, the maximum absolute column sum. We use the notion of binary size to define the size of equations and Boolean formulae where constants are matrices. The only difference is that the size of a constant $M$ in $\mathrm{SL}(2,\mathbb{Z})$ is not $1$ as for a finite generating set, but ${\left\|\mathinner{M}\right\|_{\text{bin}}}$. We leave it to the reader to define the size of a Boolean formula accordingly. To have a notation for Boolean formulae $\Phi$ as well, we denote the new size by $\left\|\mathinner{\Phi}\right\|_{\text{bin}}$.
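As a quick sanity check of the facts stated so far, the following minimal sketch verifies the orders of $\rho$ and $\tau$, the identity $\rho^{3}=\tau^{2}=-1$, and evaluates $\left\|\mathinner{M}\right\|_{1}$ on the two generators. The helper names (`mat_mul`, `order`, `one_norm`) are ours and purely illustrative; the binary size is not computed because its defining display is not reproduced here.

```python
# 2x2 integer matrices are represented as nested tuples ((a, b), (c, d)).
ONE = ((1, 0), (0, 1))
MINUS_ONE = ((-1, 0), (0, -1))
RHO = ((0, -1), (1, 1))
TAU = ((0, 1), (-1, 0))


def mat_mul(x, y):
    """Product of two 2x2 integer matrices."""
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def mat_pow(x, n):
    """n-th power of x for n >= 0 (naive repeated multiplication)."""
    result = ONE
    for _ in range(n):
        result = mat_mul(result, x)
    return result


def order(x, bound=24):
    """Smallest n in 1..bound with x^n = 1, or None if there is none."""
    power = ONE
    for n in range(1, bound + 1):
        power = mat_mul(power, x)
        if power == ONE:
            return n
    return None


def one_norm(m):
    """max{|a|+|c|, |b|+|d|}: the maximum absolute column sum."""
    (a, b), (c, d) = m
    return max(abs(a) + abs(c), abs(b) + abs(d))
```

For instance, `order(RHO)` and `order(TAU)` return the orders of the two generators, and `mat_pow(RHO, 3)` can be compared with `mat_pow(TAU, 2)`.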

The aim of this section is to prove the following result.

Corollary 15.1.

There exists a generating set $S$ for $\mathrm{SL}(2,\mathbb{Z})$ of $21$ letters, and an $\mathsf{NSPACE}(m(\Phi)\left\|\mathinner{\Phi}\right\|_{\text{bin}}^{2}\log\left\|\mathinner{\Phi}\right\|_{\text{bin}})$ algorithm which performs the following task. It takes as input a Boolean formula $\Phi$ where the constants are matrices in $\mathrm{SL}(2,\mathbb{Z})$ (counted in their binary size) and in variables from $\mathcal{X}=\mathcal{X}_{+}\cup\mathcal{X}_{-}$ such that $X\in\mathcal{X}_{+}\iff\overline{X}\in\mathcal{X}_{-}$ and $\mathcal{X}_{+}=\left\{\mathinner{X_{1},\ldots,X_{k}}\right\}$, where each variable has size $1$ for simplicity. The output is an extended alphabet $C$ of size $\mathcal{O}(\left\|\mathinner{\Phi}\right\|_{\text{bin}}^{2})$, letters $d_{i}\in C$ for all $1\leqslant i\leqslant k$, and a trimmed NFA ${\mathcal{A}}_{\Phi}$ accepting a rational set of $A$-morphisms over $C^{*}$ such that the EDT0L relation

[TABLE]

is equal to the full solution set in standard normal form as given in (57)

[TABLE]

Moreover, $\mathrm{Sol}(\Phi)=\emptyset$ if and only if $L(\mathcal{A}_{\Phi})=\emptyset$; and $\left|\mathinner{\mathrm{Sol}(\Phi)}\right|<\infty$ if and only if $\mathcal{A}_{\Phi}$ doesn’t contain any directed cycle.
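The finiteness criterion amounts to a cycle test on the transition graph of the trimmed NFA. The sketch below illustrates only that graph-theoretic step; it assumes, hypothetically, that the transition structure is available as a dictionary mapping each state to its successor states, and it ignores the labels, which play no role for the test. It is not the space-bounded procedure of the paper.

```python
def has_directed_cycle(successors):
    """Return True iff the directed graph contains a cycle.

    `successors` maps a state to an iterable of successor states
    (a hypothetical encoding of the NFA's transition relation).
    Uses an iterative depth-first search with three colours.
    """
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    colour = {}
    for root in successors:
        if colour.get(root, UNSEEN) != UNSEEN:
            continue
        colour[root] = ACTIVE
        stack = [(root, iter(successors.get(root, ())))]
        while stack:
            node, pending = stack[-1]
            for nxt in pending:
                state = colour.get(nxt, UNSEEN)
                if state == ACTIVE:
                    return True  # back edge: a directed cycle exists
                if state == UNSEEN:
                    colour[nxt] = ACTIVE
                    stack.append((nxt, iter(successors.get(nxt, ()))))
                    break
            else:
                # All successors of `node` are explored.
                colour[node] = DONE
                stack.pop()
    return False
```

Because the automaton is trimmed, every state lies on some accepting path, which is why a cycle anywhere in the graph is relevant.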

The proof of Cor. 15.1 covers the rest of the section. In a first part, we make the reduction to the framework of Thm. 14.2 fully explicit. The main message is that a few elementary facts are enough to apply Thm. 14.2 to $\mathrm{SL}(2,\mathbb{Z})$ without any reference to Bass-Serre theory [58] with the resulting black box Prop. 14.7. In fact, what we use about $\mathrm{SL}(2,\mathbb{Z})$ predates the invention of Bass-Serre theory. For that we reformulate Thm. 14.2 for $\mathrm{SL}(2,\mathbb{Z})$ in Cor. 15.1. The point is that we view Cor. 15.1 directly as a corollary to Thm. 4.3.

In the second part we show that working with matrices doesn’t increase the complexity. To see the difference, let $w\in\left\{\mathinner{\rho,\rho^{-1},\tau,\tau^{-1}}\right\}^{*}$ be a word in the (symmetric) set of natural generators with $n=|w|_{\rho^{\pm 1}}=|w|_{\rho}+|w|_{\rho^{-1}}$, and let ${\left(\begin{smallmatrix}a&b\\ c&d\end{smallmatrix}\right)}$ denote its image in $\mathrm{SL}(2,\mathbb{Z})$. Then a straightforward calculation shows $\left\|\mathinner{\left(\begin{smallmatrix}a&b\\ c&d\end{smallmatrix}\right)}\right\|_{1}\leqslant F_{n+2}$, where $F_{n+2}$ is the $(n+2)$-nd Fibonacci number. In particular,

[TABLE]

This means that working with matrices and their binary size doesn’t increase the input size with respect to the reduced word lengths over $\left\{\mathinner{\rho,\rho^{-1},\tau,\tau^{-1}}\right\}^{*}$. However, an exponential gap between $\left\|\mathinner{\left(\begin{smallmatrix}a&b\\ c&d\end{smallmatrix}\right)}\right\|_{\text{bin}}$ and $|w|_{\rho^{\pm 1}}$ is possible. For example, we have $(\tau\rho)^{n}=\left(\begin{smallmatrix}1&n\\ 0&1\end{smallmatrix}\right)$ and $\left\|\mathinner{\left(\begin{smallmatrix}1&n\\ 0&1\end{smallmatrix}\right)}\right\|_{\text{bin}}=\log(n+1)$. It is easy to see that $(\tau\rho)^{n}$ is the shortest word in $\left\{\mathinner{\rho,\rho^{-1},\tau,\tau^{-1}}\right\}^{*}$ which represents the matrix $\left(\begin{smallmatrix}1&n\\ 0&1\end{smallmatrix}\right)$. Thus, the matrix representation of (shortest) words can lead to an exponential compression. However, in [21] Gurevich and Schupp give an exponential representation of a matrix $M$ in $\mathrm{SL}(2,\mathbb{Z})$ by words over $\left\{\mathinner{\rho,\tau}\right\}^{*}$ where the bit complexity of the exponential representation is linear in $\left\|\mathinner{M}\right\|_{\text{bin}}$. (In an exponential representation exponents over factors are written in binary.) In order to prove the lemma of Gurevich and Schupp we use the matrices $L=\left(\begin{smallmatrix}1&0\\ 1&1\end{smallmatrix}\right)=\tau\rho^{2}$ and $U=\left(\begin{smallmatrix}1&1\\ 0&1\end{smallmatrix}\right)=\tau\rho$ in Sect. 15.2.
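The Fibonacci bound and the identities $U=\tau\rho$, $L=\tau\rho^{2}$, $(\tau\rho)^{n}=\left(\begin{smallmatrix}1&n\\ 0&1\end{smallmatrix}\right)$ can be checked by brute force on short words. The sketch below is only an experiment, not a proof. It assumes the indexing $F_{1}=F_{2}=1$, and it encodes $\rho,\rho^{-1},\tau,\tau^{-1}$ by the characters `r`, `R`, `t`, `T`, which is our own convention.

```python
from itertools import product

RHO = ((0, -1), (1, 1))
TAU = ((0, 1), (-1, 0))


def mat_mul(x, y):
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def inverse(m):
    """Inverse of a determinant-one 2x2 matrix."""
    (a, b), (c, d) = m
    return ((d, -b), (-c, a))


# r, R, t, T stand for rho, rho^-1, tau, tau^-1 (illustrative encoding).
GENERATORS = {"r": RHO, "R": inverse(RHO), "t": TAU, "T": inverse(TAU)}


def image(word):
    """Matrix represented by a word over the four generators."""
    result = ((1, 0), (0, 1))
    for letter in word:
        result = mat_mul(result, GENERATORS[letter])
    return result


def one_norm(m):
    (a, b), (c, d) = m
    return max(abs(a) + abs(c), abs(b) + abs(d))


def fib(n):
    """n-th Fibonacci number with F_1 = F_2 = 1 (assumed indexing), n >= 1."""
    current, following = 1, 1
    for _ in range(n - 1):
        current, following = following, current + following
    return current


def fibonacci_bound_holds(max_len):
    """Check ||image(w)||_1 <= F_{n+2} for all words w of length <= max_len,
    where n counts the occurrences of rho and rho^-1 in w."""
    for length in range(max_len + 1):
        for word in product("rRtT", repeat=length):
            n = sum(letter in "rR" for letter in word)
            if one_norm(image(word)) > fib(n + 2):
                return False
    return True
```

Calling `image("tr" * n)` for small `n` reproduces the matrices $\left(\begin{smallmatrix}1&n\\ 0&1\end{smallmatrix}\right)$, whose entries grow only linearly in the word length although the word itself is shortest.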

15.1. Explicit embedding of $\mathrm{SL}(2,\mathbb{Z})$ into a semi-direct product

Throughout, we use $\mathrm{SL}(2,\mathbb{Z})\cong G$ and we work with $G=\mathbb{Z}/6\mathbb{Z}\star_{\mathbb{Z}/2\mathbb{Z}}\mathbb{Z}/4\mathbb{Z}$ and its quotient $\mathrm{PSL}(2,\mathbb{Z})\cong G^{\prime}=\mathbb{Z}/3\mathbb{Z}\star\mathbb{Z}/2\mathbb{Z}$. The group $G^{\prime}$ is the modular group, which is frequently denoted as $\Gamma$ in the literature. [Footnote 6: Note that $G^{\prime}$ is not the derived subgroup $[G,G]$.] There are natural actions of $G$ and $G^{\prime}$ (and hence of $\mathrm{SL}(2,\mathbb{Z})$ and $\mathrm{PSL}(2,\mathbb{Z})$) on the complete bipartite graph $K_{3,2}$. The actions are defined below and the graph $K_{3,2}$ is depicted in Fig. 12. We give an orientation to the set of undirected edges in $K_{3,2}$ according to that picture. [Footnote 7: Actually, $K_{3,2}$ is the quotient graph of the Bass-Serre tree for $\mathrm{SL}(2,\mathbb{Z})$ modulo the action by that group. Fig. 12 as well as some subsequent calculations appear in [13], too.] Denoting the set of directed edges by $\left\{\mathinner{a,\ldots,f}\right\}$, we obtain an alphabet $E=\left\{\mathinner{a,\overline{a},\ldots,f,\overline{f}}\right\}$. As usual, an undirected edge is a two-element set $\left\{\mathinner{y,\overline{y}}\right\}$.

Let us write $G=\left<\mathinner{\rho}\right>\star_{\mathbb{Z}/2\mathbb{Z}}\left<\mathinner{\tau}\right>$ and $G^{\prime}=\left<\mathinner{r}\right>\star\left<\mathinner{t}\right>$ where $\rho^{6}=r^{3}=1$ and $\tau^{4}=t^{2}=1$. The action of $G$ on the bipartite graph $K_{3,2}$ is as follows. The generator $\tau$ (resp. $t$) stabilizes the vertices $P_{\alpha}$ for $\alpha\in\left\{\mathinner{1,\rho,\rho^{2}}\right\}$, and we let $\tau{R}_{1}=t{R}_{1}={R}_{\tau}$ with ${R}_{\tau^{2}}=R_{1}$. Thus, $\tau$ and $t$ are transpositions. The elements $\rho\in G$ and $r\in G^{\prime}$ are rotations. Both stabilize the vertices $R_{1}$ and $R_{\tau}$, and we let $\rho{P}_{\alpha}=r{P}_{\alpha}={P}_{\rho\alpha}$ (with ${P}_{\rho^{3}}=P_{1}$). The action of $G^{\prime}$ is faithful, and its image in $\operatorname{Aut}(K_{3,2})$ is the direct product $\mathbb{Z}/3\mathbb{Z}\times\mathbb{Z}/2\mathbb{Z}$. Thus, $H^{\prime}=\mathbb{Z}/6\mathbb{Z}$ acts on $K_{3,2}$ by identifying the action of $2\in\mathbb{Z}/6\mathbb{Z}$ with the action of $r$ and by identifying the action of $3\in\mathbb{Z}/6\mathbb{Z}$ with the action of $t$.

This leads to surjective homomorphisms $\gamma\colon G\to\mathbb{Z}/12\mathbb{Z}$ and $\gamma^{\prime}\colon G^{\prime}\to\mathbb{Z}/6\mathbb{Z}$ by $\gamma(\rho)=\gamma^{\prime}(r)=2$ and $\gamma({\tau})=\gamma^{\prime}(t)=3$. Note that $\gamma$ is a homomorphism since $\gamma(\rho^{3})=6=\gamma(\tau^{2})$. Moreover, $\gamma$ induces an isomorphism between the kernel of the canonical projection of $G$ to $G^{\prime}$ (which is the center of $G$ generated by $\tau^{2}$) and the kernel of the canonical projection of $\mathbb{Z}/12\mathbb{Z}$ to $\mathbb{Z}/6\mathbb{Z}$. Thus, the mapping $\rho\mapsto r$ and $\tau\mapsto t$ induces a canonical isomorphism between the kernels $\ker(\gamma)$ and $\ker(\gamma^{\prime})$ by Fig. 13.

The following proposition is well-known. It is stated in [41, Lem. 1] (without proof) since, according to the author Morris Newman, it is based on a more general result by Jacob Nielsen [42]. For the proof of Prop. 15.2 one might use the structure of $\mathrm{SL}(2,\mathbb{Z})$ as an amalgamated product as well as the well-known fact that the matrices $A^{\prime}=\left(\begin{smallmatrix}2&1\\ 1&1\end{smallmatrix}\right)$ and $B^{\prime}=\left(\begin{smallmatrix}1&1\\ 1&2\end{smallmatrix}\right)$ generate a free subgroup in $\mathrm{SL}(2,\mathbb{Z})$. However, in the spirit of the paper, let us give a purely combinatorial proof of Prop. 15.2 using the structure of $G^{\prime}$ as a free product of $\mathbb{Z}/3\mathbb{Z}$ and $\mathbb{Z}/2\mathbb{Z}$.

Proposition 15.2.

The kernel of $\gamma^{\prime}$ is the commutator subgroup $[G^{\prime},G^{\prime}]$ of $G^{\prime}=\mathrm{PSL}(2,\mathbb{Z})$. It is a free group of rank two, generated by the commutators $[t,r]$ and $[t,r^{2}]$.

Proof.

Clearly, $[G^{\prime},G^{\prime}]\leqslant\ker(\gamma^{\prime})$ . We have $[t,r]=trt^{-1}r^{-1}=trtr^{2}$ and $[t,r^{2}]=tr^{2}tr$ . Let us show first that $\left<\mathinner{[t,r],\,[t,r^{2}]}\right>=\left<\mathinner{trtr^{2},\,tr^{2}tr}\right>$ is a normal subgroup in $G^{\prime}$ . Indeed:

(1) $t[t,r]t^{-1}=t(trtr^{2})t^{-1}=rtr^{2}t=(trtr^{2})^{-1}=[r,t]$.

(2) $r^{-1}[t,r]r=r^{2}(trtr^{2})r=r^{2}trt=[r^{2},t]=[t,r^{2}]^{-1}$.

(3) $r[t,r]r^{-1}=r(trtr^{2})r^{2}=rtrtr=rtr(rttr^{2})tr=(rtr^{2}t)(tr^{2}tr)=[r,t][t,r^{2}]$.

This implies that $\left<\mathinner{[t,r],\,[t,r^{2}]}\right>$ is the normal subgroup generated by the commutator $[t,r]$ . By definition, $G^{\prime}/[G^{\prime},G^{\prime}]=G^{\prime}/\left\{\mathinner{[t,r]=1}\right\}$ . Hence, $\left<\mathinner{[t,r],\,[t,r^{2}]}\right>=[G^{\prime},G^{\prime}]$ is generated by two elements.
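The three conjugation identities used above can be re-checked mechanically. The following sketch works with words over the letters `r`, `s`, `t` (with `s` standing for $r^{-1}$) and reduces them with the rewriting rules $rs,sr,t^{2}\to 1$, $r^{2}\to s$, $s^{2}\to r$ for $G^{\prime}=\mathbb{Z}/3\mathbb{Z}\star\mathbb{Z}/2\mathbb{Z}$. All function names are ours and only illustrative.

```python
# Length-reducing rules for Z/3 * Z/2 with s = r^-1.
RULES = {"rs": "", "sr": "", "tt": "", "rr": "s", "ss": "r"}
LETTER_INVERSE = {"r": "s", "s": "r", "t": "t"}


def normal_form(word):
    """Reduce a word over {r, s, t} using a stack; the stack stays irreducible."""
    stack = []
    for letter in word:
        stack.append(letter)
        while len(stack) >= 2 and stack[-2] + stack[-1] in RULES:
            stack[-2:] = RULES[stack[-2] + stack[-1]]
    return "".join(stack)


def inverse(word):
    """Formal inverse: reverse the word and invert each letter."""
    return "".join(LETTER_INVERSE[x] for x in reversed(word))


def commutator(x, y):
    """Normal form of [x, y] = x y x^-1 y^-1."""
    return normal_form(x + y + inverse(x) + inverse(y))


def conjugate(g, x):
    """Normal form of g x g^-1."""
    return normal_form(g + x + inverse(g))


T_R = commutator("t", "r")    # [t, r]
T_R2 = commutator("t", "rr")  # [t, r^2]
```

With these definitions, `conjugate("t", T_R)`, `conjugate("s", T_R)` and `conjugate("r", T_R)` are the left-hand sides of items (1), (2) and (3).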

Second, we claim $[G^{\prime},G^{\prime}]=\ker(\gamma^{\prime})$ . We know

[TABLE]

Since $\gamma^{\prime}$ induces a surjective homomorphism $G^{\prime}/[G^{\prime},G^{\prime}]\to\mathbb{Z}/6\mathbb{Z}$, this induced homomorphism is an isomorphism. [Footnote 8: More generally, if $K=C_{1}\star C_{2}$ is a free product of two cyclic groups, then $[K,K]$ is equal to the kernel of the canonical projection of $K$ to the direct product $C_{1}\times C_{2}$.] Hence, the claim.

The commutator subgroup is henceforth denoted by $F^{\prime}$. We wish to show that $F^{\prime}$ is free. For that step let $r,s,t$ be three letters such that $s=r^{-1}$ in $G^{\prime}$ and let $\Sigma=\left\{\mathinner{A,\overline{A},B,\overline{B}}\right\}$ be a set of four letters. Using $t^{2}=1$ we obtain a homomorphism $\psi^{\prime}:\left\{\mathinner{r,s,t}\right\}^{*}\to G^{\prime}$. Let us define the letters $A$ and $B$ as the words $A=trts$ and $B=tstr$ in $\left\{\mathinner{r,s,t}\right\}^{+}$. The aim is to show that the restriction of $\psi^{\prime}$ to $\left\{\mathinner{A,B}\right\}^{*}$ defines an isomorphism $\psi:F(A,B)\to F^{\prime}$. Since $\psi^{\prime}(A)$ and $\psi^{\prime}(B)$ generate $F^{\prime}=[G^{\prime},G^{\prime}]$, we content ourselves to show that a nonempty freely reduced word $w\in\Sigma^{+}$ is not mapped to $1\in G^{\prime}$. In $G^{\prime}$ we use the reduced normal form which is obtained by a confluent and length-reducing rewriting system which replaces $rs$, $sr$, and $t^{2}$ by $1$ and which replaces $r^{2}$ by $s$ and $s^{2}$ by $r$.

We claim that we can detect the last letter of $w\in\Sigma^{+}$ by knowing the last four letters in the reduced normal form of $\psi^{\prime}(w)$ . This is clear for $\left|\mathinner{w}\right|=1$ . Hence, we may assume that $\left|\mathinner{w}\right|=k$ with $k\geqslant 2$ and that the claim is correct for words of length at most $k-1$ . For that we define a finite automaton with eight states and transitions which are labeled by the letters $A=trts$ , $\overline{A}=rtst$ , $B=tstr$ , and $\overline{B}=strt$ . Those transitions where the label differs from the target are given by the following list:

(1) $rtst\overset{tstr}{\longrightarrow}trtr$

(2) $strt\overset{trts}{\longrightarrow}tsts$

(3) $trts\overset{strt}{\longrightarrow}rtrt$

(4) $tstr\overset{rtst}{\longrightarrow}stst$

(5) $trtr\overset{rtst}{\longrightarrow}stst$

(6) $tsts\overset{strt}{\longrightarrow}rtrt$

(7) $rtrt\overset{trts}{\longrightarrow}tsts$

(8) $stst\overset{tstr}{\longrightarrow}trtr$

We don’t define initial or final states. Since we intend to read only freely reduced words, each state has out-degree $3$. Hence, there are $24$ transitions but only $8$ of them are listed above. The automaton is deterministic. Consider a non-empty freely reduced word $w=c_{1}\cdots c_{k}$ with $c_{i}\in\Sigma$ and $k\geqslant 1$. After reading $c_{1}$ we are in the corresponding state. Hence, the claim is correct for $k=1$. For example, if $c_{1}=\overline{B}$, then we are in the state $strt$ which is the left-hand side in line (2). For $k\geqslant 2$ we can assume by induction that the current state, that is, the left-hand side of the transition to be taken, represents the last four letters in the reduced normal form of $\psi^{\prime}(c_{1}\cdots c_{k-1})$. For example, assume the state is the left-hand side in line (6) which is the right-hand side in line (7). This implies $c_{k-1}=A$. Therefore $c_{k}\in\left\{\mathinner{A,B,\overline{B}}\right\}$. If $c_{k}=A$, then reading $w$ leads to the state $trts$. If $c_{k}=B$, then reading $w$ leads to the state $B=tstr$. If $c_{k}=\overline{B}=strt$, then reading $w$ leads to the state $rtrt$. Thanks to symmetries, the same type of argument applies in all situations. Since after reading the word $w$ we are in a state where the reduced normal form has length at least $4$, it is not $1$ in $G^{\prime}$. Hence, the proposition. ∎
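The automaton argument can be tested exhaustively on short words. The sketch below enumerates all freely reduced words over $\Sigma$ up to a given length, computes the reduced normal form of the image in $G^{\prime}$, and records which last letters occur for each four-letter suffix. The encoding of $\overline{A}$ and $\overline{B}$ by the lowercase characters `a` and `b` is our own convention, and the check is an experiment for small lengths, not a replacement for the induction.

```python
from itertools import product

RULES = {"rs": "", "sr": "", "tt": "", "rr": "s", "ss": "r"}

# A, a, B, b encode A, A-bar, B, B-bar (illustrative encoding).
IMAGE = {"A": "trts", "a": "rtst", "B": "tstr", "b": "strt"}
BAR = {"A": "a", "a": "A", "B": "b", "b": "B"}


def normal_form(word):
    """Reduced normal form in Z/3 * Z/2 of a word over {r, s, t}."""
    stack = []
    for letter in word:
        stack.append(letter)
        while len(stack) >= 2 and stack[-2] + stack[-1] in RULES:
            stack[-2:] = RULES[stack[-2] + stack[-1]]
    return "".join(stack)


def freely_reduced_words(max_len):
    """All nonempty freely reduced words over {A, a, B, b} up to max_len."""
    for length in range(1, max_len + 1):
        for letters in product("AaBb", repeat=length):
            if all(BAR[x] != y for x, y in zip(letters, letters[1:])):
                yield "".join(letters)


def suffix_table(max_len):
    """Map each observed normal-form suffix (last four letters) to the set
    of last letters c_k of the words w that produce it."""
    table = {}
    for w in freely_reduced_words(max_len):
        nf = normal_form("".join(IMAGE[c] for c in w))
        table.setdefault(nf[-4:], set()).add(w[-1])
    return table
```

If the claim holds, `suffix_table(n)` has exactly eight keys, each of length four, and every key is associated with a single last letter.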

Corollary 15.3.

The kernel of $\gamma$ is the commutator subgroup $[G,G]$ of $G=\mathrm{SL}(2,\mathbb{Z})$. It is a free group of rank two, generated by the commutators $[\tau,\rho]$ and $[\tau,\rho^{2}]$.

Proof.

We know that the mapping $\rho\mapsto r$ and $\tau\mapsto t$ defines a homomorphism from $G$ to $G^{\prime}$. This homomorphism maps $[\tau,\rho]$ to $[t,r]$ and $[\tau,\rho^{2}]$ to $[t,r^{2}]$, which yields an isomorphism between $[G,G]$ and $[G^{\prime},G^{\prime}]$, see Fig. 13. Hence, the statement about the commutator subgroup for $G$ follows from Prop. 15.2. ∎

The action of $H^{\prime}=\mathbb{Z}/6\mathbb{Z}$ on $K_{3,2}$ induces a faithful action on the set of directed edges $E=\left\{\mathinner{a,\overline{a},\ldots,f,\overline{f}}\right\}$ which respects the involution. For example: $\tau(c)=\overline{d}$, $\tau\rho(a)=d$, and $\rho(c)=f$ etc. Therefore, the canonical homomorphism $H\to H^{\prime}=\mathbb{Z}/6\mathbb{Z}\leqslant\operatorname{Aut}(E)$ yields a semi-direct product $\mathbb{F}(E)\rtimes H$. The action of $H$ on $E$ is not faithful: for all $y\in E$ and $m\in\mathbb{Z}$ we have $(\tau\rho)^{m}(y)=y\iff m\in 6\mathbb{Z}$. In the next step let us show that $[G,G]$ is the fundamental group of the graph $K=K_{3,2}$. Simultaneously, we will derive the desired result that $\mathrm{SL}(2,\mathbb{Z})$ embeds into the semi-direct product $\mathbb{F}(E)\rtimes H$ where $\mathbb{F}(E)=E^{*}/\left\{y\overline{y}=1\mid y\in E\right\}$ and $H=\mathbb{Z}/12\mathbb{Z}$. We choose a spanning tree of $K$ by the solid edges in $K$ according to Fig. 12, and we let $\star=P_{1}$ be a base point in $K$. Then the fundamental group $\pi_{1}(K,\star)$ (which is, by definition, a subgroup in $\mathbb{F}(E)=F(a,b,c,d,e,f)$) can be identified with the free group $F(c,f)$ of rank $2$. The identification is due to the fact that $c,f$ are the chords for the chosen (directed) spanning tree $T=\left\{\mathinner{a,b,d,e}\right\}$. Indeed, the isomorphism $\varphi_{1}\colon F(c,f)\to\pi_{1}(K,\star)$ is given by

[TABLE]

To see this, say for $c$ , just follow the shortest path in $T$ from $\star$ to the source of $c$ , traverse the chord $c$ and choose the shortest path in $T$ back to $\star$ . Consider the canonical projection $\mathrm{pr}_{c,f}:\mathbb{F}(E)\to F(c,f)$ which maps the edges of $T$ to $1$ . Then $\mathrm{pr}_{c,f}\varphi_{1}$ is the identity on $F(c,f)\leqslant\mathbb{F}(E)$ .

Define $\psi:F(c,f)\to[G,G]$ by $\psi(c)=[\tau,\rho]=\tau\rho\tau\rho^{2}$ and $\psi(f)=[\tau,\rho^{2}]=\tau\rho^{2}\tau\rho$. It is an isomorphism by Cor. 15.3.

Finally, guided by $\tau(P_{1})=P_{1}$ and $\rho(P_{1})=P_{\rho}$ we define a homomorphism $\varphi\colon G\to\mathbb{F}(E)\rtimes H$ where

[TABLE]

The homomorphism $\varphi$ is well-defined since

[TABLE]

Another direct calculation shows

[TABLE]

Thus, the identity $\text{id}_{F(c,f)}$ factorizes as follows:

[TABLE]

As a consequence, we obtain a commutative diagram Fig. 14 where we let $\mathrm{SL}(2,\mathbb{Z})=G$. Since $\psi$ is bijective, $\varphi\colon\mathrm{SL}(2,\mathbb{Z})\to\mathbb{F}(E)\rtimes H$ is injective. Hence, $\varphi$ induces an isomorphism between $[G,G]$ and the subgroup $\pi_{1}(K,\star)\leqslant\mathbb{F}(E)=\mathbb{F}(E)\times\left\{\mathinner{1}\right\}\leqslant\mathbb{F}(E)\rtimes H$.

Since $G$ is finitely generated, $\varphi(G)$ is a rational subset in $\mathbb{F}(E)\rtimes H$. Hence, we can reduce the question about solving equations in $\mathrm{SL}(2,\mathbb{Z})$ to twisted word equations over $\mathbb{F}(E)$ with rational constraints. [Footnote 9: This is the approach of [7] to solve word equations in virtually free groups, too.] However, Thm. 4.3 is more ambitious. In order to apply Thm. 4.3 we need, in particular, an explicit construction of a set of standard generators. We obtain such a set $S$ by defining $S=A_{+}\cup A_{-}\cup H_{+}\cup H_{-}$ where $A_{+}=\left\{\mathinner{c,f}\right\}=\left\{\mathinner{\tau{\rho}\tau{\rho}^{2},\,\tau{\rho}^{2}\tau{\rho}}\right\}$ and $H_{+}=\left\{\mathinner{\rho^{1},\ldots,\rho^{5},\,\tau,\rho^{1}\tau,\ldots,\rho^{5}\tau}\right\}$. We have $H_{+}\cap H_{-}=\left\{\mathinner{\rho^{1},\ldots,\rho^{5}}\right\}$ and $\rho^{3}$ becomes a self-involuting letter in $S$.

Remark 15.4.

Let $h=\rho\tau$ , $h^{\prime}=\rho$ , and $g=\rho^{2}\tau$ be letters in $H_{+}$ . Then the element $hh^{\prime}\in S^{*}$ has length $2$ . The corresponding element in standard normal form is $\overline{c}fg\in A^{*}H_{+}$ which has length $3$ . This yields a concrete example showing that the standard normal forms are not geodesic, in general.
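The identity behind Remark 15.4 can be confirmed on the level of matrices. The sketch below evaluates $c=\tau\rho\tau\rho^{2}$, $f=\tau\rho^{2}\tau\rho$, $g=\rho^{2}\tau$, $h=\rho\tau$ and $h^{\prime}=\rho$ in $\mathrm{SL}(2,\mathbb{Z})$ and compares $\overline{c}fg$ with $hh^{\prime}$. It only checks that both words represent the same group element; it does not compute standard normal forms. The helper names are ours.

```python
RHO = ((0, -1), (1, 1))
TAU = ((0, 1), (-1, 0))


def mat_mul(x, y):
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def mat_prod(*factors):
    """Left-to-right product of 2x2 matrices."""
    result = ((1, 0), (0, 1))
    for factor in factors:
        result = mat_mul(result, factor)
    return result


def inverse(m):
    """Inverse of a determinant-one 2x2 matrix."""
    (a, b), (c, d) = m
    return ((d, -b), (-c, a))


C = mat_prod(TAU, RHO, TAU, RHO, RHO)   # c = tau rho tau rho^2
F = mat_prod(TAU, RHO, RHO, TAU, RHO)   # f = tau rho^2 tau rho
G_LETTER = mat_prod(RHO, RHO, TAU)      # g = rho^2 tau
H_LETTER = mat_prod(RHO, TAU)           # h = rho tau
H_PRIME = RHO                           # h' = rho

LEFT = mat_prod(H_LETTER, H_PRIME)          # h h'
RIGHT = mat_prod(inverse(C), F, G_LETTER)   # c-bar f g
```

As a side effect one sees that the images of $c$ and $f$ are exactly the matrices $A^{\prime}$ and $B^{\prime}$ mentioned before Prop. 15.2.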

15.2. Euclidean matrix calculation

For the proof of Cor. 15.1 it remains to show that the complexity is not worse than $\mathsf{NSPACE}(\left\|\mathinner{\Phi}\right\|_{\text{bin}}^{2}m(\Phi)\log\left\|\mathinner{\Phi}\right\|_{\text{bin}})$. This is done next. We have ${L}^{-1}=\rho\tau=\left(\begin{smallmatrix}1&0\\ -1&1\end{smallmatrix}\right)$ and hence, ${U}^{-1}L=\rho$. Since $\rho$, $\tau$ generate $\mathrm{SL}(2,\mathbb{Z})$ as a monoid, we see that $L,U$ generate $\mathrm{SL}(2,\mathbb{Z})$ as a group. It is therefore clear that every matrix in $\mathrm{SL}(2,\mathbb{Z})$ can be written as a word in $\left\{\mathinner{L,{L}^{-1},U,{U}^{-1}}\right\}^{*}$, but of course the representation is not unique as for example $({U}^{-1}L)^{6}=1$.
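To make the step from $\left\{\mathinner{\rho,\tau}\right\}$ to $\left\{\mathinner{L,U}\right\}$ concrete, one can write both generators as explicit words in $L^{\pm 1},U^{\pm 1}$. The word $\tau=UL^{-1}U$ used below is our own choice (it follows from $\tau\rho=U$ and $\rho=U^{-1}L$) and is not taken from the text; the sketch merely confirms these identities numerically.

```python
L_MAT = ((1, 0), (1, 1))
U_MAT = ((1, 1), (0, 1))
RHO = ((0, -1), (1, 1))
TAU = ((0, 1), (-1, 0))
ONE = ((1, 0), (0, 1))


def mat_mul(x, y):
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def inverse(m):
    """Inverse of a determinant-one 2x2 matrix."""
    (a, b), (c, d) = m
    return ((d, -b), (-c, a))


def mat_pow(x, n):
    result = ONE
    for _ in range(n):
        result = mat_mul(result, x)
    return result


RHO_FROM_LU = mat_mul(inverse(U_MAT), L_MAT)                      # U^-1 L
TAU_FROM_LU = mat_mul(mat_mul(U_MAT, inverse(L_MAT)), U_MAT)      # U L^-1 U
```

The relation $({U}^{-1}L)^{6}=1$ then shows up as `mat_pow(RHO_FROM_LU, 6) == ONE`.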

Let $a_{0},a_{1}\in\mathbb{N}$ with $a_{0}>a_{1}>0$. Using the extended Euclidean algorithm for computing $\gcd(a_{0},a_{1})$ we define natural numbers $k_{i}$ for $0\leqslant i<g$ and $a_{i}$ for $0\leqslant i\leqslant g+1$ with

[TABLE]

such that for $i\geqslant 0$ we have

[TABLE]

The sequence finishes with some $1\leqslant g\in\mathcal{O}(\log\left|\mathinner{a_{0}}\right|)$ such that $k_{g-1}a_{g}=a_{g-1}$ and $a_{g}=\gcd(a_{0},a_{1})$. The last value is therefore indeed $a_{g+1}=0$. We say that $(k_{0},\ldots,k_{g-1})$ is the $\gcd$-sequence defined by $a_{0},a_{1}$. Note that $(k_{0},\ldots,k_{g-1})$ together with $a_{g}$ uniquely define $(a_{0},\ldots,a_{g})$. Note also that $k_{i}\geqslant 1$ for $0\leqslant i<g-1$ and $k_{g-1}=a_{g-1}/a_{g}\geqslant 2$; in particular, $k_{g-1}=a_{g-1}$ whenever $\gcd(a_{0},a_{1})=1$.
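The $\gcd$-sequence is easy to compute. The sketch below assumes that the displayed recurrence is the usual division with remainder, $a_{i}=k_{i}a_{i+1}+a_{i+2}$ with $0\leqslant a_{i+2}<a_{i+1}$, which is consistent with the properties listed above. The function names are ours.

```python
def gcd_sequence(a0, a1):
    """Return (ks, remainders) for a0 > a1 > 0.

    ks         = [k_0, ..., k_{g-1}]
    remainders = [a_0, ..., a_g, a_{g+1}] with a_{g+1} = 0
    Assumes a_i = k_i * a_{i+1} + a_{i+2} (division with remainder).
    """
    if not a0 > a1 > 0:
        raise ValueError("expected a0 > a1 > 0")
    ks, remainders = [], [a0, a1]
    while remainders[-1] != 0:
        k, r = divmod(remainders[-2], remainders[-1])
        ks.append(k)
        remainders.append(r)
    return ks, remainders


def reconstruct(ks, a_g):
    """Recover [a_0, ..., a_g] from the gcd-sequence and a_g."""
    seq = [a_g, 0]  # a_g and a_{g+1}
    for k in reversed(ks):
        seq.insert(0, k * seq[0] + seq[1])
    return seq[:-1]
```

For example, `gcd_sequence(5, 3)` yields the quotients of the three division steps $5=1\cdot 3+2$, $3=1\cdot 2+1$, $2=2\cdot 1+0$, and `reconstruct` inverts the process, illustrating that the $\gcd$-sequence together with $a_{g}$ determines $(a_{0},\ldots,a_{g})$.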

By $\mathrm{SL}(2,\mathbb{N})$ we mean the following submonoid of $\mathrm{SL}(2,\mathbb{Z})$:

[TABLE]

It is a well-known classical fact and not difficult to see that $\mathrm{SL}(2,\mathbb{N})$ is a free monoid with unique basis $\left\{\mathinner{U,L}\right\}$, see for example [12, Chap. 8.12] or [28] for an application to fast randomized pattern matching. The following quantitative lemma probably belongs to folklore. It can be easily derived from [21], but for lack of a reference for the precise statement we give a proof.
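The freeness of $\mathrm{SL}(2,\mathbb{N})$ over $\left\{\mathinner{U,L}\right\}$ can be illustrated by a simple factorization procedure that peels off the last letter by comparing columns. The sketch assumes that $\mathrm{SL}(2,\mathbb{N})$ consists of the determinant-one matrices with non-negative entries (the defining display is not reproduced above). The run-length view at the end is only meant to hint at why writing exponents in binary gives short descriptions.

```python
from itertools import groupby

L_MAT = ((1, 0), (1, 1))
U_MAT = ((1, 1), (0, 1))


def mat_mul(x, y):
    (a, b), (c, d) = x
    (e, f), (g, h) = y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def word_to_matrix(word):
    """Evaluate a word over {'U', 'L'} from left to right."""
    result = ((1, 0), (0, 1))
    for letter in word:
        result = mat_mul(result, U_MAT if letter == "U" else L_MAT)
    return result


def factor(m):
    """Factor a non-negative determinant-one matrix into a word over U, L.

    M*L adds the second column to the first, M*U adds the first column
    to the second, so the dominating column reveals the last letter.
    """
    (a, b), (c, d) = m
    if min(a, b, c, d) < 0 or a * d - b * c != 1:
        raise ValueError("matrix is not in SL(2, N)")
    letters = []
    while (a, b, c, d) != (1, 0, 0, 1):
        if a >= b and c >= d:
            a, c = a - b, c - d
            letters.append("L")
        elif b >= a and d >= c:
            b, d = b - a, d - c
            letters.append("U")
        else:
            raise ValueError("unexpected matrix")  # not reached for valid input
    return "".join(reversed(letters))


def run_lengths(word):
    """Compress a word into (letter, exponent) pairs."""
    return [(letter, len(list(group))) for letter, group in groupby(word)]
```

For instance, `factor(((1, n), (0, 1)))` is the word $U^{n}$ of length $n$, while `run_lengths` of it is a single pair whose exponent needs only about $\log n$ bits.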

Lemma 15.5.

Let $M=\left(\begin{smallmatrix}a_{0}&a_{1}\\ c_{0}&c_{1}\end{smallmatrix}\right)\in\mathrm{SL}(2,\mathbb{N})$ with $a_{0}>a_{1}>0$ and let $(k_{0},\ldots,k_{g-1})$ be the $\gcd$-sequence defined by $a_{0},a_{1}$. Then there is a (unique) $c_{g}\in\mathbb{N}$ such that the following assertions hold.

(1) $0<k_{0}\cdots k_{g-1}\cdot\max\left\{\mathinner{1,c_{g}}\right\}<a_{0}+c_{0}=\left\|\mathinner{M}\right\|_{1}$.

(2) If $g$ is even, then

[TABLE]

(3) If $g$ is odd, then $c_{g}>0$ and

[TABLE]

Proof.

For the following we don’t need the uniqueness of $c_{g}$. It follows from the fact that $\left\{\mathinner{L,U}\right\}$ forms a basis for the free monoid $\mathrm{SL}(2,\mathbb{N})$, which in turn follows easily from the present proof. We leave this part to the interested reader.

Consider a matrix $M_{1}=M=\left(\begin{smallmatrix}a_{0}&a_{1}\\ c_{0}&c_{1}\end{smallmatrix}\right)\in\mathrm{SL}(2,\mathbb{N})$ with $a_{0}>a_{1}>0$. Note that this implies $\gcd(a_{0},a_{1})=1$. Moreover, $c_{0}\geqslant c_{1}>0$ because $a_{0}c_{1}=a_{1}c_{0}+1$. (The case $c_{0}=c_{1}$ is possible only for $M=\left(\begin{smallmatrix}a_{0}&a_{0}-1\\ 1&1\end{smallmatrix}\right)$.) Let us treat the case $a_{1}=1$ as a special case first. That is: $M=\left(\begin{smallmatrix}k_{0}&1\\ c_{0}&c_{1}\end{smallmatrix}\right)$. We obtain $k_{0}=a_{0}$ and

[TABLE]

Moreover, $c_{0}=k_{0}c_{1}-1$ . Since $a_{0}\geqslant 2$ we have $1\leqslant c_{0}<k_{0}c_{1}<a_{0}+c_{0}$ .

For the rest of the proof we may assume $g\geqslant 2$ . We let $(k_{0},\ldots,k_{g-1})$ (and $(a_{0},\ldots,a_{g-1},1)$ ) be the $\gcd$ -sequences defined by $a_{0},a_{1}$ . Next, we define matrices $M_{i}$ for $1\leqslant i\leqslant g$ according to the following rules.

(1) If $1\leqslant i<g$ and $i$ is odd and $M_{i}=\left(\begin{smallmatrix}a_{i-1}&a_{i}\\ c_{i-1}&c_{i}\end{smallmatrix}\right)$ is defined, then we let

[TABLE]

(2) If $1\leqslant i<g$ and $i$ is even and $M_{i}=\left(\begin{smallmatrix}a_{i}&a_{i-1}\\ c_{i}&c_{i-1}\end{smallmatrix}\right)$ is defined, then we let

[TABLE]

It follows by induction that $M_{i}\in\mathrm{SL}(2,\mathbb{N})$ for all $1\leqslant i\leqslant g$. Having this we can deduce, again by induction, for all $1\leqslant i\leqslant g$:

[TABLE]

The situation for the $c_{i}$ is slightly different. For all $1\leqslant i\leqslant g-1$

[TABLE]

To see (87) we observe that $1\leqslant c_{g-1}=k_{g-1}c_{g}\pm 1$ . Hence, we can use (86) to conclude (87). Considering $i=g-1$ shows the first claim in the lemma, because (87) implies $k_{0}\cdots k_{g-1}c_{g}\leqslant c_{0}+k_{0}\cdots k_{g-2}$ and $k_{0}\cdots k_{g-2}<k_{0}\cdots k_{g-1}\leqslant a_{0}$ by $k_{g-1}\geqslant 2$ and (84).

The last matrix is $M_{g}$ and, depending on whether $g$ is odd or even, we have two options. If $g$ is even we let $D=L$, and we let $D=U$ otherwise. We obtain:

[TABLE]

First case. Let $g$ be even, hence $M_{g}=\left(\begin{smallmatrix}1&k_{g-1}\\ c_{g}&c_{g-1}\end{smallmatrix}\right)$ . Then we have

[TABLE]

It is possible that $c_{g}=0$ in the line above.

Second case. Let $g$ be odd, hence $M_{g}=\left(\begin{smallmatrix}k_{g-1}&1\\ c_{g-1}&c_{g}\end{smallmatrix}\right)$ . Then we have

[TABLE]

Note that for $g$ odd, we have $c_{g-1}\geqslant c_{g}>0$ and $k_{g-1}=a_{g-1}>a_{g}=1$. Using (88) and a case distinction (whether or not $g$ is even) yields the result. ∎

Proposition 15.6 (Gurevich and Schupp [21]).

Let $M=\left(\begin{smallmatrix}a&b\\ c&d\end{smallmatrix}\right)\in\mathrm{SL}(2,\mathbb{Z})$ and $m=\max\left\{\mathinner{\left|\mathinner{a}\right|,\left|\mathinner{b}\right|,\left|\mathinner{c}\right|,\left|\mathinner{d}\right|}\right\}$. Then there are words $u,v\in\left\{\mathinner{\rho,\tau}\right\}^{*}$ and positive integers $e_{0},\ldots,e_{\ell}$ with $0\leqslant{\ell}\in\mathcal{O}(\log m)$ such that

[TABLE]

Proof.

As a preamble let us note that we will be able to enforce $e_{0}\neq 0\neq e_{\ell}$ because ${L}^{-1}={\rho\tau}$ is a short word over $\rho$ and $\tau$ .

The assertion is trivial for $m=1$ . Hence we assume $m\geqslant 2$ . Using short words $u^{\prime},v^{\prime}\in\left\{\mathinner{\rho,\tau}\right\}^{*}$ , we obtain a matrix

[TABLE]

with $m={a_{0}}>{a_{1}}>0$ and ${c_{0}}>{c_{1}}>0$. Since $m\geqslant 2$ it is enough to see that we can choose $u^{\prime}=\tau^{2+e_{0}}U^{e_{1}}\tau^{e_{2}}$ and $v^{\prime}=\tau^{e_{3}}U^{e_{4}}$ where the exponents $e_{j}$ are in $\left\{\mathinner{0,1}\right\}$. We have $M^{\prime}\in\mathrm{SL}(2,\mathbb{N})$ and therefore the result follows from Lem. 15.5. ∎

Proof of Cor. 15.1

Prop. 15.6 shows that the size of the exponential expression

[TABLE]

is linear in $\left\|\mathinner{M}\right\|_{\text{bin}}$. Thus, we can apply Cor. 14.16 based on the explicit embedding of $\mathrm{SL}(2,\mathbb{Z})$ into the semi-direct product as depicted in Fig. 14.

Acknowledgments

The authors are indebted to anonymous reviewers for their careful reading and extremely helpful feedback on the initial submission of this manuscript. We also thank Armin Weiß for various suggestions and Igor Potapov for pointing out the paper [21] of Gurevich and Schupp.

Bibliography

[1] P. R. Asveld. Controlled iteration grammars and full hyper-AFL’s. Information and Control, 34(3):248–269, 1977.
[2] M. Benois. Parties rationelles du groupe libre. C. R. Acad. Sci. Paris, Sér. A, 269:1188–1190, 1969.
[3] V. Berthé, C. D. Felice, V. Delecroix, F. Dolce, J. Leroy, D. Perrin, C. Reutenauer, and G. Rindone. Specular sets. Theoretical Computer Science, 684:3–28, 2017.
[4] R. V. Book, T. J. Long, and A. L. Selman. Quantitative relativizations of complexity classes. SIAM J. Comput., 13:461–487, 1984.
[5] L. Ciobanu, V. Diekert, and M. Elder. Solution sets for equations over free groups are EDT0L languages. International Journal of Algebra and Computation, 26:843–886, 2016. Conference abstract in ICALP 2015, LNCS 9135 with full version on arXiv e-prints: abs/1502.03426.
[6] L. Ciobanu and M. Elder. Solutions sets to systems of equations in hyperbolic groups are EDT0L in PSPACE. In C. Baier, I. Chatzigiannakis, P. Flocchini, and S. Leonardi, editors, 46th International Colloquium on Automata, Languages, and Programming (ICALP 2019), volume 132 of Leibniz International Proceedings in Informatics (LIPIcs), pages 110:1–110:15, Dagstuhl, Germany, 2019. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik.
[7] F. Dahmani and V. Guirardel. Foliations for solving equations in groups: free, virtually free and hyperbolic groups. J. of Topology, 3:343–404, 2010.
[8] V. Diekert. Makanin’s Algorithm. In M. Lothaire, editor, Algebraic Combinatorics on Words, volume 90 of Encyclopedia of Mathematics and Its Applications, chapter 12, pages 387–442. Cambridge University Press, 2002.
