Languages ordered by the subword order
Dietrich Kuske, Georg Zetzsche

TL;DR
This paper investigates the decidability of logical theories over languages with subword and cover relations, extending previous results by including regular predicates and counting quantifiers.
Contribution
It introduces four new decidable logical theories for languages with subword, cover, and regular predicates, expanding the scope of earlier decidability results.
Findings
Four new decidable logical theories identified
Decidability depends on language, predicates, and logic fragment
Extends previous work by including cover relation and counting quantifiers
Abstract
We consider a language together with the subword relation, the cover relation, and regular predicates. For such structures, we consider the extension of first-order logic by threshold- and modulo-counting quantifiers. Depending on the language, the used predicates, and the fragment of the logic, we determine four new combinations that yield decidable theories. These results extend earlier ones where only the language of all words without the cover relation and fragments of first-order logic were considered.
Click any figure to enlarge with its caption.
Figure 1
Figure 2Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
Topicssemigroups and automata theory · Logic, programming, and type systems · Advanced Algebra and Logic
\hideLIPIcs
TU Ilmenau, [email protected], [email protected] \CopyrightD. Kuske and G. Zetzsche\supplement\funding
Acknowledgements.
\EventEditorsJohn Q. Open and Joan R. Access \EventNoEds2 \EventLongTitle42nd Conference on Very Important Topics (CVIT 2016) \EventShortTitleCVIT 2016 \EventAcronymCVIT \EventYear2016 \EventDateDecember 24–27, 2016 \EventLocationLittle Whinging, United Kingdom \EventLogo \SeriesVolume42 \ArticleNo23
Languages ordered by the subword order
Dietrich Kuske
and
Georg Zetzsche
Abstract.
We consider a language together with the subword relation, the cover relation, and regular predicates. For such structures, we consider the extension of first-order logic by threshold- and modulo-counting quantifiers. Depending on the language, the used predicates, and the fragment of the logic, we determine four new combinations that yield decidable theories. These results extend earlier ones where only the language of all words without the cover relation and fragments of first-order logic were considered.
Key words and phrases:
Subword order, formal languages, first-order logic, counting quantifiers
1991 Mathematics Subject Classification:
\ccsdesc[500]Theory of computation Formal languages and automata theory, \ccsdesc[500]Theory of computation Logic
category:
\relatedversion
1. Introduction
The subword relation (sometimes called scattered subword relation) is one of the simplest nontrivial examples of a well-quasi ordering [6]. This property allows its prominent use in the verification of infinite state systems [3]. The subword relation can be understood as embeddability of one word into another. This embeddability relation has been considered for other classes of structures like trees, posets, semilattices, lattices, graphs etc. [12, 14, 13, 7, 8, 9, 10, 19, 20].
In this paper, we study logical properties of a set of words ordered by the subword relation. We are mainly interested in general situations where we get a decidable logical theory. Regarding first-order logic, we already have a rather precise picture about the border between decidable and undecidable fragments: For the subword order alone, the -theory is decidable [15] and the -theory is undecidable [11]. For the subword order together with regular predicates, the two-variable theory is decidable [11] and the three-variable theory [11] as well as the -theory are undecidable [5] (these two undecidabilities already hold if we only consider singleton predicates, i.e., constants).
Thus, to get a decidable theory, one has to restrict the expressiveness of first-order logic considerably. For instance, neither in the -, nor in the two-variable fragment of first-order logic, one can express the cover relation (i.e., “ is a proper subword of and there is no word properly between these two”). As another example, one cannot express threshold properties like “there are at most subwords with a given property” in any of these two logics (for ).
In this paper, we refine the analysis of logical properties of the subword order in three aspects:
- •
We restrict the universe from the set of all words to a given language .
- •
Besides the subword order, we also consider the cover relation .
- •
We add threshold and modulo counting quantifiers to the logic.
Also as before, we may or may not add regular predicates or constants to the structure. In other words, we consider reducts of the structure
[TABLE]
with some language and fragments of the logic that extends first-order logic by threshold- and modulo-counting quantifiers.
In this spectrum, we identify four new cases of decidable theories:
- (1)
The -theory of the whole structure is decidable provided is bounded and context-free (Theorem 3.7). A rather special case of this result follows from [5, Theorem 4.1]: If and if only regular predicates of the form are used, then the -theory is decidable (with ). 2. (2)
The -theory (i.e., the 2-variable-fragment of the -theory) of the whole structure is decidable whenever is regular (Corollary 4.23). The decidability of the -theory without the cover relation is [11, Theorem 5.5]. 3. (3)
The -theory of the structure is decidable provided is regular (Theorem 5.1). For , this is [15, Prop. 2.2]. 4. (4)
The -theory of the structure is decidable provided is regular and almost every word from contains a non-negligible number of occurrences of every letter (see below for a precise definition, Theorem 6.3). Note that, by [5, Theorem 3.3], this theory is undecidable if .
Our first result is shown by an interpretation of the structure in . Four ingredients are essential here: Parikh’s theorem [16], the rationality of the subword relation [11], Nivat’s theorem characterising rational relations [2], and the decidability of the -theory of [1, 18, 4] (note that this decidability does not follow directly from Presburger’s result since in his logic, one cannot make statements like “the number of witnesses satisfying …is even”).
Our second result extends a result from [11] that shows decidability of the -theory of the structure . They provide a quantifier elimination procedure which relies on two facts:
- •
The class of regular languages is closed under images of rational relations.
- •
The proper subword relation and the incomparability relation are rational.
Here, we follow a similar proof strategy. But while that proof had to handle the existential quantifier, only, we also have to deal with counting quantifiers. This requires us to develop a theory of counting images under rational relations, e.g., the set of words such that there are at least two words in the regular language with in the rational relation . We show that the class of regular languages is closed under such counting images provided the rational relation is unambiguous, a proof that makes heavy use of weighted automata [17]. To apply this to the subword and the cover relation, this also requires us to show that the proper subword, the cover, and the incomparability relations are unambiguous rational.
Our third result extends the decidability of the -theory of from [15]. The main point there was to prove that every finite partial order can be embedded into if . This is certainly false if we restrict the universe, e.g., to . However, such bounded regular languages are already covered by the first result, so we only have to handle unbounded regular languages . In that case, we prove nontrivial combinatorial results regarding primitive words in regular languages and prefix-maximal subwords. These considerations then allow us to prove that, indeed, every finite partial order embeds into . Then, the decidability of the -theory follows as in [15].
Regarding our fourth result, we know from [11] that decidability of the -theory of does not hold for every regular . Therefore, we require that a certain fraction of the positions in a word carries the letter (for almost all words from the language and for all letters). This allows us to conclude that every finite partial order embeds into above each word. The second ingredient is that, for such languages, any -sentence is effectively equivalent to such a sentence where constants are only used to express that all variables take values above a certain word . These two properties together with some combinatorial arguments from the theory of well-quasi orders then yield the decidability.
In summary, we identify four classes of decidable theories related to the subword order. In this paper, we concentrate on these positive results, i.e., we did not try to find new undecidable theories. It would, in particular, be nice to understand what properties of the regular language determine the decidability of the -theory of the structure (it is undecidable for [5] and decidable for, e.g., by our third result). Another open question concerns the complexity of our decidability results.
2. Preliminaries
Throughout this paper, let be some alphabet. A word with is a subword of a word if there are words with . In that case, we write ; if, in addition, , then we write and call a proper subword of . If such that and there is no word with , then we say that is a cover of and write . This is equivalent to saying and where is the length of the word . If, for two words and , neither is a subword of nor vice versa, then the words and are incomparable and we write . For instance, , , and .
Let be a structure, i.e., is a set, is a relation of arity (for all ), and for all . Then, formulas of the logic are built from the atomic formulas
- •
for variables or constants and
- •
for and variables or constants
by the following formation rules:
- (1)
If and are formulas, then so are and . 2. (2)
If is a formula and a variable, then is a formula. 3. (3)
If is a formula, a variable, and , then is a formula. 4. (4)
If is a formula, a variable, and with , then is a formula.
We call a threshold counting quantifier and a modulo counting quantifier. The semantics of these quantifiers is defined as follows:
- •
iff
- •
iff
For instance, expresses that the number of elements of the structure satisfying is even. Then \bigl{(}\exists^{0\bmod 2}x\,\alpha\bigr{)}\lor\bigl{(}\exists^{1\bmod 2}x\,\alpha\bigr{)} holds iff only finitely many elements of the structure satisfy . The fragment of comprises all formulas not containing any threshold counting quantifier . First-order logic is the set of formulas from not mentioning any counting quantifier, i.e., neither nor . Let denote the set of first-order formulas of the form where is quantifier-free; these formulas are also called existential.
Note that the formulas
[TABLE]
and
[TABLE]
are equivalent (they both express that the structure contains at least elements). Generalising this, the threshold quantifier can be expressed using the existential quantifier, only. Consequently, the logics and are equally expressive. The situation changes when we restrict the number of variables that can be used in a formula: Let and denote the set of formulas from and , respectively, that use the variables and , only. Note that the formula from (1) belongs to , but the equivalent formula from (2) does not belong to .
Remark 2.1**.**
Let be a set with two elements and let be a set with elements (where ). Furthermore, let be a formula such that all moduli appearing in are divisors of . By induction on the construction of the formula , one can show the following for any and :
- •
If and , then .
- •
.
Consequently, there is no -formula expressing that the number of elements of a structure is .
In this paper, we will consider the following structures:
- •
The largest one is for some . The universe of this structure is the language , we have two binary predicates ( and ), a unary predicate for every regular language , and we can use every word from as a constant.
- •
The other extreme is the structure for some where we consider only the binary predicate .
- •
Finally, we will also prove results on the intermediate structure that has a binary relation and any word from the language as a constant.
For any structure and any of the above logics , we call
[TABLE]
the -theory of .
A language is bounded if there are a number and words such that . Otherwise, it is unbounded.
For an alphabet , a word , and a letter , let denote the number of occurrences of the letter in the word . The Parikh vector of is the tuple . Note that is a homomorphism from the free monoid onto the additive monoid .
3. The -theory with regular predicates
The aim of this section is to prove that the full -theory of the structure
[TABLE]
is decidable for bounded and context-free. This is achieved by interpreting this structure in , i.e., in Presburger arithmetic whose -theory is known to be decidable [1, 18, 4].
We start with three preparatory lemmas.
Lemma 3.1**.**
Let be context-free, , and be defined by for all . The set is effectively semilinear.
Proof 3.2**.**
Let be an alphabet and define the monoid homomorphism by for all .
Since the class of context-free languages is effectively closed under inverse homomorphisms, the language
[TABLE]
is effectively context-free. Since is regular, also the language
[TABLE]
is effectively context-free. By Parikh’s theorem [16], the Parikh-image of this intersection is effectively semilinear.
Now let . Then iff there exists a word with Parikh image . But the only word from with this Parikh image is , i.e., iff . Since , this is equivalent to . Thus, the semilinear set equals the set from the lemma.
Lemma 3.3**.**
Let and be defined by for all . The set is semilinear.
Proof 3.4**.**
Let be an alphabet and define the monoid homomorphism by for all .
In this prove, we construct an alphabet and homomorphisms , , , , and , such that the diagrams (for ) from Fig. 1 commute. In addition, we will construct a regular language with
[TABLE]
for all .
The subword relation
[TABLE]
on is rational [11]. Since the class of rational relations is closed under inverse homomorphisms [2], also the relation
[TABLE]
is rational. While the class of rational relations is not closed under intersections, it is at least closed under intersections with direct products of regular languages. Hence also
[TABLE]
is rational. By Nivat’s theorem [2], there are a regular language over some alphabet and two homomorphisms with
[TABLE]
Since is regular, its Parikh-image
[TABLE]
is semilinear [16].
Note that the additive monoid is the commutative monoid freely generated by the vectors for . Since also is commutative, we can define monoid homomorphisms by and for . Then also (p_{1},p_{2})\colon\mathbb{N}^{\Delta}\to\mathbb{N}^{n}\times\mathbb{N}^{n}\colon\overline{x}\mapsto\bigl{(}p_{1}(\overline{x}),p_{2}(\overline{x})\bigr{)} is a monoid homomorphism.
Let with for all . Then we have
[TABLE]
and similarly .
Since the class of semilinear sets is closed under monoid homomorphisms, the image of the semilinear set under , i.e.,
[TABLE]
is semilinear.
Let . Then iff there exists with and . The existence of such a word is equivalent to the existence of a pair with and .
Since all words appearing in belong to , this last statement is equivalent to saying
[TABLE]
But this is equivalent to saying
[TABLE]
Note that and similarly . Thus, the last claim is equivalent to .
In summary, we showed that the semilinear set is the set from the lemma.
Lemma 3.5**.**
Let , be context-free, and be defined by for every tuple . Then there exists a semilinear set such that maps bijectively onto .
Proof 3.6**.**
By Lemma 3.1, the set is semilinear and satisfies, by its definition, .
Let denote the semilinear set from Lemma 3.3.
Then let denote the set of -tuples such that the following holds for all :
If and , then is lexicographically smaller than or equal to .
This set is semilinear since the class of semilinear relations is closed under first-order definitions. Now let . Since maps onto , there is with . Since, on , the lexicographic order is a well-order, there is a lexicographically minimal such tuple . This tuple belongs to and it is the only tuple from mapped to .
Now we can prove the main result of this section.
Theorem 3.7**.**
Let be context-free and bounded. Then the -theory of is decidable.
Proof 3.8**.**
Since is bounded, there are words such that . For an -tuple we define .
- (1)
By Lemma 3.5, there is a semilinear set that is mapped by bijectively onto .
From this semilinear set, we obtain a first-order formula in the language of such that, for any , we have . 2. (2)
The set is semilinear by Lemma 3.3.
From this semilinear set, we obtain a first-order formula in the language of such that . 3. (3)
For any regular language the set is effectively semilinear by Lemma 3.1.
From this semilinear set, we can compute a first-order formula in the language of such that .
We now define, from an -formula in the language of , an -formula in the language of such that
[TABLE]
If , then . If , then . Furthermore, and .
For , we set .
Finally, let . Intuitively, one is tempted to set , but this is not a valid formula since is not a single variable, but a tuple of variables. To rectify this, we define -formulas for and as follows:
[TABLE]
where the disjunction extends over all functions with .
By induction, one obtains
[TABLE]
iff
[TABLE]
Recall that maps the tuples satisfying bijectively onto . Hence, the above is equivalent to
[TABLE]
Setting therefore solves the problem.
Consequently, any sentence from in the language of is translated into an equivalent sentence in the language of . By [1, 18, 4], validity of the sentence in is decidable.
4. The 2-theory with regular predicates
By [11], the -theory of is decidable. This two-variable fragment of first-order logic has a restricted expressive power since, e.g., the following two properties cannot be expressed:
- (1)
. 2. (2)
.
To make the first property accessible, we add the cover relation to the structure. The logic allows to express the second property with only two variables by (in addition, it can express that the regular language contains an even number of elements which is not expressible in first-order logic at all).
It is the aim of this section to show that the -theory of the structure
[TABLE]
is decidable. This decidability proof extends the proof from [11] for the decidability of the -theory of . That proof provides a quantifier-elimination procedure that relies on two facts, namely
- (1)
that the class of regular languages is closed under images under rational relations and 2. (2)
that the proper subword relation and the incomparablity relation are rational.
Similarly, our more general result also provides a quantifier-elimination procedure that relies on the following extensions of these two properties:
- (1)
The class of regular languages is closed under counting images under unambiguous rational relations (Section 4.2) and 2. (2)
the proper subword, the cover, and the incomparability relation are unambiguous rational (Section 4.1).
The actual quantifier-elimination is then presented in Section 4.3.
4.1. Unambiguous rational relations
Recall that, by Nivat’s theorem [2], a relation is rational if there exist an alphabet , a homomorphism , and a regular language such that maps surjectively onto . We call unambiguous rational relation if, in addition, maps injectively (and therefore bijectively) onto .
Example 4.1**.**
The relations and are unambiguous rational: take , and the homomorphisms
[TABLE]
(for the first relation) and
[TABLE]
Note that the intersection is not even rational, while the union is rational (since the union of rational language is always rational) [2]. But this union is not unambiguous rational: If it were unambiguous rational, then the set
[TABLE]
would be regular by Prop. 4.17 below.
Lemma 4.2**.**
Let be unambiguous rational and disjoint. Then is unambiguous rational.
Proof 4.3**.**
There are disjoint alphabets and , regular languages and homomorphisms such that is mapped bijectively onto . Let and . Let the homomorphism be given by
[TABLE]
for . Then maps the regular language bijectively onto .
Lemma 4.4**.**
For any alphabet , the cover relation and the relation are unambiguous rational.
Proof 4.5**.**
For , let and . Furthermore, let the homomorphism be defined by and for all . Finally, let the homomorphism be defined by .
Now consider the regular language
[TABLE]
Let . Since any occurrence of a letter in is immediately preceeded by an occurrence of , we get .
Conversely, if , then can be written (uniquely) as such that does not occur in (for all ). For let be the unique word from with . Then
[TABLE]
belongs to and satisfies .
Since the factorization of is unique, we have that maps bijectively onto the subword relation .
- •
Let denote the intersection of with \bigl{(}\Sigma_{2}\Sigma_{1}\bigr{)}^{*}\,\Sigma_{2}\,\bigl{(}\Sigma_{2}\Sigma_{1}\bigr{)}^{*}, i.e., the regular language of words from with precisely one more occurrence of letters from than from . Then is mapped bijectively onto the relation , hence this relation is unambiguous rational.
- •
Similarly, let denote the regular language of all words from with at least two more occurrences of letters from than from . It is mapped bijectively onto the relation . Hence this relation is unambiguous rational.
Lemma 4.6**.**
For any alphabet , the incomparability relation
[TABLE]
is unambiguous rational.
Proof 4.7**.**
Note that the set is the disjoint union of the following three relations:
- (1)
, 2. (2)
, and 3. (3)
.
As in the previous proof, let and . Furthermore, let the homomorphism be defined by and for all .
We prove that the relations , , and are all unambiguous rational. From Lemma 4.2, we then get that is unambiguous rational since it is the disjoint union of these three relations.
We start with the simple case: . Consider the regular language
[TABLE]
This is the set of sequences of words of the form such that, at least once, . Hence, maps the regular language bijectively onto .
Next, we handle the relation . Correcting [11, Lemma 5.2] slightly, we learn that if, and only if,
- •
* for some , , , and*
- •
* for some word with .*
Furthermore, the number , the letters , and the words and are unique. Define
[TABLE]
By the above characterisation of , the homomorphism maps bijectively onto .
Recall that is mapped bijectively onto . Hence maps bijectively onto .
Since the class of unambiguous rational relations is closed under inverses, also is unambiguous rational.
4.2. Closure properties of the class of regular languages
Let be an unambiguous rational relation and a regular language. We want to show that the languages of all words with
[TABLE]
are effectively regular for all and all , respectively.
Example 4.8**.**
Consider the rational relation and the regular language . With , the language (3) equals the non-regular set . Thus, to prove effective regularity, we need to restrict the rational relation .
For these proofs, we need the following classical concepts. Let be a semiring. A function is realizable over , if there are , , a homomorphism , and with for all .111In the literature, a realizable function is often called recognizable formal power series. Since, in this paper, we will not encounter any operations on formal power series (like addition, Cauchy product etc), we use the (in this context) more intuitive notion of a “realizable function”. The triple is a presentation or a weighted automaton for .
In the following, we consider the semiring , i.e., the set together with the commutative operations and (with for all , for all , and ). On this set, we define (in a natural way) an infinite sum setting
[TABLE]
for any family with entries in .
Our first aim in this section is to prove the following
Proposition 4.9**.**
Let and be alphabets, a homomorphism, and a realizable function over . Then the function
[TABLE]
is effectively realizable over .
Before we can do this in all generality, we first consider two special cases: A monoid homomorphism between free monoids is non-expanding if for all , i.e. for all . It is non-erasing if, dually, for all , i.e., for all .
Lemma 4.10**.**
Let and be alphabets, a non-expanding homomorphism, and a realizable function over . Then the function
[TABLE]
is effectively realizable over .
Proof 4.11**.**
Let be a presentation of dimension for .
For , let
[TABLE]
Since is non-expanding, is the disjoint union of these subalphabets. Furthermore, let be the matrix defined by
[TABLE]
for all .
To define a presentation for the function , we first define a homomorphism \mu^{\prime}\colon\Sigma^{*}\to\bigl{(}\mathbb{N}^{\infty}\bigr{)}^{n\times n} by
[TABLE]
for all . Setting
[TABLE]
defines the presentation of dimension . Now let with for all . Then we get
[TABLE]
Hence, is a presentation for the function , i.e., is realizable.
It remains to be shown that the presentation is computable from the presentation and the homomorphism . For this, it suffices to construct the matrix effectively, i.e., to compute the infinite sum in Eq. (5). Using a pumping argument, one first shows the equivalence of the following two statements for all :
- (a)
There are infinitely many words with . 2. (b)
There is a word with and .
Since statement (b) is decidable, we can evaluate Eq. (5) calculating
[TABLE]
Lemma 4.12**.**
Let and be alphabets, a non-erasing homomorphism, and a realizable function over . Then the function
[TABLE]
is effectively realizable over .
Proof 4.13**.**
Since is realizable, it can be constructed from functions with for at most one using addition, Cauchy-product, and iteration applied to functions with [17, Theorem 3.11]. Replacing, in this construction, the basic function by with
[TABLE]
yields a construction of . Since is non-erasing, also in this construction, iteration is only applied to functions with . Hence, by [17, Theorem 3.1] again, is realizable. Analysing the proof of that theorem, one even obtains that a presentation for can be computed from and a presentation of .
Proof 4.14** (Proof of Prop. 4.9).**
Let be arbitrary. Then define homomorphisms and by
[TABLE]
for all letters . Then is non-expanding, is non-erasing, and .
By Lemma 4.10, the function is effectively realizable. By Lemma 4.12, also is effectively realizable.
Lemma 4.15**.**
Let be an unambiguous rational relation and be regular. Then the function
[TABLE]
is effectively realizable.
Proof 4.16**.**
Since is unambiguous rational, there are an alphabet , homomorphisms , and a regular language such that
[TABLE]
maps bijectively onto the relation . Let . Since is regular, the language is effectively regular. Furthermore, maps bijectively onto .
Since is regular, the characteristic function
[TABLE]
is effectively realizable over [17, Proposition 3.12].
By Proposition 4.9, also the function
[TABLE]
is effectively realizable over . Note that, for , we get
[TABLE]
In other words, the effectively realizable function is the function from the statement of the lemma.
Proposition 4.17**.**
Let be an unambiguous rational relation and be regular.
- (1)
For , the set of words with
[TABLE]
is effectively regular. 2. (2)
For with , the set of words with
[TABLE]
is effectively regular.
Proof 4.18**.**
By Lemma 4.15, the function is effectively realizable over .
We construct a semiring setting
[TABLE]
for (note that is equivalent to ).
Then the mapping
[TABLE]
is a semiring homomorphism. It follows that the function
[TABLE]
is effectively realizable over [17, Prop. 4.5]. Since the semiring is finite, the language
[TABLE]
is effectively regular [17, Prop. 6.3]. Since , this proves the first statement.
The second statement is shown similarly. We construct the semiring with
[TABLE]
for . Note that, with , we get and similarly .222The reader might wonder why both, [math] and , belong to . Suppose we identified them, i.e., considered . Then we would get .
Let denote the semiring homomorphism from to with
[TABLE]
It follows that the function
[TABLE]
is effectively realizable over [17, Prop. 4.5]. Since the semiring is finite, the language
[TABLE]
is effectively regular for all [17, Prop. 6.3]. Then the claim follows since
[TABLE]
4.3. Quantifier elimination for
Our decision procedure employs a quantifier alternation procedure, i.e., we will transform an arbitrary formula into an equivalent one that is quantifer-free. As usual, the heart of this procedure handles formulas where is a quantifier and is quantifier-free. Since the logic has only two variables, any such formula has at most one free variable. In other words, it defines a language . The following lemma shows that this language is effectively regular, such that is equivalent to the quantifier-free formula .
Lemma 4.19**.**
Let be a quantifier-free formula from . Then the sets
[TABLE]
are effectively regular for all and all with .
Proof 4.20**.**
Without changing the meaning of the formula , we can do the following replacements of atomic formulas:
- •
* can be replaced by ,*
- •
* and by , and*
- •
* and by .*
Since is quantifier-free, we can therefore assume that it is a Boolean combination of formulas of the form
- •
* for some regular language ,*
- •
* for some regular language ,*
- •
,
- •
,
- •
, and
- •
.
We define the following formulas for :
[TABLE]
Note that any pair of words and satisfies precisely one of these six formulas. Hence is equivalent to
[TABLE]
In this formula, any occurrence of appears in conjunction with precisely one of the formulas . Depending on this formula , we can simplify to by replacing the atomic subformulas that compare and as follows:
- •
If , we replace by the valid formula .
- •
If , we replace by .
- •
If , we replace by .
- •
If , we replace by .
All remaining comparisions are replaced by .
As a result, the formula is equivalent to
[TABLE]
where the formulas are Boolean combinations of formulas of the form and for some regular languages and .
Now let . Since the formulas are mutually exclusive (i.e., is satisfiable iff ), we get
[TABLE]
where the disjunction extends over all tuples of natural numbers with .
Hence it suffices to show that
[TABLE]
is effectively regular for all , all , and all Boolean combinations of formulas of the form and where and are regular languages. Since the class of regular languages is closed under Boolean operations, we can find regular languages and such that is equivalent to
[TABLE]
Note that this formula is equivalent to
[TABLE]
Since this disjunction is exclusive (i.e. any pair of words satisfies at most one of the cases), the set from (6) equals the union of the sets
[TABLE]
for . Observe that for , this set equals and we are done. So let us assume from now on. Note that in that case, the set from (7) equals
[TABLE]
This set, in turn, equals
- •
* if and ,*
- •
* if and ,*
- •
* if ,*
- •
* if ,*
- •
* if ,*
- •
* if , and*
- •
* if .*
In any case, it is effectively regular by Prop. 4.17, Lemma 4.4, and Lemma 4.6. Since the language from the claim of the lemma is a Boolean combination of such languages, the first claim is demonstrated.
To also demonstrate the regularity of the second language, let with . Then is equivalent to the disjunction of all formulas of the form
[TABLE]
where is a tuple of natural numbers from with . The rest of the proof proceeds mutatis mutandis.
Theorem 4.21**.**
Let . Let be a formula from . Then the set
[TABLE]
is effectively regular.
Proof 4.22**.**
The claim is trivial if is atomic. For more complicated formulas, the proof proceeds by induction using Lemma 4.19 and the effective closure of the class of regular languages under Boolean operations.
Corollary 4.23**.**
Let be a regular language. Then the -theory of the structure is decidable.
Proof 4.24**.**
Let be a sentence. By the previous theorem, the set
[TABLE]
is regular. Hence holds iff this set is nonempty, which is decidable.
5. The -theory
Let be regular and bounded. Then, by Theorem 3.7, we obtain in particular that the -theory of is decidable. Note that the regular language is not covered by this result since it is unbounded. And, indeed, the -theory of is undecidable [5]. On the positive side, we know that the -theory of is decidable [15].
In this section, we generalize this positive result to arbitrary regular languages, i.e., we prove the following result:
Theorem 5.1**.**
Let be regular. Then the -theory of is decidable.
The proof for the case in [15] essentially relies on the fact that each order , and thus every finite partial order, embeds into .
In the general case here, the situation is more involved. Take, for example, . Then, orders as simple as do not embed into : This is because the downward closure of any infinite subset of contains all of , but contains a downwards closed infinite chain. Nevertheless, we will show, perhaps surprisingly, that every finite partial order embeds into . In fact, this holds whenever is an unbounded regular language. The latter requires two propositions that we shall prove only later. Recall that a word is called primitive if there is no with .
Proof 5.2** (Proof of Theorem 5.1).**
By Theorem 3.7, we may assume that is unbounded.
By Proposition 5.3 below, there are words with , primitive, and . By Proposition 5.11 below, any finite partial order embeds into and therefore into which is a substructure of , i.e., every finite partial order embeds into .
Hence with quantifier-free holds in iff it holds in some finite partial order whose size can be bounded by . Since there are only finitely many such partial orders, the result follows.
The first proposition used in the above proof deals with the existence of certain primitive words for every unbounded regular language.
Proposition 5.3**.**
For every unbounded regular language , there are words so that
- (1)
, 2. (2)
the word is primitive, and 3. (3)
.
Proof 5.4**.**
Since is unbounded and regular, there are words with , , and . Set and .
Then and . Suppose and are conjugate. Since , this implies with , i.e., is the square of some word of length . But this contradicts and . Hence and are not conjugate.
Next let , and .
By contradiction, we show that is primitive.
Since we assume not to be primitive, there is a word with . Observe that there is a such that : If , we can choose since and if , we can take .
Observe that and are prefixes of of length and , respectively. Hence is a prefix of .
On the other hand, and are suffixes of of length and , respectively. Hence is a suffix of .
Taking these two facts together, we obtain that is a factor of . Since and are not conjugate, this implies which contradicts .
The second proposition used above talks about the embeddability of every finite partial order into certain regular languages of the form where the words and originate from the previous proposition. The proof of this embeddability requires a good deal of preparation that deals with the combinatorics of subwords, more precisely with the properties of “prefix-maximal subwords”.
Let and with . An embedding of into is a mapping with and for all . Note that iff there exists an embedding of into . This embedding is called initial if , i.e., if the left-most position in hits the left-most position in . Symmetrically, the embedding is terminal if , i.e., if the right-most position in hits the right-most position in .
We write if and every embedding of into is terminal. This is equivalent to saying that , but no word with is a subword of . In other words, if is a prefix-maximal subword of .
Lemma 5.5**.**
Let be primitive and . Then, every embedding of into is either initial or terminal.
Proof 5.6**.**
Let be an embedding of into that is neither initial nor terminal. Consider the copies of in the word . We call such a copy gapless if its image in under is contiguous. Since the length difference between and is only , there has to be at least one gapless copy of , say the th copy. The image of this copy is a contiguous subword of that spells and occurs at some position with . If , then is initial and if , then is terminal. This means . However, since is primitive, it can occur as a contiguous subword in only at positions that are divisible by , which is a contradiction.
Lemma 5.7**.**
The ordering is multiplicative: If with and , then .
Proof 5.8**.**
Suppose . Since , there is such that . Then, either where is the first letter of (contradicting ), or (contradicting ).
Lemma 5.9**.**
Let be words such that and is primitive. Then, for all with , we have
- (i)
, 2. (ii)
, and 3. (iii)
.
Proof 5.10**.**
For claim (i), suppose is an embedding of into . Then induces an embedding of into . Note that cannot be initial because otherwise would embed into . Thus, is terminal by Lemma 5.5. Hence, is terminal.
For claim (ii), suppose is an embedding of into . Since , induces an embedding of into . Again, cannot be initial because otherwise would embed into . Therefore, is terminal according to Lemma 5.5, meaning that is terminal as well.
Finally, for claim (iii), suppose is an embedding of into . Since , induces an embedding of into . Again, cannot be initial because otherwise, would embed into , but these are distinct words of equal length. Thus, Lemma 5.5 tells us that must be terminal and hence also .
Proposition 5.11**.**
Let be distinct such that is primitive and . Then every finite partial order embeds into .
Proof 5.12**.**
For , let denote the componentwise order on the set of -dimensional vectors over . Note that every finite partial order with elements embeds into . Hence, it suffices to embed this partial order into .
We define the map as follows. Set . Then, for a tuple , let
[TABLE]
It is clear that for , implies .
Now let and be two vectors from with . Towards a contradiction, suppose . Then there is an with , and for all . Since by Lemma 5.9(i) and clearly also , we have for every . Furthermore, since and , we have according to Lemma 5.9(ii) with . Therefore, Lemma 5.7(i) yields
[TABLE]
We show by induction on that for every , there is an with
[TABLE]
Of course, (8) is the base case. So let and such that (9) holds. We distinguish three cases.
- (1)
Suppose . Then
[TABLE]
since either and the two words are the same, or and by Lemma 5.9(i). So together with the induction hypothesis (9), Lemma 5.7(i) yields
[TABLE]
where the equality is due to . Since , this proves (9) for . 2. (2)
Suppose and . By Lemma 5.9(ii), we have
[TABLE]
So together with the induction hypothesis (9), Lemma 5.7(i) implies
[TABLE]
where the second equality is due to . Since , this proves (9) for . 3. (3)
If and , then Lemma 5.9(iii) tells us that
[TABLE]
So together with the induction hypothesis (9), Lemma 5.7(i) implies
[TABLE]
Since , this proves (9) for .
This completes the induction. Therefore, we have in particular
[TABLE]
for some . Since the left-hand side is a proper prefix of , this contradicts .
This completes the proof of the main result of this section, i.e., of Theorem 5.1.
6. The -theory with constants
By Theorem 5.1, the -theory of is decidable for all regular languages . If is bounded, then even the -theory of is decidable (Theorem 3.7). This result does not extend to all regular languages since, e.g., the -theory of is undecidable [5]. In this section, we present another class of regular languages (besides the bounded ones) such that
[TABLE]
has a decidable -theory.
Let be some language. Then almost all words from have a non-negligible number of occurrences of every letter if there exists a positive real number such that for all and all but finitely many words , we have
[TABLE]
An example of such a regular language is (this class contains all finite languages, is closed under union and concatenation and under iteration, provided every word of the iterated language contains every letter).
For , let denote the set of superwords of , i.e., the upward closure of in .
The basic idea is, as in the proof of Theorem 5.1, to embed every finite partial order into . The following lemma refines this embedability. Furthermore, it shows that is finite in this case.
Lemma 6.1**.**
Let be an unbounded regular language such that almost all words from have a non-negligible number of occurrences of every letter. Let . Then every finite partial order can be embedded into . Furthermore, the set is finite.
Note that is regular, but not necessarily unbounded (it could even be finite). Hence the first claim is not an obvious consequence of Propositions 5.11 and 5.3.
Proof 6.2**.**
Since is regular and unbounded, there are words with , primitive, and (by Proposition 5.3). In particular, . Let and suppose . Then
[TABLE]
contradicting that almost all words from have a non-negligible number of occurrences of every letter. Hence, contains every letter from implying . Set . Then we have . From Proposition 5.11, we learn that can be embedded into . Hence, it can be embedded into and therefore into .
Next, we show that is finite. Let be the minimal deterministic finite automaton accepting . Let with . Then we can factorize the word into such that for all . With , we obtain for all and therefore . Since almost all words from have a non-negligible number of occurrences of every letter, this implies (as above) that contains all letters from . Since this holds for all , we obtain and therefore . Hence, indeed, is finite.
Theorem 6.3**.**
Let be an unbounded regular language such that almost all words from have a non-negligible number of occurrences of every letter. Then the -theory of is decidable.
Proof 6.4**.**
We want to show that satisfiability in is decidable for quantifier-free formulas, i.e., for positive Boolean combinations of literals of the following forms (where and are arbitrary variables and an arbitrary word from ):
- (i)
** 2. (ii)
**
- (iii)
** 2. (iv)
**
- (v)
** 2. (vi)
**
Note that literals of the form can be written as , as , and similarly as . Furthermore, literals mentioning two words like can be replaced by or . By bringing the formula in disjunctive normal form, we may assume that we are given a disjunction of conjunctions of such literals.
Step 1.* We first show that literals of types (i) and (iv) can be eliminated. To this end, observe that for each , both of the sets*
[TABLE]
are finite. In the case , this is trivial. In the case of , this is the second claim in Lemma 6.1. Thus, every conjunction that contains a literal or , constrains to finitely many values. Therefore, we can replace this conjunction with a disjunction of conjunctions that result from replacing by one of these values. (Here, we might obtain literals or , but those can be replaced by and as above).
Note that such a replacement reduces the number of variables by one. We repeat this replacement until there are no more literals of the form (i) and (iv). Since we replace each conjunction with (a disjunction of) conjunctions that have fewer variable, this has to terminate. Thus, we arrive at a disjunction of conjunctions of literals of the forms (ii),(iii),(v), and (vi).
Step 2.* In the second step, we will eliminate literals of the form (ii). Note that the language is upward closed in . Since is regular, we can compute the finite set of minimal elements of this set. Thus, is equivalent to a finite disjunction of literals of the form . As a result, we get a positive Boolean combination of literals of the form (iii), (v), (vi) that is equivalent to .*
Step 3.* In the third step, we check whether our formula is satisfiable. We may assume that is in disjunctive normal form. To verify whether is satisfiable in , it therefore suffices to verify satisfiability of conjunctions of literals of the form (iii), (v), (vi). So let be such a conjunction. It can be written as where is a conjunction of literals of the form (iii) and is a conjunction of literals of the form (v) and (vi).*
*Let denote the number of variables appearing in . If is satisfiable in , then is satisfied by some partial order with at most elements. Conversely, let be satisfied by where has at most elements. Let, furthermore, denote some concatenation of all words appearing in the formula . By the first claim of Lemma 6.1, the finite partial order can be embedded into . Consequently, is satisfiable in and therefore in . In summary, is satifiable in iff holds in some finite partial order of size at most . Since there are only finitely many such finite partial orders, we get that satisfiability of in is decidable. *
Open questions
We did not consider complexity issues. In particular, from [11], we know that the -theory of the structure can be decided in elementary time. We currently work out the details for the extension of this result to the -theory of the structure for regular. We reduced the -theory of the full structure (for context-free and bounded) to the -theory of which is known to be decidable in elementary time [4]. Unfortunately, our reduction increases the formula exponentially due to the need of handling statements of the form “there is an even number of pairs such that …” It should be checked whether the proof from [4] can be extended to handle such statements in for directly.
Finally, we did not give any new undecidability results. For example, we know that the -theory of is undecidable for [5] and decidable for (Theorem 6.3). To narrow the gap between decidable and undecidable cases, one should find more undecidable cases.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] H. Apelt. Axiomatische Untersuchungen über einige mit der Presburgerschen Arithmetik verwandten Systeme. Z. Math. Logik Grundlagen Math. , 12:131–168, 1966.
- 2[2] J. Berstel. Transductions and context-free languages . Teubner Studienbücher, Stuttgart, 1979.
- 3[3] A. Finkel and Ph. Schnoebelen. Well-structured transition systems everywhere! Theoretical Computer Science , 256:63–92, 2001.
- 4[4] P. Habermehl and D. Kuske. On Presburger arithmetic extended with modulo counting quantifiers. In Fo S Sa CS’15 , Lecture Notes in Comp. Science vol. 9034, pages 375–389. Springer, 2015.
- 5[5] S. Halfon, Ph. Schnoebelen, and G. Zetzsche. Decidability, complexity, and expressiveness of first-order logic over the subword ordering. In Proc. of the Thirty-Second Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2017) , pages 1–12. IEEE Computer Society, 2017.
- 6[6] G. Higman. Ordering by divisibility in abstract algebras. Proc. London Math. Soc. , 2:326–336, 1952.
- 7[7] J. Ježek and R. Mc Kenzie. Definability in substructure orderings. I: Finite semilattices. Algebra Univers. , 61(1):59–75, 2009.
- 8[8] J. Ježek and R. Mc Kenzie. Definability in substructure orderings. III: Finite distributive lattices. Algebra Univers. , 61(3-4):283–300, 2009.
