The number of languages with maximum state complexity
Bjørn Kjos-Hanssen, Lei Liu

TL;DR
This paper provides a formula for counting the number of finite languages with maximum state complexity and generalizes the concept from languages to functions on finite sets.
Contribution
It introduces a formula for the number of maximum-complexity languages and extends the analysis from languages to functions on finite sets.
Findings
Derived a formula for counting maximum-complexity languages.
Generalized the maximum complexity analysis from languages to functions.
Enhanced understanding of the distribution of maximum-complexity languages.
Abstract
Câmpeanu and Ho (2004) determined the maximum finite state complexity of finite languages, building on work of Champarnaud and Pin (1989). They stated that it is very difficult to determine the number of maximum-complexity languages. Here we give a formula for this number. We also generalize their work from languages to functions on finite sets.
| 0 | 1 | |
| 0.664 | ||
| 0.685 | ||
| 0.854 | ||
| 0.971 | ||
| 0.961 | ||
| 0.927 |
This work was partially supported by grants from the Simons Foundation (#315188 and #704836 to Bjørn Kjos-Hanssen) and Decision Research Corporation (University of Hawai‘i Foundation Account #129-4770-4). We are grateful to the gracious referee who persisted through seven revisions of the paper.
1 Introduction
At some point in the 1980s, Howard Straubing posed a problem that was subsequently solved in Champarnaud and Pin (1989) [2]. They showed that the minimal incomplete deterministic finite automaton of a language , where , has at most
[TABLE]
states. Moreover, for each there exists an attaining this bound. Câmpeanu and Ho (2004) [1] showed more generally that the tight upper bound for of cardinality and for complete automata is
[TABLE]
where . (In these results, requiring totality of the transition function adds 1 to the state count.) Câmpeanu and Ho’s result can be viewed as concerning functions where is a set of cardinality . We generalize their result to arbitrary functions where is a positive integer. Equivalently, we consider functions , where for some , and where automata have accept states corresponding to nonzero values of .
The function on may seem rather complicated as functions on that set go. On the other hand, mod 5 is less so, in that we can decompose it as , so that after seeing and , we need not remember the pair , but only their sum. Out of the ternary functions on a 5-element set, at most can be decomposed as for some binary functions , . This idea of the state complexity of functions has been applied in bioinformatics [5]. In Section 2 we make precise a sense in which such functions are not the most complex ternary functions. We do this by extending a result of Câmpeanu and Ho [1] to functions taking values in a set of size larger than two. Rising to an implicit challenge posed by Câmpeanu and Ho, we give a formula for the number of maximally complex languages.
The structure of the paper is as follows. In Section 2 we obtain an upper bound in Theorem 2.14 for the complexity of a function , and a matching lower bound in Theorem 2.18. In Section 3 we obtain the number of maximal complexity functions in Theorem 3.10. Then we look at asymptotics in Section 4, culminating in Theorem 4.12.
2 Complexity of languages and operations
Let denote the empty word. Let the cardinality of a finite set be denoted by , and the length of a finite word by . We define a function for any sets with by
[TABLE]
Definition 2.1.
Let and be positive integers and let be an alphabet with . An incomplete deterministic finite automaton (IDFA) is a 5-tuple , where is a finite set of states, is a finite alphabet, is the start state, is the set of accept states, and , where , is the transition function.
We also require , where . If , i.e., is total, then is moreover a deterministic finite automaton (DFA).
We define , where , by , and recursively for and . We say that states are -distinguishable if there is a with and .
The function accepted by is the function defined by
[TABLE]
and otherwise. Thus if , and if . The language accepted by is
[TABLE]
Note that in the case , accepting a language is equivalent to accepting its indicator (characteristic) function.
Definition 2.2 (state complexity).
We call an IDFA minimal (for ) if for all IDFAs with . Moreover, is minimal for if accepts and for all accepting . In this case we define the state complexity by .
Champarnaud and Pin [2] obtained the following result.
Theorem 2.3 ([2, Theorem 4]).
A minimal IDFA for a language has at most
[TABLE]
states, and for each there exists a language attaining this bound.
Theorem 2.3 was generalized by Câmpeanu and Ho [1]:
Theorem 2.4 ([1, Corollary 10]).
Let and be integers, and let be a minimal DFA for a language . Let be the set of states of . Then we have:
- (i) , where .
- (ii) There is an such that the upper bound given by Item (i) is attained.
Both of these results involve an upper bound which can be viewed as a special case of Theorem 2.14 below.
We now develop a function version of the Myhill–Nerode theorem, by following and generalizing the presentation in Shallit’s textbook [6].
Definition 2.5.
Let be an alphabet and let . A relation is right invariant if for all , we have . An equivalence relation on is a congruence relation for if for all , For an equivalence relation , the index of , denoted , is the number of equivalence classes of . An equivalence relation has finite index if . The Myhill–Nerode equivalence relation for is the relation defined by
[TABLE]
Let denote the -equivalence class of .
Lemma 2.6.
Let .
1. The Myhill–Nerode relation of Definition 2.5 is an equivalence relation.
2. It is right invariant.
Proof.
Item 1 is a standard observation. For Item 2: If we extend and by the same string , then we have also extended and by the same string , and hence . ∎
Lemma 2.7.
Let . Suppose that is a right invariant equivalence relation on which is a congruence relation for . Then is a refinement of .
Proof.
We must show that . Suppose and let . Since is right invariant, . Since is a congruence relation for , . Thus we have shown that . ∎
Every function is onto its range. When the range is a finite subset of , we assume for the purposes of our complexity definitions that it is an initial segment of . Thus we restrict attention to onto functions in Theorem 2.8.
Theorem 2.8.
Let be onto. The following are equivalent:
1. The function is accepted by some IDFA.
2. There exists a right invariant congruence relation for of finite index.
3. The Myhill–Nerode relation has finite index.
4. The function is accepted by some DFA.
Proof.
We prove this in the usual round-robin fashion.
(1) ⇒ (2):
Let be an IDFA that accepts . Define a relation by iff , or both are undefined. Since has finitely many states, has finite index. From the definition of it follows that is right invariant. Finally, since if defined, and 0 otherwise, is determined by . Thus is a congruence relation for .
(2) ⇒ (3):
Let be a right invariant congruence relation for , of finite index. By Lemma 2.7, is a refinement of . Then , as desired.
(3) ⇒ (4):
Suppose has finite index. Define , , , and . Then . Since is right invariant, is well-defined. Thus is an IDFA. We must show that for each . Case 1: . Since is a congruence relation for , and hence which means that . Case 2: . Then by definition and so which means that . Finally, let be a bijection and formally replace each by .
(4) ⇒ (1):
This is immediate since each DFA is an IDFA.
∎
Theorem 2.9.
Let . Let be an IDFA accepting . Let be the number of states of . Suppose that all states of are reachable and that any two states of are -distinguishable. Then .
Proof.
Let be the automaton in Theorem 2.8 for and let be its set of states. We claim that is minimal. Note that . Let be any automaton accepting , let be its set of states and its transition function. Since accepts , for all , if then . Thus is injective, and we have established that , and hence that is minimal.
Now let be any IDFA accepting for which any two states are reachable and -distinguishable. It suffices to show that , and for this it suffices to give an injective map . For each we let
[TABLE]
Such an must exist, or else is not reachable.
Claim: is well-defined by (1).
Proof of claim.
Suppose that and let us show . Let . Since accepts ,
- for , iff and iff ; and
- for , iff is undefined or is not in , and iff is undefined or is not in .
We have
[TABLE]
in the sense that and are both definitionally equal to , which may or may not be defined or in . So in all cases . ∎
Finally, let us show that is one-to-one. If then where . We will show, using -distinguishability, that .
Suppose . Then there is some with
[TABLE]
and Hence since accepts , , which contradicts . ∎
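As a computational illustration of this minimality criterion, the following Python sketch counts the states of a minimal IDFA for a function on words of a fixed length by counting, level by level, the distinct residuals that are not identically zero. The dictionary representation of the function and the names `state_complexity` and `majority3` are assumptions of this sketch, not notation from the text.

```python
from itertools import product

def state_complexity(f, alphabet, n):
    """Count states of a minimal IDFA for f: alphabet^n -> {0, 1, ..., c-1}.

    A prefix w of length i is identified with its residual, the tuple of
    values f(w + v) over all suffixes v of length n - i. Prefixes whose
    residual is identically zero need no state, since an incomplete
    automaton can reject by a missing transition.
    """
    total = 0
    for i in range(n + 1):
        suffixes = list(product(alphabet, repeat=n - i))
        residuals = set()
        for w in product(alphabet, repeat=i):
            r = tuple(f[w + v] for v in suffixes)
            if any(r):
                residuals.add(r)
        total += len(residuals)
    return total

# Majority of three bits: 1 exactly when at least two inputs are 1.
majority3 = {x: int(sum(x) >= 2) for x in product((0, 1), repeat=3)}
print(state_complexity(majority3, (0, 1), 3))  # prints 6
```

Residuals at different levels are never merged here, because a nonzero residual of a length-i prefix is supported only on suffixes of length n - i.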
We write for the set of all functions from to .
Definition 2.10.
Let and be positive integers and let be the set of -ary functions . Let . The Champarnaud–Pin family of is the family of sets , where , , given by
[TABLE]
In terms of the function , this can be restated as
[TABLE]
So , is obtained from by plugging in constants for the first input, and so forth. We write $\mathfrak{C}_{n}^{-}=\{f\in\mathfrak{C}_{n}:f(x)>0\text{ for some }x\}$. Note that .
Definition 2.11.
Let us say that an IDFA accepts if accepts the function with if , and otherwise. The state complexity of is the minimum number of states of an IDFA accepting , and is denoted .
Note that Definition 2.11 says that . For , corresponds to automatic complexity of equivalence relations on binary strings as studied in [3]. The case is that of -ary operations on a given finite set, which is of interest in universal algebra.
We also define , which shall turn out to be the maximum of over all .
Definition 2.12.
We define a crossover function .
Definition 2.13.
Let and . We define an IDFA . Its set of states is the disjoint union
[TABLE]
where all , are distinct. The transition function of is given by
[TABLE]
Theorem 2.14.
Let and be positive integers. Let . Then
Proof.
Let . We must show that there is an IDFA accepting with at most the given number of states. Let and let (Definition 2.13). Then for and for . Note that for each there is an integer such that for some and for some . The transition function is given by Definition 2.13 and also described in Figure 1. Note that if , we may not have , but this is ruled out because then no can be onto (Definition 2.12). (We may assume that is onto, since otherwise a smaller IDFA can be found.)
Since , we have
[TABLE]
although this need not be strict (for instance, when , we are comparing the range of to the union of ranges of , , which may both equal ). By construction, accepts ; see also Example 2.15, Example 2.16, and Example 2.17. ∎
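The counting idea behind this upper bound can be sketched numerically. The Python snippet below assumes, as one reading of the construction, that level i of the automaton needs at most min(k^i, c^(k^(n-i)) - 1) states: at most one per word of length i, and at most one per not-identically-zero function of the remaining n - i inputs. The function name `level_bound` and the parameter names are hypothetical.

```python
def level_bound(k, c, n):
    """Sum over levels i = 0..n of min(k**i, c**(k**(n-i)) - 1).

    k: alphabet size, c: number of output values, n: word length.
    This level-wise expression is an assumption of this sketch, not a
    formula quoted from the text.
    """
    return sum(min(k**i, c ** (k ** (n - i)) - 1) for i in range(n + 1))

print(level_bound(2, 2, 3))  # levels 1 + 2 + 3 + 1, prints 7
print(level_bound(2, 2, 4))  # levels 1 + 2 + 4 + 3 + 1, prints 11
```

The inner power grows doubly exponentially in n, so the sketch is only meant for small parameters.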
Example 2.15.
The following example shows the case and , with the majority function. It has :
[TABLE]
The states for serve as our final states and are indicated by a rectangular box. Here is the constant 1 function of variables, whereas is defined by if , 0 otherwise. There is no arrow labeled 0 between the states and . This is because after seeing we already know the majority of is 0, so we “reject by missing transition”.
Example 2.16.
A slightly larger example: the case and , with the majority function. It has :
[TABLE]
In this case, the upper bound is strict: and are equivalent. Thus a smaller automaton suffices:
[TABLE]
Example 2.17.
As an example for the case , let , , , and let . Then our automaton is:
[TABLE]
Theorem 2.18 is a generalization of Câmpeanu and Ho’s theorem. The construction is similar to that of [1, Figure 1 and Theorem 8].
Theorem 2.18.
Let and be integers. There exists a function such that
Proof.
Let To define , we first note that it suffices to fix an with and define for each . To that end, we fix . Since
[TABLE]
there exists a surjective function . Define by for each . We claim that attains the bound, i.e., there is no smaller automaton than that given in Theorem 2.14. By Theorem 2.9, an IDFA to accept is minimal if all states are reachable (from the start state) and any two states are -distinguishable.
Thus, it remains to show that the states for as given in the proof of Theorem 2.14 are reachable and -distinguishable.
By choice of it is easy to see that each state is reachable. For an example of what can go wrong with a different choice of , see Figure 2.
As for distinguishability, all states have a path to an accepting state, so it suffices to show that states that are the same distance from the start state are -distinguishable. Recall that the set of states of is
[TABLE]
For two states , where , it suffices to consider the case . Then and are -distinguishable precisely because we chose and so that each extension by adding one more symbol to , does not give the same set of possible extensions, i.e., precisely to distinguish and . Similarly and for have the sets of possible extensions given by and therefore are -distinguishable. ∎
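To make the construction concrete, here is a small Python check for binary inputs and outputs with four input bits (parameters chosen for this sketch). Each of the four prefixes of length 2 is assigned a different nonzero function of the remaining two bits, chosen so that their one-bit restrictions cover all three nonzero one-bit functions. The particular truth tables in `assignment` are hand-picked for illustration and are not taken from the paper.

```python
from itertools import product

def count_states(f, n):
    """Number of distinct not-identically-zero residuals of f on {0,1}^n."""
    total = 0
    for i in range(n + 1):
        suffixes = list(product((0, 1), repeat=n - i))
        seen = set()
        for w in product((0, 1), repeat=i):
            r = tuple(f[w + v] for v in suffixes)
            if any(r):
                seen.add(r)
        total += len(seen)
    return total

# Truth tables (values at suffixes 00, 01, 10, 11) assigned to each prefix.
assignment = {
    (0, 0): (0, 0, 0, 1),
    (0, 1): (0, 0, 1, 0),
    (1, 0): (0, 0, 1, 1),
    (1, 1): (0, 1, 1, 1),
}
suffixes = list(product((0, 1), repeat=2))
witness = {w + v: table[j]
           for w, table in assignment.items()
           for j, v in enumerate(suffixes)}

# Level sizes 1, 2, 4, 3, 1: every level is as large as it can be.
print(count_states(witness, 4))  # prints 11
```

Replacing one of the four tables by a duplicate of another makes two length-2 prefixes indistinguishable and lowers the count, which is the failure mode the injectivity requirement rules out.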
3 The number of maximally complex languages
A -set is a set of cardinality . For a function we denote the range and domain by and , respectively. The collection of all subsets of of cardinality is denoted .
Lemma 3.1.
Let be positive integers with . Let be the constant function defined by for all . The number of -sets such that is
[TABLE]
Proof.
There are elements of and hence total -sets.
Since , . Thus the range of is disjoint from .
Given , , it follows that and so there are functions in whose range is disjoint from , i.e.,
[TABLE]
Here .
For the union of ranges to not contain means that there is some that is missed. The number of -sets that miss some is then given by inclusion-exclusion in terms of , the cardinality of a set that is disjoint from . Thus the number of -sets with is
[TABLE]
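The inclusion-exclusion count in this proof can be checked by brute force in a small case. The sketch below uses its own parametrisation, which is an assumption and may differ from the lemma's notation: `b` is the number of available values including a distinguished value 0, `k` is the number of arguments, and the functions count `a`-element sets of maps from a k-element set into the b values, excluding the constant-0 map, whose ranges together contain every nonzero value.

```python
from itertools import combinations, product
from math import comb

def covering_sets_formula(a, b, k):
    """Inclusion-exclusion over the set of nonzero values that are missed."""
    return sum((-1) ** i * comb(b - 1, i) * comb((b - i) ** k - 1, a)
               for i in range(b))

def covering_sets_brute(a, b, k):
    """Directly enumerate a-sets of non-constant-zero maps [k] -> {0..b-1}."""
    maps = [h for h in product(range(b), repeat=k) if any(h)]
    target = set(range(1, b))
    return sum(1 for s in combinations(maps, a)
               if target <= set().union(*map(set, s)))

print(covering_sets_formula(4, 4, 2), covering_sets_brute(4, 4, 2))  # 1155 1155
```

With a = 4, b = 4, k = 2 both functions return 1155, the value quoted in Example 3.11.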
For fixed and , let () be the set of all (not constant zero) functions from to .
Definition 3.2.
For a function and , define a function by for all .
Note that is the function in the proof of Theorem 2.18.
Definition 3.3.
For each , let be the constant zero function. A set is -adequate if
[TABLE]
A function is called -adequate if its range is a -adequate -set, i.e.:
1. for each ,
2. is injective, and
3. its range satisfies the displayed condition.
We say that is adequate if it is -adequate for .
Proposition 3.4.
If is -adequate then and .
The proof of Proposition 3.4 is immediate. It follows that can only be -adequate if , unless we happen to have .
Proposition 3.5.
For all , we have .
Proof.
is immediate. Conversely, suppose . Fix and write , . Then
[TABLE]
Definition 3.6.
For each and we defined the associated automaton in Definition 2.13. Let be with unreachable states removed and indistinguishable states merged. Let be the set of states of .
Theorem 3.7.
The following are equivalent:
1. is adequate.
2. ; all states of are reachable and distinguishable; and accepts .
3. It is not the case that: and accepts .
Proof.
(2) ⇒ (1): If is not adequate then by definition some states of are not reachable.
(1) ⇒ (2): Theorem 2.18.
(2) ⇒ (3) is immediate.
(3) ⇒ (2): Assume (3). Since always accepts , it follows that it has states. By Theorem 2.14 it has exactly states. ∎
Theorem 3.8.
The following are equivalent:
1. is adequate.
2. .
Proof.
(1) ⇒ (2): by (1) ⇒ (2) of Theorem 3.7 and then by Theorem 2.9.
(2) ⇒ (1): Suppose (1) fails. Then (1) in Theorem 3.7 fails. Therefore (3) in Theorem 3.7 fails, and so . ∎
Proposition 3.9.
Let be given, , , and . The number of adequate functions is
[TABLE]
Proof.
If is the number of adequate sets then the number of adequate functions is .
The map maps to functions whose union of ranges covers the next set of functions as in Lemma 3.1, -sets such that where .
Let be the constant zero function. Let for all . Let
[TABLE]
be an arbitrary bijection for which . By Lemma 3.1, applying , and with ,
[TABLE]
Thus, the number of maps is
[TABLE]
Theorem 3.10.
Let integers and be given. Let where . Let and . Then is given by (3) and equals
[TABLE]
Proof.
By Proposition 3.5,
[TABLE]
By Theorem 3.8 this equals , which by Proposition 3.9 equals (3). ∎
Example 3.11.
For and , we have , as illustrated in the following table:
[TABLE]
A maximal complexity function is determined by an injective function from to , such that . Associating each with the set , we see that the number of functions is times the number of four-element subsets of for which . By Lemma 3.1 that number is 1155: let , , , and and calculate that (2) is . Thus the total number of maximum complexity functions is .
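The total in this example can be cross-checked by exhaustive search, assuming (as one reading of the example) that it concerns Boolean functions of four binary inputs. The Python sketch below enumerates all 2^16 such functions, computes for each the number of distinct not-identically-zero residuals, and counts the functions that reach the largest value observed. The names `N`, `count_states` and `histogram` are choices of this sketch.

```python
from itertools import product
from collections import Counter

N = 4  # number of binary inputs (assumed parameter for this example)

def count_states(table):
    """States of a minimal IDFA: distinct nonzero residual blocks per level.

    `table` lists the 2**N values of f in lexicographic order of inputs, so
    the residual of the j-th prefix of length i is a contiguous block.
    """
    total = 0
    for i in range(N + 1):
        width = 2 ** (N - i)
        blocks = {table[j:j + width] for j in range(0, 2 ** N, width)}
        total += sum(1 for blk in blocks if any(blk))
    return total

histogram = Counter(count_states(t) for t in product((0, 1), repeat=2 ** N))
best = max(histogram)
print(best, histogram[best])  # prints 11 27720
```

The count 27720 equals 4! times 1155, matching the factorisation into an ordering of the four prefixes and a choice of four-element subset described above.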
4 Asymptotics
In this section we demonstrate (Theorem 4.3) that while most functions do not have maximum complexity, the growth rate of the number of maximally complex functions is similar to that of the total number of functions for .
Proposition 4.1.
Suppose and are positive integers, and is a set. Suppose . Then we have
[TABLE]
Suppose that additionally , and is a constant function with for all . Then (4) also equals
[TABLE]
Proof.
(4)=(5): Let be given and let . It suffices to show that . Since
[TABLE]
for each , we have
[TABLE]
If then , and we have the contradiction
[TABLE]
(4)=(6): When is constant equal to a value not in , follows from the other condition: if then let . Then and we get a contradiction as in (7). ∎
Definition 4.2.
Let and be positive integers and let . Let be the number of functions from to that are onto :
[TABLE]
Theorem 4.3.
Let and be positive integers and let . Let . If the condition
[TABLE]
holds, then , where is minimal such that .
Proof.
By Theorem 3.8,
[TABLE]
Let be the constant zero function. Let for all . Let
[TABLE]
be an arbitrary bijection for which .
Given define by . The following are equivalent:
- ;
- is onto .
Thus is equal to (4), where . By Proposition 4.1 under the bijection , with , , , and , is moreover equal to (6), as desired. ∎
Remark 4.4.
The authors regret that in [4], the condition (8) in Theorem 4.3 was erroneously omitted. By definition , so , but the condition fails when .
Example 4.5.
Consider the case , of Theorem 4.3. Then , where is the least such that . is the number of functions from to that are onto . For , there are no such functions. For , there are three such functions. And indeed, this is the number of maximal complexity functions in this case: the functions that are onto .
Definition 4.6.
Let be the number of onto functions from to . Stirling numbers of the second kind are denoted and equal the number of equivalence relations on with equivalence classes.
The following result is well known.
Lemma 4.7.
Let be positive integers. Then
Lemma 4.8.
Let and be positive integers. The number of functions from to that are onto the first elements of is
[TABLE]
The number of functions from to that are onto is
[TABLE]
Proof.
Let be the number of elements going to . Then we see that the number of such functions is
[TABLE]
by Lemma 4.7. ∎
Lemma 4.9.
Let be a positive integer. The number of functions from to that are onto is .
Proof.
Note that for any , and . By Lemma 4.8, the number of such functions is
[TABLE]
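A count of this kind is easy to check by brute force for small parameters. The Python sketch below counts maps from an (m + 1)-element set into {0, 1, ..., m} whose range contains all of 1, ..., m, and compares the result with the closed form (m + 2)!/2. That closed form was derived by hand for this illustration and is not quoted from the text: either exactly one argument maps to 0 and the rest form a bijection onto 1, ..., m, or no argument maps to 0 and exactly one value is hit twice. The parameter name `m` and both function names are hypothetical.

```python
from itertools import product
from math import comb, factorial

def onto_nonzero_brute(m):
    """Count maps {0..m} -> {0..m} whose range contains every value 1..m."""
    target = set(range(1, m + 1))
    return sum(1 for g in product(range(m + 1), repeat=m + 1)
               if target <= set(g))

def onto_nonzero_closed(m):
    """(m+1)! maps with exactly one 0, plus C(m+1, 2) * m! surjections onto 1..m."""
    return factorial(m + 1) + comb(m + 1, 2) * factorial(m)

for m in range(1, 6):
    assert onto_nonzero_brute(m) == onto_nonzero_closed(m) == factorial(m + 2) // 2
print([onto_nonzero_closed(m) for m in range(1, 6)])  # [3, 12, 60, 360, 2520]
```

The first value, 3, agrees with the three functions counted in Example 4.5.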
The following Lemma 4.10 will only be applied in the case .
Lemma 4.10.
Let be a nonnegative integer, let , and let be an integer. Let and , . , where is minimal such that , equals
[TABLE]
Proof.
The condition that for some , i.e., for some , i.e., , i.e., , i.e., either (i.e., ) or , follows from .
By Lemma 4.8, with ,
[TABLE]
The condition is equivalent to . When and , this is equivalent to
[TABLE]
Let . Since by assumption , (9) becomes
[TABLE]
Since the map is increasing, the requirement for is that . Note that setting now makes . Therefore by Lemma 4.9, is as desired. ∎
Lemma 4.11.
Let be a nonnegative integer. Let and . Then
[TABLE]
Proof.
We have and hence for , so that but . Thus Theorem 4.3 applies and the number of such functions is , where is minimal such that . By Lemma 4.10 with we are done. ∎
Using Theorem 3.10 for we calculate some values for
[TABLE]
the number of maximally complex functions from to , in Table 1. In Theorem 4.12 we shall study the limiting behavior suggested by Table 1.
Theorem 4.12.
The number of maximal complexity functions satisfies
[TABLE]
Proof.
It is immediate that . For the other direction, consider the case where for some . By Stirling’s approximation,
[TABLE]
and hence . By Lemma 4.11,
[TABLE]
In Lemma 4.11, may seem like a large number but it is relatively small: in terms of ,
[TABLE]
Example 4.13.
For and , then, we get , and
[TABLE]
So there are more than 177 trillion maximum-complexity 6-ary Boolean functions, which is however a small fraction of the total number of such functions,
[TABLE]
or over 18 quintillion.
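The magnitudes in this example are easy to reproduce. The following Python lines assume, as a reading made for this sketch, that the count of maximum-complexity functions here equals the number of maps from a 16-element set into {0, ..., 15} whose range contains 1, ..., 15, which a short case analysis gives as 17!/2. The variable names are hypothetical.

```python
from math import factorial

max_complexity_count = factorial(17) // 2   # assumed closed form, see lead-in
all_functions = 2 ** (2 ** 6)               # all Boolean functions of 6 bits

print(max_complexity_count)   # 177843714048000
print(all_functions)          # 18446744073709551616
print(max_complexity_count / all_functions)
```

The first number is a little under 178 trillion and the second a little over 18 quintillion, consistent with the figures stated above.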
Remark 4.14.
For future work, it would be interesting (but difficult) to determine the distribution of over .
References
- [1] Cezar Câmpeanu and Wing Hong Ho. The maximum state complexity for finite languages. J. Autom. Lang. Comb., 9(2-3):189–202, 2004.
- [2] J.-M. Champarnaud and J.-E. Pin. A maxmin problem on finite automata. Discrete Applied Mathematics, 23(1):91–96, 1989.
- [3] Bjørn Kjos-Hanssen. On the complexity of automatic complexity. Theory Comput. Syst., 61(4):1427–1439, 2017.
- [4] Bjørn Kjos-Hanssen and Lei Liu. The number of languages with maximum state complexity. In Theory and applications of models of computation, volume 11436 of Lecture Notes in Comput. Sci., pages 394–409. Springer, Cham, 2019.
- [5] S. V. Poluyan and N. M. Ershov. Quantile transform in structural bioinformatics problems. Computational nanotechnology, (4):29–43, 2019.
- [6] Jeffrey Shallit. A Second Course in Formal Languages and Automata Theory. Cambridge University Press, New York, NY, USA, 1st edition, 2008.
