Regularity of languages generated by non context-free grammars over a singleton terminal alphabet
Alberto Pettorossi, Maurizio Proietti

DICII, University of Rome Tor Vergata, Rome, Italy
CNR-IASI, Rome, Italy
Abstract
It is well-known that: (i) every context-free language over a singleton terminal alphabet is regular [5], and (ii) the class of languages that satisfy the Pumping Lemma (for context-free languages) is a proper super-class of the context-free languages. We show that any language in this super-class over a singleton terminal alphabet is regular. Our proof is based on an elementary transformational approach and does not rely on Parikh’s Theorem [7]. Our result extends previously known results because there are languages that are not context-free, do satisfy the Pumping Lemma, and do not satisfy the hypotheses of Parikh’s Theorem [8].
Keywords: Context-free languages, pumping lemma (for context-free languages), Parikh’s Theorem, regular languages.
Journal: Information Processing Letters
Let us begin by introducing our terminology and notations.
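The set of the natural numbers is denoted by $\mathbb{N}$. The set of the $n$-tuples of natural numbers is denoted by $\mathbb{N}^n$. We say that a language $L$ is over the terminal alphabet $\Sigma$ iff $L \subseteq \Sigma^*$. Given a word $w \in \Sigma^*$, $w^0$ is the empty word $\varepsilon$, and, for any $i \geq 0$, $w^{i+1}$ is $w^i\,w$, that is, the concatenation of $w^i$ and $w$. The length of a word $w$ is denoted by $|w|$. Given a symbol $a \in \Sigma$, the number of occurrences of $a$ in $w$ is denoted by $|w|_a$. The cardinality of a set $A$ is denoted by $|A|$.
Given an alphabet $\Sigma$ such that $|\Sigma| = 1$, the concatenation of any two words in $\Sigma^*$ is commutative, that is, $u\,v = v\,u$ for all $u, v \in \Sigma^*$.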
In Theorem 2 below we extend the well known result stating that any context-free language over a singleton terminal alphabet is a regular language [5]. An early proof of this result appears in a paper by Ginsburg and Rice [4]. That proof is based on Tarski’s fixpoint theorem and it is not based on the Pumping Lemma (contrary to what has been stated in a paper by Andrei et al. [2]). Our extension is due to the facts that: (i) our proof does not rely on Parikh’s Theorem [7], like the proof in Harrison’s book [5], and (ii) there are non context-free languages that do satisfy the Pumping Lemma (see Definition 1) and do not satisfy Parikh’s Condition (see Definition 2) (and thus Parikh’s Theorem cannot be applied) [8]. Our proof is very much related to one presented in a book by Shallit [9], but we believe that ours is more elementary.
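Definition 1 (Pumping Lemma [3])
We say that a language $L \subseteq \Sigma^*$ satisfies the Pumping Lemma (for context-free languages) iff the following property, denoted $PL(L)$, holds:
$\exists n > 0$, $\forall z \in L$, if $|z| \geq n$, then $\exists u, v, w, x, y \in \Sigma^*$, such that
(1) $z = u\,v\,w\,x\,y$,
(2) $v\,x \neq \varepsilon$,
(3) $|v\,w\,x| \leq n$, and
(4) $\forall i \geq 0$, $u\,v^i\,w\,x^i\,y \in L$.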
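As a small worked instance of Definition 1, consider the one-letter language of even-length words. The example below is an illustration added here, not taken from the paper; it uses the customary names $u, v, w, x, y$ for the five factors of $z$ and $n$ for the pumping constant, and it shows that $n = 2$ is a suitable constant for this language.

```latex
% Illustrative example: L = { a^{2k} | k >= 0 } satisfies the pumping property with n = 2.
\begin{align*}
z &= a^{2k}, \qquad 2k \geq n = 2, \\
u &= \varepsilon, \quad v = aa, \quad w = \varepsilon, \quad x = \varepsilon, \quad y = a^{2k-2}, \\
v\,x &= aa \neq \varepsilon, \qquad |v\,w\,x| = 2 \leq n, \\
u\,v^i\,w\,x^i\,y &= a^{2i + 2k - 2} = a^{2(k+i-1)} \in L \quad \text{for all } i \geq 0 .
\end{align*}
```

Since $k \geq 1$ whenever $2k \geq 2$, the exponent $2(k+i-1)$ is a non-negative even number for every $i \geq 0$, so every pumped word stays in the language.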
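Definition 2 (Parikh’s Condition [7])
(i) A subset $S$ of $\mathbb{N}^k$ is said to be a linear set iff there exist $v_0, \ldots, v_m \in \mathbb{N}^k$ such that $S = \{ v_0 + \sum_{i=1}^{m} n_i\,v_i \mid n_1, \ldots, n_m \in \mathbb{N} \}$, where, for any given $u = \langle u_1, \ldots, u_k \rangle$ and $v = \langle v_1, \ldots, v_k \rangle$ in $\mathbb{N}^k$, $u + v$ denotes $\langle u_1 + v_1, \ldots, u_k + v_k \rangle$ and, for any $n \in \mathbb{N}$, $n\,u$ denotes $\langle n\,u_1, \ldots, n\,u_k \rangle$. (ii) Given the alphabet $\Sigma = \{a_1, \ldots, a_k\}$, we say that a language $L \subseteq \Sigma^*$ satisfies Parikh’s Condition iff $\{ \langle |w|_{a_1}, \ldots, |w|_{a_k} \rangle \mid w \in L \}$ is a finite union of linear subsets of $\mathbb{N}^k$.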
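To make the notion of a linear set more concrete, the following Python sketch lists a finite portion of such a set and computes the tuple of letter counts of a word. The names `linear_set`, `parikh_vector`, and the cut-off `max_coeff` are illustrative assumptions and do not come from the paper; a genuine linear set is infinite, so only coefficients up to `max_coeff` are enumerated.

```python
from itertools import product

def linear_set(v0, periods, max_coeff):
    """Finite portion of {v0 + n_1*v_1 + ... + n_m*v_m}, with 0 <= n_i <= max_coeff."""
    points = set()
    for coeffs in product(range(max_coeff + 1), repeat=len(periods)):
        # Add the scaled period vectors to the base vector, component by component.
        point = tuple(
            base + sum(n * period[j] for n, period in zip(coeffs, periods))
            for j, base in enumerate(v0)
        )
        points.add(point)
    return points

def parikh_vector(word, alphabet):
    """Tuple of occurrence counts in `word`, one count per alphabet symbol."""
    return tuple(word.count(symbol) for symbol in alphabet)

# Words of the form a^i b^i have letter counts (i, i), which lie in the
# linear set with base vector (0, 0) and single period vector (1, 1).
print(sorted(linear_set((0, 0), [(1, 1)], 3)))
print(parikh_vector("aabb", "ab"))
```

Over a singleton alphabet the tuples have a single component, so a linear set is just a set of lengths built from one base length and finitely many period lengths.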
Let us first state and prove the following lemma whose proof is by transformation from Definition 1.
Lemma 1
For any language over a terminal alphabet such that , holds iff the following property, denoted , holds:
, , if , then , such that
,
,
, and
, if , then .**
Proof 1
If , then commutativity of concatenation implies that in we can replace by , and by . Then, we can replace: by , by , and by . Thus, from , we get: , , if , then , such that**
* *,**
* *,**
* *, and**
* *, .**
Now if we take the lengths of the words and we denote by , by , and by , we get:
, , if , then , such that
* *,**
* *,**
* *, and**
* * , if , then .**
For all , , and , we have that iff . Thus, we get .
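Because a word over a one-letter alphabet is determined by its length, the pumping property can be explored numerically on lengths alone. The sketch below is a reformulation chosen for illustration and is not a quotation of the lemma: for a word of length `ell`, it searches for a block length `q` with `0 < q <= n` such that all lengths `ell - q + i*q` belong to the language. The names `pumpable_block`, `in_lang`, and the cut-off `max_i` are assumptions; since only finitely many values of `i` are tested, a returned `q` is evidence rather than proof, while `None` shows that no block of length at most `n` can be pumped.

```python
from math import isqrt
from typing import Callable, Optional

def pumpable_block(in_lang: Callable[[int], bool], ell: int, n: int,
                   max_i: int = 50) -> Optional[int]:
    """Return some q with 0 < q <= n such that ell - q + i*q is an accepted
    length for every 0 <= i <= max_i, or None if no such q exists."""
    for q in range(1, min(n, ell) + 1):
        if all(in_lang(ell - q + i * q) for i in range(max_i + 1)):
            return q
    return None

def is_even(k: int) -> bool:
    """Length predicate of the language { a^(2k) }."""
    return k % 2 == 0

def is_square(k: int) -> bool:
    """Length predicate of the language { a^(k*k) }."""
    r = isqrt(k)
    return r * r == k

print(pumpable_block(is_even, 10, 2))    # a block of length 2 can be pumped
print(pumpable_block(is_square, 16, 5))  # no block of length <= 5 works
```

Representing the language by a predicate on lengths, rather than by a set of strings, keeps the check independent of how the language is specified.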
We say that holds for if is a witness of the quantification ‘’ in . The following theorem states our main result.
Theorem 2
Let $L$ be any language over a terminal alphabet $\Sigma$ such that $|\Sigma| = 1$. If $PL(L)$ holds, that is, if $L$ satisfies the Pumping Lemma of Definition 1, then $L$ is a regular language.
Proof 2
Without loss of generality, let us consider a language over the terminal alphabet , such that holds. By Lemma 1, we have that holds for some positive integer . Let us consider the following two disjoint languages whose union is :
(i) and (ii) .
Now, is a regular language, because it is finite. Since regular languages are closed under finite union and intersection [6], in order to prove that is regular, it is enough to prove, as we now do, that
* *
where: (i) is a set of languages which is a subset of the following finite set of languages ( are integers):
* *
* are all distinct *
and (ii) for all , the language:
* *
is regular.
Indeed, (i) is regular, (ii) is a finite set of languages because, for any , there exists only a finite number of tuples satisfying all the conditions stated inside the set expression , and (iii) the language is regular because it is recognized by the following nondeterministic finite automaton with initial state and final state :
[Figure: a nondeterministic finite automaton with initial state A and final state B. The arc from A to B is labeled $a^{p_h+q_0+\ldots+q_k}$, and state B carries loops labeled $a^{q_0}, \ldots, a^{q_k}$.]
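The automaton above accepts a word exactly when its length is the length of the arc label plus a non-negative combination of the loop lengths. The following Python sketch computes those accepted lengths up to a bound by a simple reachability pass. The function name `accepted_lengths` and the parameters `base`, `loops`, and `limit` are illustrative assumptions: `base` stands for the length of the arc label and `loops` for the lengths of the loop labels.

```python
def accepted_lengths(base, loops, limit):
    """Lengths m <= limit of the form base + i_0*q_0 + ... + i_k*q_k, i_j >= 0."""
    reachable = [False] * (limit + 1)
    if base <= limit:
        reachable[base] = True  # reading the arc label reaches the final state
    for m in range(base, limit + 1):
        if reachable[m]:
            for q in loops:  # each loop extends an accepted word by q letters
                if m + q <= limit:
                    reachable[m + q] = True
    return [m for m in range(limit + 1) if reachable[m]]

# Arc label of length 6 with loops of lengths 2 and 3.
print(accepted_lengths(6, [2, 3], 12))
```

With loop lengths 2 and 3 every length from 8 onwards is accepted, which illustrates why such an automaton recognizes an ultimately periodic, hence regular, set of lengths.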
In order to prove Equality it remains to prove that, for any , there exists a tuple of the form such that .
Given any word , the following algorithm constructs a tuple of the form , for some .
Tuple Generation Algorithm
**
**
while* do od;*
**
**
In this algorithm is a function from to , whose existence follows from the validity of , satisfying the following condition: for every , such that and (take in Condition (4.1) of in Lemma 1). The termination of the Tuple Generation Algorithm is a consequence of the fact that, for every , for every , and . This implies that is a strictly decreasing sequence of integers, and eventually in that sequence we will get an element smaller than , and the while-loop terminates.
Thus, for every , there exist such that:
* *
where:* and for every , if , then ( and ).*
In general, in Equality the ’s are not all distinct. Thus, by rearranging the summands, and writing , instead of with occurrences of , we have that, for every word , there exist some integers such that
, where:
( 0), **( 1), ( 2),**
( 3)* are all distinct, and *( 4).
From (* 2) and ( 3), we have that . Hence, Condition ( 0) can be strengthened to: *(. We also have that , and when in Equality the values of are all distinct.
Since Conditions ), , , and are those occurring in the set expressions , and Condition is the one occurring in the set expressions , we have concluded the proof of Equality and that of Theorem 2.
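One way to read the construction used in this proof is as a loop that repeatedly removes a pumpable block from a word until its length falls below the pumping constant, and then groups equal block lengths together. The Python sketch below follows that reading on lengths only. It is an illustration under stated assumptions, not the authors' algorithm: `decompose` is a hypothetical name, and `choose_block` is a hypothetical stand-in for the function whose existence the proof asserts (the text does not say how to compute it), so here it must be supplied by the caller.

```python
from collections import Counter
from typing import Callable, Tuple

def decompose(ell: int, n: int,
              choose_block: Callable[[int], int]) -> Tuple[int, Counter]:
    """Split a length ell into a remainder smaller than n plus removed blocks.

    Returns (rest, blocks) with ell == rest + sum(q * count), where `blocks`
    maps each distinct block length q to the number of times it was removed.
    """
    blocks = Counter()
    while ell >= n:
        q = choose_block(ell)  # assumed: removing q letters stays in the language
        if not 0 < q <= min(n, ell):
            raise ValueError("block length must satisfy 0 < q <= n")
        blocks[q] += 1
        ell -= q  # strictly decreasing, so the loop terminates
    return ell, blocks

# Even-length words with pumping constant 2: a block of length 2 always works.
rest, blocks = decompose(10, 2, lambda ell: 2)
print(rest, dict(blocks))
```

Grouping the removed blocks in a `Counter` mirrors the step of the proof in which repeated summands are collected so that the remaining block lengths are all distinct.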
Let us make a few remarks on the proof of Theorem 2.
(i) The validity of tells us that the function exists, but it does not tell us how to compute , for any given .
(ii) Since summation is commutative, it may be the case that a language in corresponds to more than one tuple . In particular, we have that , whenever is a permutation of .
(iii) If , then . Thus, from Conditions ( 1) and ( 3) we have: . We also have that is the singleton , where is the language .
(iv) In Equality the set of languages may be a proper subset of . Indeed, let us consider the language generated by the context-free grammar . Since holds for , we can take the constant occurring in Equality to be . If we consider the word , then the set of languages includes, among others, the languages , , and (these three languages are obtained for ). Now, , while and .
(v) It may be the case that the length of the word labeling the arc from state to state of the finite automaton depicted above, is smaller than . Thus, in the definition of the intersection of with ensures that only words whose length is at least are considered.
Acknowledgements
This work has been partially funded by INdAM-GNCS (Italy).
References
- [2] Ş. Andrei, S. V. Cavadini, W.-N. Chin, A new algorithm for regularizing one-letter context-free grammars, Theoretical Computer Science 306 (1-3) (2003), 113–122.
- [3] Y. Bar-Hillel, M. Perles, E. Shamir, On formal properties of simple phrase structure grammars, Z. Phonetik. Sprachwiss. Kommunikationsforsch. 14 (1961), 143–172.
- [4] S. Ginsburg, G. H. Rice, Two families of languages related to ALGOL, JACM 9 (3) (1962), 350–371.
- [5] M. A. Harrison, Introduction to Formal Language Theory, Addison-Wesley (1978).
- [6] J. E. Hopcroft, J. D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley (1979).
- [7] R. J. Parikh, On context-free languages, J. ACM 13 (4) (1966), 570–581.
- [8] G. Ramos-Jiménez, J. López-Muñoz, R. Morales-Bueno, Comparisons of Parikh’s condition to other conditions for context-free languages, Theoretical Computer Science 202 (1) (1998), 231–244.
- [9] J. Shallit, A Second Course in Formal Languages and Automata Theory, Cambridge University Press (2008).
