Enumerating regular expressions and their languages
Hermann Gruber, Jonathan Lee, Jeffrey Shallit

TL;DR
This paper explores methods to enumerate regular expressions and their languages by size, using algebraic and analytic techniques to derive asymptotic estimates and exact counts for small cases.
Contribution
It introduces a formal grammar-based approach and applies complex analysis to estimate the number of regular expressions and languages of a given size.
Findings
Asymptotic estimates for the number of regular expressions of size n
Asymptotic estimates for the number of languages represented by these expressions
Exact enumeration results for small sizes
Abstract
In this chapter we discuss the problem of enumerating distinct regular expressions by size and the regular languages they represent. We discuss various notions of the size of a regular expression that appear in the literature and their advantages and disadvantages. We consider a formal definition of regular expressions using a context-free grammar. We then show how to enumerate strings generated by an unambiguous context-free grammar using the Chomsky-Schützenberger theorem. This theorem allows one to construct an algebraic equation whose power series expansion provides the enumeration. Classical tools from complex analysis, such as singularity analysis, can then be used to determine the asymptotic behavior of the enumeration. We use these algebraic and analytic methods to obtain asymptotic estimates on the number of regular expressions of size n. A single regular language can…
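To make the grammar-to-power-series step concrete, here is a minimal sketch under stated assumptions. It uses a toy grammar chosen only for illustration, not necessarily the one used in the chapter: regular expressions in prefix notation, S -> a | *S | +SS | .SS, with size measured as string length. Prefix notation makes the grammar unambiguous, so translating each production gives the algebraic equation S(x) = kx + xS(x) + 2xS(x)^2, where k is the alphabet size. Reading off coefficients yields a recurrence for the number of expressions of each size. The function name count_prefix_regexes and its parameters are hypothetical.

```python
def count_prefix_regexes(max_size, alphabet_size=1):
    """Count prefix-notation regular expressions by string length.

    Assumed toy grammar (illustrative only):
        S -> a | *S | +SS | .SS
    Its generating function satisfies
        S(x) = k*x + x*S(x) + 2*x*S(x)^2,   k = alphabet_size,
    so the coefficient of x^n obeys the recurrence computed below.
    Returns a list `counts` where counts[n] is the number of
    expressions of length n (counts[0] is always 0).
    """
    counts = [0] * (max_size + 1)
    for n in range(1, max_size + 1):
        # Term k*x: a single alphabet symbol has length 1.
        total = alphabet_size if n == 1 else 0
        # Term x*S(x): a star followed by an expression of length n - 1.
        total += counts[n - 1]
        # Term 2*x*S(x)^2: a binary operator (+ or .) followed by two
        # expressions whose lengths sum to n - 1.
        total += 2 * sum(counts[i] * counts[n - 1 - i] for i in range(1, n - 1))
        counts[n] = total
    return counts


if __name__ == "__main__":
    # One-letter alphabet, sizes 0 through 5.
    print(count_prefix_regexes(5))  # prints [0, 1, 1, 3, 7, 21]
```

For a one-letter alphabet the three expressions of length 3 are **a, +aa and .aa, which matches counts[3] = 3. This sketch counts expressions only; distinct expressions can denote the same language, so counting languages requires additional work beyond this recurrence.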
