Enumerating regular expressions and their languages

Hermann Gruber; Jonathan Lee; Jeffrey Shallit

arXiv:1204.4982 · cs.FL · April 24, 2012 · Handbook of Automata Theory

TL;DR

This paper explores methods to enumerate regular expressions and their languages by size, using algebraic and analytic techniques to derive asymptotic estimates and exact counts for small cases.

Contribution

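It defines regular expressions by an unambiguous context-free grammar, derives algebraic generating functions for their counts via the Chomsky-Schützenberger theorem, and uses singularity analysis from complex analysis to estimate how many regular expressions, and how many distinct languages, there are of a given size.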
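As a toy illustration of the grammar-to-generating-function step, consider a hypothetical size measure that is not the paper's own grammar or size: every node of the syntax tree counts 1, where the atoms are ∅, ε and the k alphabet letters, and the operators are star, union and concatenation. Syntax trees are unambiguous, so the generating function R(z) = Σ a_n z^n, where a_n is the number of trees of size n, satisfies an algebraic equation. The zero of its discriminant closest to the origin gives the dominant singularity ρ, and hence the exponential growth rate. The square-root form of that singularity gives the usual n^{-3/2} correction:

```latex
R(z) = (k+2)\,z + z\,R(z) + 2z\,R(z)^2
\;\Longrightarrow\;
R(z) = \frac{1 - z - \sqrt{(1-z)^2 - 8(k+2)\,z^2}}{4z},
\qquad
\rho = \frac{1}{1 + \sqrt{8(k+2)}},
\qquad
a_n \sim C\,\rho^{-n}\,n^{-3/2}\ \ (C > 0).
```

For example, with k = 2 the growth constant of this toy measure is 1/ρ = 1 + √32 ≈ 6.657.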

Findings

1. Asymptotic estimates for the number of regular expressions of size n

2. Asymptotic estimates for the number of distinct languages represented by regular expressions of size n

3. Exact enumeration results for small sizes
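For expressions (as opposed to languages), exact counts for small sizes come directly from extracting coefficients of the algebraic generating function. The minimal Python sketch below uses a hypothetical size measure that is not the paper's own: every syntax-tree node counts 1, the atoms are ∅, ε and k letters, and the operators are star, union and concatenation. Under that measure the counts satisfy R(z) = (k+2)z + zR(z) + 2zR(z)^2. The sketch also compares the coefficient ratio a_{n+1}/a_n with 1 + √(8(k+2)), which is the reciprocal of the smallest positive zero of the discriminant of that quadratic equation.

```python
import math


def count_trees(n_max: int, k: int) -> list[int]:
    """Return a list a where a[n] is the number of syntax trees of size n (a[0] = 0).

    Hypothetical size measure (an assumption, not the paper's definition):
    each node counts 1; atoms are the empty set, epsilon, and k letters.
    Coefficients of R = (k+2) z + z R + 2 z R^2.
    """
    a = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        atoms = k + 2 if n == 1 else 0
        star = a[n - 1]  # a star node above a subtree of size n-1
        # a union or concatenation node above subtrees of sizes i and n-1-i
        binary = 2 * sum(a[i] * a[n - 1 - i] for i in range(1, n - 1))
        a[n] = atoms + star + binary
    return a


if __name__ == "__main__":
    k = 2
    a = count_trees(300, k)
    print(a[1:6])  # [4, 4, 36, 100, 708]
    # the ratio approaches 1/rho slowly, roughly like (1/rho) * (1 - 3/(2n))
    print(a[300] / a[299], 1 + math.sqrt(8 * (k + 2)))
```

Python's arbitrary-precision integers keep the counts exact. The ratio test is only a numerical sanity check of the singularity analysis, not a proof.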

Abstract

In this chapter we discuss the problem of enumerating distinct regular expressions by size and the regular languages they represent. We discuss various notions of the size of a regular expression that appear in the literature and their advantages and disadvantages. We consider a formal definition of regular expressions using a context-free grammar. We then show how to enumerate strings generated by an unambiguous context-free grammar using the Chomsky-Schützenberger theorem. This theorem allows one to construct an algebraic equation whose power series expansion provides the enumeration. Classical tools from complex analysis, such as singularity analysis, can then be used to determine the asymptotic behavior of the enumeration. We use these algebraic and analytic methods to obtain asymptotic estimates on the number of regular expressions of size n. A single regular language can…
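Many expressions denote the same language (for example, a* and (a*)*), so languages cannot be counted by counting expressions. A brute-force sketch, which is not the paper's method, gives a lower bound. It uses the same hypothetical node-count size as a stand-in for the size measures the chapter discusses, and computes each expression's language truncated to words of length at most max_len. Truncation commutes with union, concatenation and star, so truncated languages can be built from the truncated languages of the subexpressions. Distinct truncations imply distinct languages, so the printed counts are lower bounds on the number of distinct languages. All function names here are illustrative.

```python
def concat(x: frozenset[str], y: frozenset[str], max_len: int) -> frozenset[str]:
    """Concatenation XY, keeping only words of length <= max_len."""
    return frozenset(u + v for u in x for v in y if len(u) + len(v) <= max_len)


def star(x: frozenset[str], max_len: int) -> frozenset[str]:
    """Kleene star X*, keeping only words of length <= max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        # append one more nonempty word of X; lengths strictly grow, so this terminates
        frontier = {w + u for w in frontier for u in x
                    if u and len(w) + len(u) <= max_len} - result
        result |= frontier
    return frozenset(result)


def truncated_languages(n_max: int, alphabet: str, max_len: int) -> list[set[frozenset[str]]]:
    """by_size[n] = distinct truncated languages of expressions of size exactly n.

    Hypothetical size measure: each syntax-tree node (atom, star, union,
    concatenation) counts 1; atoms are the empty set, epsilon, and single letters.
    """
    by_size: list[set[frozenset[str]]] = [set() for _ in range(n_max + 1)]
    if n_max >= 1:
        by_size[1] = {frozenset(), frozenset({""})} | {frozenset({c}) for c in alphabet}
    for n in range(2, n_max + 1):
        out = by_size[n]
        for lang in by_size[n - 1]:
            out.add(star(lang, max_len))
        for i in range(1, n - 1):  # children of sizes i and n-1-i
            for x in by_size[i]:
                for y in by_size[n - 1 - i]:
                    out.add(x | y)
                    out.add(concat(x, y, max_len))
    return by_size


if __name__ == "__main__":
    by_size = truncated_languages(6, "ab", 4)
    seen: set[frozenset[str]] = set()
    for n in range(1, 7):
        seen |= by_size[n]
        # size, distinct truncations at this size, cumulative distinct up to this size
        print(n, len(by_size[n]), len(seen))
```

Raising max_len can only separate more languages, so the lower bound can only stay the same or improve, at the cost of larger sets.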
