On the Expressive Power of Regular Expressions with Backreferences
Taisei Nogami, Tachio Terauchi

TL;DR
This paper studies the expressive power of regular expressions with backreferences (rewbs). It shows that every language described by a rewb is an indexed language, and it clarifies where rewbs sit relative to other formal language classes, including stack languages and nonerasing stack languages.
Contribution
It establishes that the languages described by rewbs are contained in the class of indexed languages, which strictly improves the previously known upper bound of context-sensitive languages, and it clarifies the position of rewbs in the hierarchy of formal language classes.
Findings
Every language described by a rewb is an indexed language.
Some rewbs describe languages outside the class of stack languages.
The language of a rewb without a captured reference is a nonerasing stack language.
Abstract
A rewb is a regular expression extended with a feature called backreference. It is broadly known that backreference is a practical extension of regular expressions, and is supported by most modern regular expression engines, such as those in the standard libraries of Java, Python, and more. Meanwhile, indexed languages are the languages generated by indexed grammars, a formal grammar class proposed by A. V. Aho. We show that these two models' expressive powers are related in the following way: every language described by a rewb is an indexed language. As the smallest formal grammar class previously known to contain rewbs is the class of context-sensitive languages, our result strictly improves the known upper bound. Moreover, we prove the following two claims: there exists a rewb whose language does not belong to the class of stack languages, which is a proper subclass of indexed…
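To make the notion of a backreference concrete, here is a minimal sketch using Python's standard `re` module. It is an illustration only and is not taken from the paper: the pattern and the helper name `is_square` are assumptions chosen for the example. The pattern `([ab]*)\1` captures a string over {a, b} and then requires the same string to appear again, so it describes the copy language {ww}, a well-known example of a language that is not context-free.

```python
import re

# Hypothetical example pattern (not from the paper):
# group 1 captures any string over {a, b}; \1 must then repeat it exactly.
SQUARE = re.compile(r"([ab]*)\1")


def is_square(s: str) -> bool:
    """Return True if s has the form ww for some w over {a, b}."""
    return SQUARE.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in ["", "abab", "aabaab", "aba", "abba"]:
        print(repr(candidate), is_square(candidate))
```

Practical engines such as Python's implement backreferences by backtracking, which is a separate matter from the language-theoretic classification the paper addresses.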
