Deterministic Regular Expressions With Back-References
Dominik D. Freydenberger, Markus L. Schmid

TL;DR
This paper introduces a class of deterministic regular expressions with back-references, balancing expressive power and computational tractability, and providing automaton models and algorithms for their analysis.
Contribution
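It defines a new class of deterministic regular expressions with back-references, together with a suitable automaton model and a generalization of the Glushkov construction. Compared to the non-deterministic superclass, this class admits efficient membership testing and makes some static analysis problems decidable.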
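For orientation, the classical notion of determinism (as used for XML DTDs) can be checked on an ordinary regular expression, without back-references, through its Glushkov position sets: the expression is deterministic if no two distinct positions carrying the same symbol can compete as the next position. The following is a minimal sketch of that classical check only, not the paper's generalized construction. The tuple-based AST encoding and the names `glushkov` and `is_deterministic` are assumptions made for illustration.

```python
from itertools import count

# Hypothetical AST encoding (an assumption for this sketch):
#   ("sym", a), ("cat", r, s), ("alt", r, s), ("star", r)

def glushkov(ast):
    """Compute the first set, follow sets and position labels of the marked expression."""
    ids = count()
    symbol_of = {}   # position -> alphabet symbol
    follow = {}      # position -> set of positions that may come next

    def walk(node):
        # Returns (nullable, first positions, last positions) of the subexpression.
        kind = node[0]
        if kind == "sym":
            p = next(ids)
            symbol_of[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind == "star":
            _, first, last = walk(node[1])
            for p in last:
                follow[p] |= first          # loop back to the start
            return True, first, last
        n1, f1, l1 = walk(node[1])
        n2, f2, l2 = walk(node[2])
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        if kind == "cat":
            for p in l1:
                follow[p] |= f2             # step from the left part into the right part
            first = f1 | (f2 if n1 else set())
            last = l2 | (l1 if n2 else set())
            return n1 and n2, first, last
        raise ValueError(f"unknown node kind: {kind!r}")

    _, first, _ = walk(ast)
    return first, follow, symbol_of

def is_deterministic(ast):
    """True if no candidate set contains two positions labelled with the same symbol."""
    first, follow, symbol_of = glushkov(ast)
    for targets in [first, *follow.values()]:
        symbols = [symbol_of[p] for p in targets]
        if len(symbols) != len(set(symbols)):
            return False
    return True

a, b = ("sym", "a"), ("sym", "b")
nondet = ("cat", ("star", ("alt", a, b)), a)                # (a|b)*a
b_star_a = ("cat", ("star", b), a)
det = ("cat", b_star_a, ("star", ("cat", ("star", b), a)))  # b*a(b*a)*

print(is_deterministic(nondet))  # False
print(is_deterministic(det))     # True
```

Both example expressions describe the same language (strings over {a, b} ending in a), which illustrates that determinism is a property of the expression rather than of the language alone.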
Findings
The membership problem for the new class is efficiently solvable.
Some static analysis problems are decidable for the new class.
Its expressive power exceeds that of deterministic regular expressions without back-references.
Abstract
Most modern libraries for regular expression matching allow back-references (i.e., repetition operators) that substantially increase expressive power, but also lead to intractability. In order to find a better balance between expressiveness and tractability, we combine these with the notion of determinism for regular expressions used in XML DTDs and XML Schema. This includes the definition of a suitable automaton model, and a generalization of the Glushkov construction. We demonstrate that, compared to their non-deterministic superclass, these deterministic regular expressions with back-references have desirable algorithmic properties (i.e., efficiently solvable membership problem and some decidable problems in static analysis), while, at the same time, their expressive power exceeds that of deterministic regular expressions without back-references.
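As a small illustration of the extra expressive power that back-references provide in a typical library, the sketch below uses Python's standard `re` module to recognize the copy language {ww : w ∈ {a, b}*}, which is not regular. The helper name `is_square` is an assumption for this example; the pattern is an ordinary (non-deterministic) regex with a back-reference, not an instance of the class defined in the paper.

```python
import re

# Group 1 captures some word w over {a, b}; \1 then demands an exact repetition of w.
COPY = re.compile(r"([ab]*)\1")

def is_square(s: str) -> bool:
    """Return True if s has the form ww for some w over {a, b}."""
    return COPY.fullmatch(s) is not None

print(is_square("abab"))  # True  (w = "ab")
print(is_square("abba"))  # False
```

Python's matcher handles this pattern by backtracking over the possible lengths of the captured word, which hints at why matching with back-references is intractable in general.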
