Stream Processing using Grammars and Regular Expressions
Ulrik Terp Rasmussen

TL;DR
This dissertation introduces efficient streaming algorithms for regular expression and grammar-based parsing, and presents Kleenex, a language for high-performance streaming string processing with worst-case linear-time performance.
Contribution
It develops two linear-time algorithms for regular expression parsing with greedy disambiguation, introduces Kleenex for grammar-based streaming string processing, and presents a new linear-time parsing algorithm for parsing expression grammars (PEGs).
Findings
Two linear-time regex parsing algorithms with different streaming modes.
Kleenex language for high-performance streaming string processing.
A new linear-time PEG parsing algorithm using fixed points.
Abstract
In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass and optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as is required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we…
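To make "Perl-style greedy disambiguation" concrete, here is a minimal sketch of what a greedy parse looks like. It is a naive backtracking enumerator, not either of the linear-time algorithms described in the abstract, and its running time can be exponential. The tuple encoding of regular expressions, the helper `lit`, and the bit-code convention (0 = left alternative or "iterate again", 1 = right alternative or "stop") are assumptions made for illustration.

```python
# Regex AST (hypothetical encoding for this sketch):
#   ("eps",)  ("chr", c)  ("seq", r1, r2)  ("alt", r1, r2)  ("star", r)

def lit(word):
    """Build a regex matching the literal string `word`."""
    r = ("eps",)
    for c in reversed(word):
        r = ("seq", ("chr", c), r)
    return r

def parses(r, s, i):
    """Yield (end, bits) for each way r matches s[i:end], in greedy priority order."""
    tag = r[0]
    if tag == "eps":
        yield i, ""
    elif tag == "chr":
        if i < len(s) and s[i] == r[1]:
            yield i + 1, ""
    elif tag == "seq":
        for j, b1 in parses(r[1], s, i):
            for k, b2 in parses(r[2], s, j):
                yield k, b1 + b2
    elif tag == "alt":
        # Greedy: every parse through the left alternative comes first.
        for j, b in parses(r[1], s, i):
            yield j, "0" + b
        for j, b in parses(r[2], s, i):
            yield j, "1" + b
    elif tag == "star":
        # Greedy: prefer one more iteration over stopping.
        for j, b1 in parses(r[1], s, i):
            if j > i:  # skip empty iterations so the search terminates
                for k, b2 in parses(r, s, j):
                    yield k, "0" + b1 + b2
        yield i, "1"
    else:
        raise ValueError(f"unknown node: {tag}")

def greedy_parse(r, s):
    """Return the bit-coded greedy parse of the whole string s, or None."""
    for end, bits in parses(r, s, 0):
        if end == len(s):
            return bits
    return None

# (a|ab)(c|bcd)d* on "abcd": greedy picks a, then bcd, then zero d's.
regex = ("seq",
         ("seq", ("alt", lit("a"), lit("ab")), ("alt", lit("c"), lit("bcd"))),
         ("star", ("chr", "d")))
print(greedy_parse(regex, "abcd"))  # 011
```

The example input is ambiguous: the string can also be split as ab, c, d. Greedy disambiguation commits to the left alternative whenever doing so still leads to a complete match, which is why the result records a, bcd, and an empty iteration.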
Taxonomy
Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Semigroups and Automata Theory
