Idris TyRE: a dependently typed regex parser
Ohad Kammar, Katarzyna Marek

TL;DR
This paper introduces Idris TyRE, a dependently typed regex parser that maintains type safety across all layers, reducing redundancy and increasing reliability in regex-based parsing.
Contribution
It extends Radanne's typed regexes by implementing a fully type-safe parser in Idris 2, covering user input to parse-tree construction.
Findings
Type-safe regex parsing in Idris 2
Maintains correctness across all parsing layers
Reduces redundancy in regex parsing workflows
Abstract
Regular expressions -- regexes -- are widely used not only for validating, but also for parsing textual data. Generally, regex parsers output a loose structure, e.g. an unstructured list of matches, leaving it up to the user to validate the output's properties and transform it into the desired structure. Since the regex itself carries information about the structure, this design leads to unnecessary repetition. Radanne introduced typed regexes -- TyRE -- a type-indexed combinator layer that can be added on top of an existing regex engine. We extend Radanne's design, and implement a parser which maintains type-safety throughout all layers: the user-facing regexes; their internal, desugared, representation; its compiled finite-state automaton; and the automaton's associated instruction-set for constructing the parse-trees. We implemented TyRE in the dependently-typed language Idris 2.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Logic, programming, and type systems · Formal Methods in Verification
