Verified invertible lexer using regular expressions and DFAs

Samuel Chassot; Viktor Kun\v{c}ak

arXiv:2412.13581·cs.PL·December 19, 2024

Verified invertible lexer using regular expressions and DFAs

Samuel Chassot, Viktor Kun\v{c}ak

PDF

Open Access 1 Repo

TL;DR

This paper introduces a verified invertible lexer framework in Scala, utilizing regular expressions and DFAs, ensuring correctness in tokenization and serialization processes through formal proofs.

Contribution

It presents a novel verified lexer framework with formal invertibility proofs, implemented in Scala and verified using the Stainless framework, with two matching approaches.

Findings

01

Verified lexer framework developed in Scala

02

Formal proofs of invertibility using Stainless

03

Implementation includes regex and DFA matchers

Abstract

In this project, we explore the concept of invertibility applied to serialisation and lexing frameworks. Recall that, on one hand, serialisation is the process of taking a data structure and writing it to a bit array while parsing is the reverse operation, i.e., reading the bit array and constructing the data structure back. While lexing, on the other hand, is the process of reading a stream of characters and splitting them into tokens, by following a list of given rules. While used in different applications, both are similar in their abstract operation: they both take a list of simple characters and extract a more complex structure. Applications in which these two operations are used are different but they share a need for the invertibility of the process. For example, when tokenising a code file that was prettyprinted by a compiler, one would expect to get the same sequence of tokens.…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

samuelchassot/verifiedlexer
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsFuzzy Logic and Control Systems · Logic, programming, and type systems · Natural Language Processing Techniques