Correct and Optimal: the Regular Expression Inference Challenge

Mojtaba Valizadeh; Philip John Gorinski; Ignacio Iacobacci; Martin Berger

arXiv:2308.07899 · cs.LG · May 13, 2024 · 1 citation

Open Access

TL;DR

This paper introduces the Regular Expression Inference (REI) challenge, a well-defined problem for machine learning that involves inferring minimal regular expressions from examples, with the aim to advance code and language modeling.

Contribution

The paper formalizes REI as a challenge problem, provides the first large-scale datasets, and evaluates initial heuristics and ML baselines, encouraging community participation.

Findings

01 GPU-based REI solver enables fast minimal regex generation

02 Large-scale REI datasets are now available for research

03 Initial heuristics and ML baselines show promising results

Abstract

We propose regular expression inference (REI) as a challenge for code/language modelling, and the wider machine learning community. REI is a supervised machine learning (ML) and program optimisation task, and poses the problem of finding minimal regular expressions from examples: Given two finite sets of strings $P$ and $N$ and a cost function $\mathrm{cost}(\cdot)$, the task is to generate an expression $r$ that accepts all strings in $P$ and rejects all strings in $N$, while no other such expression $r'$ exists with $\mathrm{cost}(r') < \mathrm{cost}(r)$. REI has advantages as a challenge problem: (i) regular expressions are well-known, widely used, and a natural idealisation of code; (ii) REI's asymptotic worst-case complexity is well understood; (iii) REI has a small number of easy to understand parameters (e.g. $P$ or $N$ cardinality, string lengths of examples, or the cost function); this lets us easily…
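
To make the problem statement concrete, the following is a minimal brute-force sketch of REI in Python. It is not the paper's GPU-based solver. The function name `infer_minimal_regex`, the operator set (concatenation, union, star, optional), and the uniform cost of 1 per alphabet symbol and per operator are all assumptions made for illustration; the abstract leaves the cost function as a parameter. The sketch enumerates expressions in order of increasing cost and returns the first one that accepts every string in $P$ and rejects every string in $N$, so the result is minimal with respect to the assumed cost function and operator set.

```python
import re
from itertools import product


def infer_minimal_regex(P, N, alphabet, max_cost=8):
    """Return (regex, cost) for a cheapest regex consistent with P and N, or None.

    Assumed cost model (hypothetical, not taken from the paper):
    each alphabet symbol costs 1 and each operator adds 1.
    """
    # by_cost[c] holds every enumerated expression of cost exactly c.
    by_cost = {1: [re.escape(ch) for ch in alphabet]}

    for cost in range(1, max_cost + 1):
        if cost > 1:
            exprs = []
            # Unary operators: star and optional, applied to cost - 1 expressions.
            for r in by_cost[cost - 1]:
                # (r*)*, (r?)*, (r*)?, (r?)? are equivalent to cheaper forms, so skip them.
                if r.endswith("*") or r.endswith("?"):
                    continue
                exprs.append(f"(?:{r})*")
                exprs.append(f"(?:{r})?")
            # Binary operators: concatenation and union, splitting the remaining cost.
            for left_cost in range(1, cost - 1):
                right_cost = cost - 1 - left_cost
                for a, b in product(by_cost[left_cost], by_cost[right_cost]):
                    exprs.append(f"(?:{a})(?:{b})")
                    exprs.append(f"(?:{a})|(?:{b})")
            by_cost[cost] = exprs

        # Expressions are checked in order of increasing cost,
        # so the first consistent one is minimal under the assumed cost model.
        for r in by_cost[cost]:
            accepts_all_p = all(re.fullmatch(r, s) for s in P)
            rejects_all_n = not any(re.fullmatch(r, s) for s in N)
            if accepts_all_p and rejects_all_n:
                return r, cost
    return None


if __name__ == "__main__":
    P = ["a", "aa", "aaa"]
    N = ["", "b", "ab"]
    print(infer_minimal_regex(P, N, alphabet="ab"))
```

The search space grows exponentially with the cost bound, so this enumeration is only practical for very small instances; it is meant to illustrate the specification (correctness on $P$ and $N$ plus optimality of cost), not an efficient method.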

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Software Testing and Debugging Techniques