Correct and Optimal: the Regular Expression Inference Challenge
Mojtaba Valizadeh, Philip John Gorinski, Ignacio Iacobacci, Martin Berger

TL;DR
This paper introduces the Regular Expression Inference (REI) challenge, a well-defined machine learning problem that involves inferring minimal regular expressions from examples, with the aim of advancing code and language modelling.
Contribution
The paper formalizes REI as a challenge problem, provides the first large-scale datasets, and evaluates initial heuristics and ML baselines, encouraging community participation.
Findings
GPU-based REI solver enables fast minimal regex generation
Large-scale REI datasets are now available for research
Initial heuristics and ML baselines show promising results
Abstract
We propose regular expression inference (REI) as a challenge for code/language modelling, and the wider machine learning community. REI is a supervised machine learning (ML) and program optimisation task, and poses the problem of finding minimal regular expressions from examples: Given two finite sets of strings P and N and a cost function cost(·), the task is to generate an expression r that accepts all strings in P and rejects all strings in N, while no other such expression r′ exists with cost(r′) < cost(r). REI has advantages as a challenge problem: (i) regular expressions are well-known, widely used, and a natural idealisation of code; (ii) REI's asymptotic worst-case complexity is well understood; (iii) REI has a small number of easy to understand parameters (e.g. P or N cardinality, string lengths of examples, or the cost function); this lets us easily…
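To make the problem statement concrete, the following is a minimal sketch of REI as a brute-force search: it enumerates expressions in order of increasing cost and returns the first one that accepts every positive example and rejects every negative one. It is not the paper's GPU-based solver. The function name `infer_minimal_regex`, the `max_cost` bound, and the cost model (each literal, star, concatenation, and union costs 1) are assumptions chosen for illustration; the abstract leaves the cost function as a parameter.

```python
import re
from itertools import product


def infer_minimal_regex(positives, negatives, alphabet, max_cost=8):
    """Return (cost, regex) for a cheapest consistent regex, or None.

    Assumed cost model (illustrative only): every literal, star,
    concatenation and union contributes a cost of 1.
    """

    def consistent(expr):
        pattern = re.compile(expr)
        accepts_all = all(pattern.fullmatch(s) for s in positives)
        rejects_all = not any(pattern.fullmatch(s) for s in negatives)
        return accepts_all and rejects_all

    # by_cost[c] holds the expressions generated with cost exactly c.
    by_cost = {1: [re.escape(ch) for ch in sorted(alphabet)]}

    for cost in range(1, max_cost + 1):
        level = by_cost.setdefault(cost, [])
        if cost > 1:
            # Star: cost(sub) + 1. A star of a star is skipped because
            # (r*)* is equivalent to the cheaper r*.
            for sub in by_cost[cost - 1]:
                if not sub.endswith(")*"):
                    level.append(f"(?:{sub})*")
            # Concatenation and union: cost(left) + cost(right) + 1.
            for left_cost in range(1, cost - 1):
                right_cost = cost - 1 - left_cost
                for left, right in product(by_cost[left_cost], by_cost[right_cost]):
                    level.append(f"(?:{left})(?:{right})")
                    level.append(f"(?:{left})|(?:{right})")
        # Levels are visited in increasing cost, so the first consistent
        # expression found is minimal under the assumed cost model.
        for expr in level:
            if consistent(expr):
                return cost, expr
    return None


if __name__ == "__main__":
    # P = {a, aa, aaa}, N = {"", b}: "one or more a" is the intended language.
    print(infer_minimal_regex(["a", "aa", "aaa"], ["", "b"], "ab"))
```

The number of candidate expressions grows exponentially with the cost bound, so this sketch is only practical for tiny alphabets and short examples.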
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software Testing and Debugging Techniques
