Neuro-Symbolic Regex Synthesis Framework via Neural Example Splitting

Su-Hyeon Kim; Hyunjoon Cheon; Yo-Sub Han; Sang-Ki Ko

arXiv:2205.11258·cs.LG·May 24, 2022

TL;DR

This paper introduces SplitRegex, a neural-guided divide-and-conquer framework for faster and more accurate regex synthesis from string examples, outperforming previous methods on benchmark datasets.

Contribution

The paper proposes a novel neural example splitting approach and a regex synthesis framework that improves speed and accuracy over existing methods.

Findings

01. Significant improvement over previous regex synthesis methods.

02. Effective division of positive strings enhances learning accuracy.

03. Framework successfully handles negative string constraints.

Abstract

Due to the practical importance of regular expressions (regexes, for short), there has been a lot of research on automatically generating regexes from positive and negative string examples. We tackle the problem of learning regexes faster from positive and negative strings by relying on a novel approach called 'neural example splitting'. Our approach essentially splits each example string into multiple parts using a neural network trained to group similar substrings from positive strings. This helps to learn a regex faster and, thus, more accurately, since we now learn from several short strings. We propose an effective regex synthesis framework called 'SplitRegex' that synthesizes subregexes from 'split' positive substrings and produces the final regex by concatenating the synthesized subregexes. For the negative sample, we exploit pre-generated subregexes during the subregex…
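The split, synthesize, and concatenate idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the neural splitter is replaced here by a naive character-class heuristic, the per-part synthesizer only infers a class and a length range, and negatives are handled by a simple final check (the abstract's own treatment of negatives is truncated above). All function names (`split_example`, `synthesize_subregex`, `synthesize`) are hypothetical.

```python
import re
import string
from itertools import groupby


def char_class(c):
    # ASSUMPTION: stand-in for the neural splitter. Each character is
    # labelled with a coarse class; the paper uses a trained network.
    if c in string.digits:
        return "[0-9]"
    if c in string.ascii_letters:
        return "[a-zA-Z]"
    return re.escape(c)  # any other character is treated as a literal


def split_example(s):
    # Split a string into maximal runs sharing one class label.
    return [(label, "".join(run)) for label, run in groupby(s, key=char_class)]


def synthesize_subregex(label, parts):
    # Toy sub-synthesizer: class label plus the observed length range.
    lo, hi = min(map(len, parts)), max(map(len, parts))
    if lo == hi == 1:
        return label
    if lo == hi:
        return f"{label}{{{lo}}}"
    return f"{label}{{{lo},{hi}}}"


def synthesize(positives, negatives):
    # Returns a regex string, or None if this toy procedure fails.
    if not positives:
        return None
    splits = [split_example(s) for s in positives]
    signature = [label for label, _ in splits[0]]
    # This sketch only handles positives that split into the same label sequence.
    if any([label for label, _ in sp] != signature for sp in splits):
        return None
    subregexes = [
        synthesize_subregex(label, [sp[i][1] for sp in splits])
        for i, label in enumerate(signature)
    ]
    regex = "".join(subregexes)  # final regex = concatenated subregexes
    # ASSUMPTION: negatives are only checked at the end, not during synthesis.
    if any(re.fullmatch(regex, n) for n in negatives):
        return None
    return regex


if __name__ == "__main__":
    print(synthesize(["ab-123", "xyz-45"], ["abc123", "a-1"]))
```

Because each subregex is learned from short aligned parts rather than whole strings, the search per part stays small, which is the intuition the abstract gives for the speedup. A learned splitter would replace `char_class` and could group substrings that this heuristic cannot.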

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques