A Noise-tolerant Differentiable Learning Approach for Single Occurrence Regular Expression with Interleaving
Rongzhen Ye, Tianqu Zhuang, Hai Wan, Jianfeng Du, Weilin Luo, Pingjia Liang

TL;DR
This paper introduces SOIREDL, a noise-tolerant, neural-network-based method for learning single occurrence regular expressions with interleaving (SOIREs). It outperforms existing methods, particularly on noisy data.
Contribution
The paper presents a novel neural network approach that is robust to noise and capable of learning complex SOIREs, with a theoretical proof of one-to-one correspondence between network parameters and SOIREs.
Findings
SOIREDL outperforms existing methods on noisy datasets.
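A theoretical proof establishes a one-to-one correspondence between faithful encodings (certain assignments of the network's parameters) and SOIREs of bounded size.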
Approach supports a broad class of practical regular expressions.
Abstract
We study the problem of learning a single occurrence regular expression with interleaving (SOIRE) from a set of text strings possibly with noise. SOIRE fully supports interleaving and covers a large portion of regular expressions used in practice. Learning SOIREs is challenging because it requires heavy computation and text strings usually contain noise in practice. Most of the previous studies only learn restricted SOIREs and are not robust on noisy data. To tackle these issues, we propose a noise-tolerant differentiable learning approach SOIREDL for SOIRE. We design a neural network to simulate SOIRE matching and theoretically prove that certain assignments of the set of parameters learnt by the neural network, called faithful encodings, are one-to-one corresponding to SOIREs for a bounded size. Based on this correspondence, we interpret the target SOIRE from an assignment of the set…
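To make the notion of SOIRE matching concrete, the following is a minimal sketch of a matcher for regular expressions with an interleaving operator (written `&` here), together with a check of the single-occurrence restriction (each alphabet symbol appears at most once in the expression). It uses classical Brzozowski derivatives, not the neural simulation described in the abstract, and every name in it (`sym`, `cat`, `alt`, `star`, `shuffle`, `matches`, `is_single_occurrence`) is a hypothetical helper introduced for illustration only.

```python
# Illustrative sketch only: a symbolic matcher based on Brzozowski
# derivatives. This is NOT the SOIREDL network from the paper.
# Expressions are nested tuples; all constructor names are hypothetical.

EMPTY = ("empty",)  # matches no string
EPS = ("eps",)      # matches only the empty string


def sym(a):
    return ("sym", a)


def cat(r, s):
    """Concatenation, with trivial simplifications."""
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)


def alt(r, s):
    """Union, with trivial simplifications."""
    if r == EMPTY:
        return s
    if s == EMPTY:
        return r
    return r if r == s else ("alt", r, s)


def star(r):
    return ("star", r)


def shuffle(r, s):
    """Interleaving r & s: all order-preserving merges of a string
    matched by r with a string matched by s."""
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("shuffle", r, s)


def nullable(r):
    """True if r matches the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # cat, shuffle


def deriv(r, c):
    """Expression matching the suffixes of strings in r that start with c."""
    tag = r[0]
    if tag in ("eps", "empty"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "star":
        return cat(deriv(r[1], c), r)
    if tag == "cat":
        rest = deriv(r[2], c) if nullable(r[1]) else EMPTY
        return alt(cat(deriv(r[1], c), r[2]), rest)
    # shuffle: the next character comes from either operand
    return alt(shuffle(deriv(r[1], c), r[2]), shuffle(r[1], deriv(r[2], c)))


def matches(r, text):
    for c in text:
        r = deriv(r, c)
    return nullable(r)


def symbols(r):
    """All symbol occurrences in r, with repetition."""
    if r[0] == "sym":
        return [r[1]]
    return [a for child in r[1:] if isinstance(child, tuple) for a in symbols(child)]


def is_single_occurrence(r):
    syms = symbols(r)
    return len(syms) == len(set(syms))


# Example: (ab) & c*  means "a then b", with any number of c's mixed in.
expr = shuffle(cat(sym("a"), sym("b")), star(sym("c")))
print(is_single_occurrence(expr))  # True
print(matches(expr, "acb"))        # True
print(matches(expr, "ba"))         # False
```

The sketch decides membership exactly, so a single noisy string in the training data would rule out an otherwise correct expression. That brittleness is the motivation the abstract gives for a differentiable, noise-tolerant learner.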
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Algorithms
