# Learning Restricted Regular Expressions with Interleaving

**Authors:** Chunmei Dong, Yeting Li, Haiming Chen

arXiv: 1904.13164 · 2019-05-01

## TL;DR

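This paper proposes a new subclass of regular expressions with interleaving, aimed at inferring Relax NG schemas, together with a polynomial inference algorithm. Experiments on large-scale real data and on three XML data corpora indicate that the subclass is more practical than previous ones and that the inferred expressions are more precise.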

## Contribution

It proposes a novel subclass of regular expressions with interleaving and a polynomial inference algorithm for Relax NG schemas.

## Key findings

- The new subclass is more practical than previously proposed subclasses.
- The regular expressions inferred by the algorithm are more precise.
- Both results come from experiments on large-scale real data and on three XML data corpora.

## Abstract

The advantages of having an XML schema for XML documents are numerous. However, many XML documents in practice are not accompanied by a schema, or by a valid one. Relax NG is a popular and powerful schema language that supports the unconstrained interleaving operator. Focusing on the inference of Relax NG, we propose a new subclass of regular expressions with interleaving and design a polynomial inference algorithm. We then conduct a series of experiments on large-scale real data and on three XML data corpora. The experimental results show that our subclass is more practical than previous ones, and that the regular expressions inferred by our algorithm are more precise.
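
For readers unfamiliar with the interleaving operator mentioned in the abstract, the following minimal sketch shows the standard interleaving (shuffle) of two words: every result keeps each word's symbols in their original relative order while letting the two words mix freely. The function name `interleave` is hypothetical and chosen for illustration only. The sketch shows the operator itself, not the paper's restricted subclass or its inference algorithm.

```python
from typing import Set


def interleave(u: str, v: str) -> Set[str]:
    """Return every interleaving (shuffle) of the words u and v.

    Illustrative helper only; it is not taken from the paper.
    """
    # If one word is empty, the only interleaving is the other word.
    if not u:
        return {v}
    if not v:
        return {u}
    # Otherwise the next symbol is taken either from u or from v.
    from_u = {u[0] + w for w in interleave(u[1:], v)}
    from_v = {v[0] + w for w in interleave(u, v[1:])}
    return from_u | from_v


print(sorted(interleave("ab", "c")))  # ['abc', 'acb', 'cab']
```

In the example, `a` always precedes `b`, while `c` may appear in any position. This is the behavior that lets a schema accept child elements in mixed order.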

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.13164/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1904.13164/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1904.13164/full.md

---
Source: https://tomesphere.com/paper/1904.13164