Discovering Restricted Regular Expressions with Interleaving

Feifei Peng; Haiming Chen

arXiv:1504.00150·cs.DB·April 2, 2015

TL;DR

This paper addresses the challenge of learning minimal, unordered XML schemas with interleaving from example data, proposing new algorithms to approximate solutions for an NP-hard problem.

Contribution

It introduces a novel approximation algorithm and heuristic for inferring minimal interleaving schemas, which previous methods could not effectively handle.

Findings

01 Heuristic results are close to optimal

02 Algorithms work effectively on real-world datasets

03 Finding a minimal schema with interleaving is NP-hard

Abstract

Discovering a concise schema from given XML documents is an important problem in XML applications. In this paper, we focus on the problem of learning an unordered schema from a given set of XML examples, which is actually a problem of learning a restricted regular expression with interleaving using positive example strings. Schemas with interleaving could present meaningful knowledge that cannot be disclosed by previous inference techniques. Moreover, inference of the minimal schema with interleaving is challenging. The problem of finding a minimal schema with interleaving is shown to be NP-hard. Therefore, we develop an approximation algorithm and a heuristic solution to tackle the problem using techniques different from known inference algorithms. We do experiments on real-world data sets to demonstrate the effectiveness of our approaches. Our heuristic algorithm is shown to produce…
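To make the interleaving operator concrete, the following minimal sketch enumerates every interleaving of two element sequences: all merges that keep the relative order inside each sequence. It illustrates only what interleaving means for unordered content. It is not the paper's approximation or heuristic algorithm, and the function name `shuffle` and the element names `title`, `author`, and `year` are hypothetical.

```python
from functools import lru_cache


def shuffle(u, v):
    """Return the set of all interleavings of the tuples u and v.

    Each result contains every item of u and of v, and preserves the
    relative order of items within u and within v.
    Illustrative helper only; not taken from the paper.
    """

    @lru_cache(maxsize=None)
    def go(i, j):
        # Both inputs consumed: the only completion is the empty sequence.
        if i == len(u) and j == len(v):
            return frozenset({()})
        out = set()
        if i < len(u):
            # Take the next item from u.
            out |= {(u[i],) + rest for rest in go(i + 1, j)}
        if j < len(v):
            # Take the next item from v.
            out |= {(v[j],) + rest for rest in go(i, j + 1)}
        return frozenset(out)

    return go(0, 0)


# Hypothetical child-element sequences of an XML record.
words = shuffle(("title",), ("author", "year"))
print(len(words))                            # 3
print(("author", "title", "year") in words)  # True
print(("year", "author", "title") in words)  # False: "year" precedes "author"
```

An ordered regular expression would need one alternative per permitted order to describe the same set, which is why an interleaving operator can yield a more concise schema for unordered data.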

Taxonomy

Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Data Mining Algorithms and Applications