Data Language Specification via Terminal Attribution

Alexander Sakharov; Timothy Sakharov

arXiv:1511.00909 · cs.FL · November 4, 2015

Open Access

TL;DR

This paper introduces a simplified notation for defining LL(1) data language grammars by classifying terminals into layered groups, easing the development of data parsers.

Contribution

The paper proposes a notation for LL(1) data language grammars in which terminals are assigned to predefined classes and elements of some classes are split into layered sub-groups, which simplifies the development of data parsers.

Findings

01 Simplifies grammar definition for data languages

02 Eases parser development

03 Reduces the complexity of grammar debugging

Abstract

Unstructured data have to be parsed in order to become usable. The complexity of grammar notations and the difficulty of grammar debugging limit the use of parsers for data preprocessing. We introduce a notation in which grammars are defined by simply dividing terminals into predefined classes and then splitting elements of some classes into multiple layered sub-groups. These LL(1) grammars are designed for data languages. They simplify the task of developing data parsers.
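
To make the idea of layered terminal groups concrete, here is a minimal sketch in Python. It is not the paper's notation or algorithm: the names `SEPARATOR_LAYERS` and `parse`, the choice of separator characters, and the nesting order are all assumptions made for illustration. It only shows how assigning separator terminals to ordered layers can fully determine a nested structure, with no explicit grammar rules.

```python
from typing import List, Sequence, Set, Union

Tree = Union[str, List["Tree"]]

# Hypothetical attribution (not taken from the paper): separator terminals
# are split into ordered layers. Layer 0 is outermost (records); each later
# layer nests inside the previous one (fields, then sub-fields).
SEPARATOR_LAYERS: Sequence[Set[str]] = [{"\n"}, {","}, {";"}]


def parse(text: str,
          layers: Sequence[Set[str]] = SEPARATOR_LAYERS,
          depth: int = 0) -> Tree:
    """Split text on the separators of the current layer, then recurse."""
    if depth == len(layers):
        # No separator layers left: the remaining text is a leaf value.
        return text.strip()
    parts: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch in layers[depth]:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [parse(part, layers, depth + 1) for part in parts]


result = parse("a,b;c\nd,e")
print(result)
```

In this sketch, changing the data language amounts to editing `SEPARATOR_LAYERS` rather than rewriting productions, which is the kind of simplification the abstract describes. A real data language would also need other terminal classes (for example quotes or brackets), which this illustration omits.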

Taxonomy

Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Web Data Mining and Analysis