Non-hierarchical Structures: How to Model and Index Overlaps?
Faegheh Hasibi, Svein Erik Bratsberg

TL;DR
This paper introduces TGSA (Tree-like Graph for Structural Annotations), a data model for representing overlapping, non-hierarchical structures in digital objects, together with an extension of XML pre-post indexing for querying them. Both address cases that tree-based approaches cannot handle.
Contribution
The paper presents TGSA, an extension of the XML data model for non-hierarchical structures, along with an algorithm for constructing it from annotated documents and an indexing method for efficient processing of overlaps.
Findings
Efficient construction algorithm for TGSA from annotated documents
Formal proofs validating the transformation process
Extended XML pre-post indexing supporting reachability and overlaps
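For context on the last finding, the sketch below shows ordinary pre-post indexing on a plain tree, which is the baseline technique the paper extends. It is not the paper's extended index: the `Node` class, the function names, and the sample document are hypothetical and chosen only for illustration. Each node gets its rank in a preorder and a postorder traversal, and `a` is an ancestor of `d` exactly when `a` comes earlier in preorder and later in postorder.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; names are assumed unique in this sketch."""
    name: str
    children: list[Node] = field(default_factory=list)


def pre_post_index(root: Node) -> dict[str, tuple[int, int]]:
    """Map each node name to its (preorder rank, postorder rank)."""
    index: dict[str, tuple[int, int]] = {}
    pre_counter = 0
    post_counter = 0

    def visit(node: Node) -> None:
        nonlocal pre_counter, post_counter
        pre = pre_counter          # rank assigned on entering the node
        pre_counter += 1
        for child in node.children:
            visit(child)
        index[node.name] = (pre, post_counter)  # rank assigned on leaving
        post_counter += 1

    visit(root)
    return index


def is_ancestor(index: dict[str, tuple[int, int]], a: str, d: str) -> bool:
    """True if a is a proper ancestor of d in the indexed tree."""
    pre_a, post_a = index[a]
    pre_d, post_d = index[d]
    return pre_a < pre_d and post_a > post_d


# Illustrative document tree (an assumption, not data from the paper).
doc = Node("doc", [
    Node("chapter", [Node("para1"), Node("para2")]),
    Node("appendix"),
])
index = pre_post_index(doc)
```

With this index, ancestor and descendant checks reduce to two integer comparisons, which is what makes the scheme attractive for large repositories. It assumes every node has a single parent, and that assumption is exactly what overlapping structures violate.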
Abstract
Overlap is a common phenomenon seen when structural components of a digital object are neither disjoint nor nested inside each other. Overlapping components resist reduction to a structural hierarchy, and tree-based indexing and query processing techniques cannot be used for them. Our solution to this data modeling problem is TGSA (Tree-like Graph for Structural Annotations), a novel extension of the XML data model for non-hierarchical structures. We introduce an algorithm for constructing TGSA from annotated documents; the algorithm can efficiently process non-hierarchical structures and is associated with formal proofs, ensuring that transformation of the document to the data model is valid. To enable high performance query analysis in large data repositories, we further introduce an extension of XML pre-post indexing for non-hierarchical structures, which can process both…
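The definition of overlap in the abstract (components that are neither disjoint nor nested) can be illustrated with a minimal sketch. It assumes each structural component is a half-open character span `(start, end)` over the document text; that representation, the function name `relation`, and the sample spans are assumptions for illustration, not the paper's model.

```python
def relation(a: tuple[int, int], b: tuple[int, int]) -> str:
    """Classify two half-open spans as 'disjoint', 'nested', or 'overlap'."""
    a_start, a_end = a
    b_start, b_end = b
    # Disjoint: one span ends before (or exactly where) the other begins.
    if a_end <= b_start or b_end <= a_start:
        return "disjoint"
    # Nested: one span lies entirely inside the other (equal spans included).
    a_contains_b = a_start <= b_start and b_end <= a_end
    b_contains_a = b_start <= a_start and a_end <= b_end
    if a_contains_b or b_contains_a:
        return "nested"
    # Otherwise the spans share some text but neither contains the other.
    return "overlap"


# Hypothetical annotations: a verse line and a sentence that runs past its end.
verse_line = (0, 40)
sentence = (25, 60)
phrase = (5, 12)
```

Here `relation(verse_line, sentence)` yields `"overlap"`, the case that cannot be expressed as a parent-child edge in a single tree, while `relation(verse_line, phrase)` yields `"nested"` and fits an ordinary hierarchy.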
Taxonomy
Topics: Advanced Database Systems and Queries · Semantic Web and Ontologies · Data Management and Algorithms
