Schemaless Queries over Document Tables with Dependencies
Mustafa Canim, Cristina Cornelio, Arun Iyengar, Ryan Musa, Mariano Rodriguez Muro

TL;DR
This paper presents an approach to querying tables embedded in unstructured enterprise documents using semantic technologies (RDF/SPARQL and database dependencies), enabling schemaless retrieval and multi-table joins with minimal manual effort.
Contribution
It introduces a method leveraging RDF/SPARQL and database dependencies to perform schema-less queries over non-relational tables in documents, reducing manual data integration efforts.
Findings
Enables querying of embedded tables with minimal manual mapping.
Supports complex structured queries involving multiple tables.
Reduces costs associated with traditional table extraction and schema mapping.
Abstract
Unstructured enterprise data such as reports, manuals and guidelines often contain tables. The traditional way of integrating data from these tables is through a two-step process of table detection/extraction and mapping the table layouts to an appropriate schema. This can be an expensive process. In this paper we show that by using semantic technologies (RDF/SPARQL and database dependencies) paired with a simple but powerful way to transform tables with non-relational layouts, it is possible to offer query answering services over these tables with minimal manual work or domain-specific mappings. Our method enables users to exploit data in tables embedded in documents with little effort, not only for simple retrieval queries, but also for structured queries that require joining multiple interrelated tables.
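As a rough illustration of the idea in the abstract, the sketch below flattens a table with a non-relational (matrix-style) layout into subject/predicate/object triples and then answers a lookup without any table-specific schema. It is not the paper's actual transformation: the function names (`table_to_triples`, `lookup`), the predicate names (`rowHeader`, `colHeader`, `value`), and the sample table are all hypothetical, and plain Python tuples stand in for an RDF store and a SPARQL query.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def table_to_triples(table_id: str, grid: List[List[str]]) -> List[Triple]:
    """Flatten a matrix-style table into triples.

    Assumes the first row holds column headers and the first column
    holds row headers (an assumption made for this sketch only).
    """
    triples: List[Triple] = []
    col_headers = grid[0][1:]
    for r, row in enumerate(grid[1:], start=1):
        row_header = row[0]
        for c, value in enumerate(row[1:], start=1):
            cell = f"{table_id}/cell/{r}/{c}"  # hypothetical cell identifier
            triples.append((cell, "rowHeader", row_header))
            triples.append((cell, "colHeader", col_headers[c - 1]))
            triples.append((cell, "value", value))
    return triples


def lookup(triples: List[Triple], row_header: str, col_header: str) -> List[str]:
    """Return values of cells matching both headers, across all tables."""
    by_cell: Dict[str, Dict[str, str]] = {}
    for subject, predicate, obj in triples:
        by_cell.setdefault(subject, {})[predicate] = obj
    return [
        props["value"]
        for props in by_cell.values()
        if props.get("rowHeader") == row_header
        and props.get("colHeader") == col_header
    ]


# Made-up example table, for illustration only.
grid = [
    ["Plan", "Monthly fee", "Storage"],
    ["Basic", "10", "50 GB"],
    ["Pro", "25", "500 GB"],
]
triples = table_to_triples("doc1/table1", grid)
print(lookup(triples, "Pro", "Storage"))
```

Because every cell is described by the same three generic predicates, the same lookup works for any table flattened this way, which is the sense in which the query needs no per-table schema mapping.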
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Web Data Mining and Analysis
