LiteMat: a scalable, cost-efficient inference encoding scheme for large RDF graphs
Olivier Curé, Hubert Naacke, Tendry Randriamalala, Bernd Amann

TL;DR
LiteMat is a scalable encoding scheme for large RDF graphs. It reduces memory use and speeds up inference by efficiently handling concept and property hierarchies and entailment rules.
Contribution
The paper presents a novel, scalable encoding method for RDF data that minimizes materialization and query rewriting, implemented over Apache Spark for large-scale inference tasks.
Findings
Efficient encoding reduces memory footprint for large RDF datasets.
Scalable parallel algorithm for encoding over Apache Spark.
Evaluation shows improved performance on synthetic and real datasets.
Abstract
The number of linked data sources and the size of the linked open data graph keep growing every day. As a consequence, semantic RDF services are more and more confronted with various "big data" problems. Query processing in the presence of inferences is one of them. For instance, to complete the answer set of SPARQL queries, RDF database systems evaluate semantic RDFS relationships (subPropertyOf, subClassOf) through time-consuming query rewriting algorithms or space-consuming data materialization solutions. To reduce the memory footprint and ease the exchange of large datasets, these systems generally apply a dictionary approach for compressing triple data sizes by replacing resource identifiers (IRIs), blank nodes and literals with integer values. In this article, we present a structured resource identification scheme using a clever encoding of concepts and property hierarchies for…
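To make the two ideas in the abstract concrete, the sketch below shows (1) plain dictionary encoding, where every RDF term is replaced by an integer, and (2) one simple way identifiers can carry hierarchy information so that a subClassOf test becomes an integer range check. This is an illustration only, not the encoding defined in the paper: the pre-order interval numbering, the function names (dictionary_encode, encode_hierarchy, is_subclass_of) and the toy class names are all assumptions, and the sketch only covers tree-shaped hierarchies.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def dictionary_encode(triples: List[Triple]) -> Tuple[Dict[str, int], List[Tuple[int, int, int]]]:
    """Plain dictionary encoding: each distinct term gets the next free integer."""
    term_to_id: Dict[str, int] = {}
    encoded = []
    for s, p, o in triples:
        # setdefault assigns a new id only the first time a term is seen.
        ids = tuple(term_to_id.setdefault(t, len(term_to_id)) for t in (s, p, o))
        encoded.append(ids)
    return term_to_id, encoded


def encode_hierarchy(root: str, children: Dict[str, List[str]]) -> Dict[str, Tuple[int, int]]:
    """Hypothetical hierarchy-aware ids: pre-order numbering of a class tree.

    Each class maps to (id, bound); all of its descendants have ids in [id, bound).
    Assumes single inheritance (a tree), which is a simplification.
    """
    intervals: Dict[str, Tuple[int, int]] = {}
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        intervals[node] = (start, counter)

    visit(root)
    return intervals


def is_subclass_of(intervals: Dict[str, Tuple[int, int]], sub: str, sup: str) -> bool:
    """Subsumption as a range check on ids, with no rewriting or materialization."""
    low, high = intervals[sup]
    return low <= intervals[sub][0] < high


# Toy hierarchy (hypothetical class names).
children = {"Thing": ["Person", "Place"], "Person": ["Student"]}
intervals = encode_hierarchy("Thing", children)

term_to_id, encoded = dictionary_encode(
    [("a", "type", "Student"), ("b", "type", "Student")]
)
```

With identifiers of this kind, a query asking for all instances of Person can match every type triple whose class id falls in Person's range, instead of rewriting the query into a union over subclasses or materializing the inferred triples.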
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries · Data Quality and Management
