Semantic Data Management in Data Lakes
Sayed Hoseini, Johannes Theissen-Lipp, Christoph Quix

TL;DR
This paper surveys recent semantic data management approaches in data lakes, emphasizing techniques that enhance data integration, interoperability, and scalability using knowledge graphs and ontology-based methods.
Contribution
It classifies and compares recent methods for semantic data management in data lakes, highlighting challenges and future directions for integrating Big Data and Semantic Web technologies.
Findings
Semantic layering improves data interoperability.
Ontology-based access enables expressive data queries.
Scalability remains a key challenge for semantic data management.
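The first two findings can be illustrated with a minimal sketch. It is not taken from the paper: the `lake:` and `ex:` identifiers, the `ex:representsConcept` predicate, and both helper functions are hypothetical names chosen for illustration. The sketch assumes a semantic layer stored as subject-predicate-object triples that link fields in heterogeneous sources to ontology concepts, so a query can be phrased against a concept rather than against source-specific column names.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical semantic layer: fields from two differently structured
# sources are annotated with concepts from an illustrative ontology.
TRIPLES = [
    Triple("lake:sales.csv#cust_id", "ex:representsConcept", "ex:Customer"),
    Triple("lake:crm.json#clientId", "ex:representsConcept", "ex:Customer"),
    Triple("lake:sales.csv#amount", "ex:representsConcept", "ex:Revenue"),
    Triple("ex:Customer", "rdfs:subClassOf", "ex:Party"),
]


def subconcepts(concept: str, triples: list[Triple]) -> set[str]:
    """Return the concept together with all of its transitive subclasses."""
    found = {concept}
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for t in triples:
            if (
                t.predicate == "rdfs:subClassOf"
                and t.obj == current
                and t.subject not in found
            ):
                found.add(t.subject)
                frontier.append(t.subject)
    return found


def fields_for_concept(concept: str, triples: list[Triple]) -> list[str]:
    """Find every source field annotated with the concept or a subclass of it."""
    concepts = subconcepts(concept, triples)
    return sorted(
        t.subject
        for t in triples
        if t.predicate == "ex:representsConcept" and t.obj in concepts
    )


if __name__ == "__main__":
    # One concept-level query reaches fields in both sources.
    print(fields_for_concept("ex:Party", TRIPLES))
```

Asking for `ex:Party` returns the customer fields from both the CSV and the JSON source, even though neither source uses that term. A real system would more likely use an RDF store and SPARQL; the in-memory list here only keeps the sketch self-contained.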
Abstract
In recent years, data lakes emerged as a way to manage large amounts of heterogeneous data for modern data analytics. One way to prevent data lakes from turning into inoperable data swamps is semantic data management. Some approaches propose the linkage of metadata to knowledge graphs based on the Linked Data principles to provide more meaning and semantics to the data in the lake. Such a semantic layer may be utilized not only for data management but also to tackle the problem of data integration from heterogeneous sources, in order to make data access more expressive and interoperable. In this survey, we review recent approaches with a specific focus on the application within data lake systems and scalability to Big Data. We classify the approaches into (i) basic semantic data management, (ii) semantic modeling approaches for enriching metadata in data lakes, and (iii) methods for…
Taxonomy
Topics: Semantic Web and Ontologies · Data Quality and Management · Service-Oriented Architecture and Web Services
MethodsFocus
