Humboldt: Metadata-Driven Extensible Data Discovery

Alex Bäuerle; Çağatay Demiralp; Michael Stonebraker

arXiv:2408.05439·cs.DB·August 22, 2024

Open Access

TL;DR

Humboldt is a framework that enhances data discovery by leveraging metadata, enabling scalable, extensible, and automatically generated user interfaces for data search and visualization.

Contribution

It introduces a metadata-driven approach that decouples metadata sources from UI implementation, allowing automatic generation of interactive data discovery interfaces.

Findings

01. Reduces UI development effort for metadata updates

02. Supports complex metadata-based queries and visualizations

03. Enables rapid evolution of data discovery interfaces

Abstract

Data discovery is crucial for data management and analysis and can benefit from better utilization of metadata. For example, users may want to search data using queries like "find the tables created by Alex and endorsed by Mike that contain sales numbers." They may also want to see how the data they view relates to other data, its lineage, or the quality and compliance of its upstream datasets, all of which are metadata. Yet, effectively surfacing metadata through interactive user interfaces (UIs) to augment data discovery poses challenges. Constantly revamping UIs with each update to metadata sources (or providers) consumes significant development resources and lacks scalability and extensibility. In response, we introduce Humboldt, a new framework enabling interactive data systems to effectively leverage metadata for data discovery and rapidly evolve their UIs to support metadata changes.…
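
To make the abstract's example query concrete, here is a minimal sketch of a metadata-based search over a tiny in-memory catalog. It is not Humboldt's API or implementation: the `TableMetadata` record, the `find_tables` helper, and the sample catalog entries are all hypothetical names invented for illustration, and "contains sales numbers" is approximated by a keyword match on column names.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class TableMetadata:
    """Hypothetical metadata record for one table (illustration only)."""
    name: str
    created_by: str
    endorsed_by: Set[str] = field(default_factory=set)
    columns: List[str] = field(default_factory=list)


def find_tables(catalog, created_by, endorsed_by, column_keyword):
    """Return names of tables whose metadata satisfies all three filters.

    Assumption: a table "contains" the requested data if any column name
    includes the keyword (case-insensitive).
    """
    keyword = column_keyword.lower()
    return [
        t.name
        for t in catalog
        if t.created_by == created_by
        and endorsed_by in t.endorsed_by
        and any(keyword in c.lower() for c in t.columns)
    ]


# Made-up sample catalog; none of these tables come from the paper.
catalog = [
    TableMetadata("q3_sales", "Alex", {"Mike"}, ["region", "sales_total"]),
    TableMetadata("q3_sales_draft", "Alex", set(), ["region", "sales_total"]),
    TableMetadata("hr_roster", "Alex", {"Mike"}, ["employee", "team"]),
    TableMetadata("eu_sales", "Dana", {"Mike"}, ["country", "sales_total"]),
]

# "Find the tables created by Alex and endorsed by Mike that contain sales numbers."
matches = find_tables(catalog, "Alex", "Mike", "sales")
print(matches)
```

Each clause of the natural-language query maps to one metadata filter (creator, endorsement, content), which is why a new metadata source would add a new filter here rather than change the query logic.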

Taxonomy

Topics: Advanced Database Systems and Queries · Semantic Web and Ontologies · Data Mining Algorithms and Applications