Humboldt: Metadata-Driven Extensible Data Discovery
Alex Bäuerle, Çağatay Demiralp, Michael Stonebraker

TL;DR
Humboldt is a framework that enhances data discovery by leveraging metadata, enabling scalable, extensible, and automatically generated user interfaces for data search and visualization.
Contribution
It introduces a metadata-driven approach that decouples metadata sources from UI implementation, allowing automatic generation of interactive data discovery interfaces.
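To make the decoupling idea concrete, here is a minimal sketch. It assumes that each metadata provider ships a declarative spec of the fields it exposes, and that a generic renderer turns those specs into UI component descriptions. It is not Humboldt's actual specification format or API. `FieldSpec`, `MetadataProvider`, `WIDGETS`, `render_table_card`, and the sample table and provider data are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical: declarative description of one metadata field a provider exposes.
@dataclass(frozen=True)
class FieldSpec:
    name: str   # metadata key, e.g. "owner"
    kind: str   # value kind: "text", "number", "list", ...
    label: str  # human-readable label shown in the UI

# Hypothetical: a provider is its field specs plus a function fetching values for a table.
@dataclass
class MetadataProvider:
    name: str
    fields: list[FieldSpec]
    fetch: Callable[[str], dict[str, Any]]

# Generic widgets keyed by value kind; the renderer never names a specific provider.
WIDGETS = {"text": "TextBadge", "number": "NumberBadge", "list": "TagList"}

def render_table_card(table: str, providers: list[MetadataProvider]) -> list[dict[str, Any]]:
    """Build a UI description for one table from whichever providers are registered."""
    components = []
    for provider in providers:
        values = provider.fetch(table)
        for spec in provider.fields:
            if spec.name in values:  # skip fields this provider has no value for
                components.append({
                    "widget": WIDGETS.get(spec.kind, "TextBadge"),
                    "label": spec.label,
                    "value": values[spec.name],
                    "source": provider.name,
                })
    return components

# Usage with two hypothetical providers.
ownership = MetadataProvider(
    name="ownership",
    fields=[FieldSpec("owner", "text", "Owner")],
    fetch=lambda t: {"owner": "Alex"} if t == "sales_2023" else {},
)
endorsements = MetadataProvider(
    name="endorsements",
    fields=[FieldSpec("endorsed_by", "list", "Endorsed by")],
    fetch=lambda t: {"endorsed_by": ["Mike"]},
)
cards = render_table_card("sales_2023", [ownership, endorsements])
print(cards)
```

In this sketch, adding a new metadata source only means appending another `MetadataProvider` to the list. `render_table_card` and the widget set stay unchanged, which is the kind of reduced UI effort the summary describes.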
Findings
Reduces UI development effort for metadata updates
Supports complex metadata-based queries and visualizations
Enables rapid evolution of data discovery interfaces
Abstract
Data discovery is crucial for data management and analysis and can benefit from better utilization of metadata. For example, users may want to search data using queries like "find the tables created by Alex and endorsed by Mike that contain sales numbers." They may also want to see how the data they view relates to other data, its lineage, or the quality and compliance of its upstream datasets, all of which are metadata. Yet, effectively surfacing metadata through interactive user interfaces (UIs) to augment data discovery poses challenges. Constantly revamping UIs with each update to metadata sources (or providers) consumes significant development resources and lacks scalability and extensibility. In response, we introduce Humboldt, a new framework enabling interactive data systems to effectively leverage metadata for data discovery and rapidly evolve their UIs to support metadata changes.…
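The example query in the abstract combines three metadata conditions. A minimal sketch of evaluating such a query, assuming a flat in-memory catalog where each table's creator, endorsers, and content tags are already known, is a conjunction of predicates. This is not Humboldt's query mechanism. `TableMetadata`, `created_by`, `endorsed_by`, `contains`, `find_tables`, and the catalog entries are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical per-table metadata record.
@dataclass
class TableMetadata:
    name: str
    creator: str
    endorsed_by: set[str] = field(default_factory=set)
    content_tags: set[str] = field(default_factory=set)  # hypothetical semantic tags, e.g. "sales"

Predicate = Callable[[TableMetadata], bool]

def created_by(user: str) -> Predicate:
    return lambda t: t.creator == user

def endorsed_by(user: str) -> Predicate:
    return lambda t: user in t.endorsed_by

def contains(tag: str) -> Predicate:
    return lambda t: tag in t.content_tags

def find_tables(catalog: list[TableMetadata], *predicates: Predicate) -> list[str]:
    """Return names of tables satisfying every predicate (a conjunctive metadata query)."""
    return [t.name for t in catalog if all(p(t) for p in predicates)]

catalog = [
    TableMetadata("sales_2023", "Alex", {"Mike"}, {"sales", "revenue"}),
    TableMetadata("sales_draft", "Alex", set(), {"sales"}),  # not endorsed
    TableMetadata("hr_roster", "Mike", {"Mike"}, {"employees"}),  # wrong creator
]

# "Tables created by Alex and endorsed by Mike that contain sales numbers"
result = find_tables(catalog, created_by("Alex"), endorsed_by("Mike"), contains("sales"))
print(result)  # ['sales_2023']
```

Each condition draws on a different kind of metadata (provenance, social endorsement, content). This is why such queries depend on the catalog exposing all of that metadata in one place.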
Taxonomy
Topics: Advanced Database Systems and Queries · Semantic Web and Ontologies · Data Mining Algorithms and Applications
