Data Scaling in OBDA Benchmarks: The VIG Approach
Davide Lanti, Guohui Xiao, Diego Calvanese

TL;DR
VIG is a fast, domain-aware data scaler for OBDA benchmarks that efficiently generates large, realistic datasets by leveraging ontology and mapping information, simplifying benchmark data preparation.
Contribution
The paper introduces VIG, a novel data scaling approach that uses domain knowledge from the ontology and mappings to generate data efficiently, at scale, and for benchmarks in general.
Findings
VIG scales data efficiently, generating each value in constant time.
It preserves application-specific characteristics during scaling.
VIG is applicable to various benchmarks beyond NPD.
Abstract
In this paper we describe VIG, a data scaler for benchmarks in the context of ontology-based data access (OBDA). Data scaling is a relatively recent approach, proposed in the database community, that allows for quickly scaling up an input data instance to s times its size, while preserving certain application-specific characteristics. The advantage of the approach is that the user is not required to manually input the characteristics of the data to be produced, making it particularly suitable for OBDA benchmarks, where the complexity of database schemas might pose a challenge for manual input (e.g., the NPD benchmark contains 70 tables with some containing more than 60 columns). As opposed to a traditional data scaler, VIG includes domain information provided by the OBDA mappings and the ontology in order to produce data. VIG is currently used in the NPD benchmark, but it is not…
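The idea of scaling an instance to s times its size while generating each value in constant time can be illustrated with a minimal sketch. This is not VIG's actual algorithm: the names `ColumnProfile`, `scaled_profile`, and `value_at` are hypothetical, and the ratio of distinct values to rows is assumed here as one example of a characteristic a scaler might preserve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnProfile:
    """Statistics measured on one column of the input instance (hypothetical)."""
    row_count: int       # number of rows in the table
    distinct_count: int  # number of distinct values in the column


def scaled_profile(profile: ColumnProfile, s: float) -> ColumnProfile:
    """Scale rows and distinct values by s, keeping their ratio roughly fixed."""
    rows = round(profile.row_count * s)
    distinct = max(1, round(profile.distinct_count * s))
    return ColumnProfile(rows, min(distinct, rows))


def value_at(i: int, target: ColumnProfile) -> int:
    """Compute the value for row i directly, in O(1), with no stored state."""
    if not 0 <= i < target.row_count:
        raise IndexError(i)
    # Cycling through the distinct values yields exactly distinct_count
    # different values across the scaled column.
    return i % target.distinct_count


# Assumed input statistics: 1000 rows, 250 distinct values, scale factor s = 4.
source = ColumnProfile(row_count=1000, distinct_count=250)
target = scaled_profile(source, s=4)
column = [value_at(i, target) for i in range(target.row_count)]
```

Because `value_at` depends only on the row index and the target statistics, rows can be produced in any order or in parallel, which is one reason constant-time generation is attractive for large scale factors.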
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Database Systems and Queries · Service-Oriented Architecture and Web Services
