Analytics Modelling over Multiple Datasets using Vector Embeddings
Andreas Loizou, Dimitrios Tsoumakos

TL;DR
This paper introduces a deep learning-based vector embedding approach, NumTabData2Vec, to predict analytics outcomes from multiple datasets efficiently, improving accuracy and speed in data selection for analysis.
Contribution
The paper presents a novel deep learning model for transforming datasets into vector embeddings, enabling accurate and fast prediction of analytics outcomes across diverse real-world scenarios.
Findings
Predicts analytics outcomes accurately
Runs faster than an existing state-of-the-art operator modelling framework
Effectively distinguishes different real-world scenarios
Abstract
The massive increase in data volume and dataset availability compels researchers to focus on data content and to select high-quality datasets that enhance the performance of analytics operators. While selecting high-quality data significantly boosts analytical accuracy and efficiency, the selection process itself is very challenging given the large number of available datasets. To address this issue, we propose a novel methodology that infers the outcome of analytics operators by creating a model from the available datasets. Each dataset is transformed into a vector embedding representation generated by our proposed deep learning model NumTabData2Vec, over which similarity search is employed. Through experimental evaluation, we compare the prediction performance and the execution time of our framework to another state-of-the-art modelling operator framework, illustrating that our approach predicts…
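The idea of predicting an operator's outcome through similarity search over dataset embeddings can be illustrated with a minimal sketch. This is not the paper's implementation: the embeddings below are hand-written stand-ins for what NumTabData2Vec would produce, and the function names, the cosine similarity measure, the similarity-weighted average, and the parameter `k` are all assumptions made for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def predict_outcome(query_embedding, known, k=2):
    """Estimate an operator outcome for a new dataset.

    known: list of (embedding, outcome) pairs for datasets on which the
    operator has already been run. The estimate is the similarity-weighted
    mean outcome of the k most similar datasets (an assumed aggregation rule).
    """
    if not known:
        raise ValueError("need at least one dataset with a known outcome")
    scored = sorted(
        ((cosine_similarity(query_embedding, emb), outcome) for emb, outcome in known),
        key=lambda pair: pair[0],
        reverse=True,
    )
    top = scored[:k]
    # Negative similarities are clipped so they cannot flip the sign of a weight.
    weights = [max(sim, 0.0) for sim, _ in top]
    total = sum(weights)
    if total == 0.0:
        # No informative neighbour: fall back to a plain mean of the top-k outcomes.
        return sum(outcome for _, outcome in top) / len(top)
    return sum(w * outcome for w, (_, outcome) in zip(weights, top)) / total

# Hypothetical embeddings paired with a hypothetical measured outcome (e.g. accuracy).
known_datasets = [
    ([1.0, 0.0], 0.9),
    ([0.0, 1.0], 0.5),
    ([1.0, 0.1], 0.8),
]

estimate = predict_outcome([1.0, 0.0], known_datasets, k=2)
print(f"estimated outcome: {estimate:.3f}")
```

In this sketch the operator is never run on the new dataset; only its embedding is compared against datasets whose outcomes are already known, which is where a time saving over executing the operator would come from.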
Taxonomy
Topics: Neural Networks and Applications
Methods: Focus
