Popularity Driven Data Integration

Fausto Giunchiglia, Simone Bocca, Mattia Fumagalli, Mayukh Bagchi, and Alessio Zamboni

arXiv:2209.14049 · cs.AI · September 29, 2022

Open Access

TL;DR

This paper introduces iTelos, a methodology that leverages data popularity to optimize data integration, reducing costs and enhancing reusability in large-scale analytics.

Contribution

It proposes a novel approach that treats data differently based on popularity to minimize integration costs and improve data reusability.

Findings

01. Reduced data preprocessing costs through popularity-based treatment

02. Enhanced backward compatibility and data sharing

03. Improved reusability of integrated data

Abstract

Increasingly, with the growing focus on large-scale analytics, we are confronted with the need to integrate data from multiple sources. The problem is that these data are impossible to reuse as-is. The net result is high cost, with the further drawback that the resulting integrated data will again be hard to reuse as-is. iTelos is a general-purpose methodology aiming at minimizing the effects of this process. The intuition is that data will be treated differently based on their popularity: the more a certain set of data have been reused, the more they will be reused and the less they will be changed across reuses, thus decreasing the overall data preprocessing costs while increasing backward compatibility and future sharing.
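
The popularity intuition in the abstract can be illustrated with a toy sketch. This is not the iTelos methodology itself, which the abstract does not specify in detail. The `Dataset` class, the `reuse_count` popularity measure, and the `change_budget` formula are all hypothetical names and choices made only for illustration. The sketch assumes popularity is a simple reuse counter and that the share of a dataset allowed to change shrinks as that counter grows.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical record for a reusable dataset."""
    name: str
    reuse_count: int = 0  # assumed popularity measure: number of past reuses


def change_budget(ds: Dataset, base: float = 1.0) -> float:
    """Assumed rule: the fraction of the dataset that may be modified
    on the next reuse decreases as the dataset becomes more popular."""
    return base / (1 + ds.reuse_count)


def record_reuse(ds: Dataset) -> Dataset:
    """Register one more reuse, which raises popularity."""
    ds.reuse_count += 1
    return ds


if __name__ == "__main__":
    ds = Dataset("city_transport")  # hypothetical dataset name
    for _ in range(4):
        print(ds.reuse_count, change_budget(ds))
        record_reuse(ds)
```

Under these assumptions, a dataset that has never been reused can be changed freely, while a heavily reused one is kept nearly fixed, which is one way to read the abstract's claim about lower preprocessing costs and better backward compatibility.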

Taxonomy

Topics: Advanced Database Systems and Queries · Data Quality and Management · Data Management and Algorithms