Amalur: Data Integration Meets Machine Learning
Rihan Hai, Christos Koutras, Andra Ionescu, Ziyu Li, Wenbo Sun, Jessie van Schijndel, Yan Kang, Asterios Katsifodimos

TL;DR
This paper explores integrating traditional data integration techniques with modern machine learning, focusing on metadata use, federated learning, and feature augmentation to address data silos, privacy, and efficiency challenges.
Contribution
It presents a vision for combining data integration with ML, analyzing use cases like feature augmentation and federated learning, and identifying new research opportunities.
Findings
Metadata can enhance ML model effectiveness.
Data integration techniques can improve federated learning.
Combining data integration with ML opens new research directions in systems and representations.
Abstract
The data needed for machine learning (ML) model training can reside in different separate sites, often termed data silos. For data-intensive ML applications, data silos pose a major challenge: the integration and transformation of data demand a lot of manual work and computational resources. With data privacy and security constraints, data often cannot leave the local sites, and a model has to be trained in a decentralized manner. In this work, we present a vision on how to bridge the traditional data integration (DI) techniques with the requirements of modern machine learning. We explore the possibilities of utilizing metadata obtained from data integration processes for improving the effectiveness and efficiency of ML models. We analyze two common use cases over data silos, feature augmentation and federated learning. Bringing data integration and machine learning together, we…
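To make the feature augmentation use case concrete, here is a minimal sketch of how metadata from a data integration step could drive it. It is not the paper's method. It assumes a schema matcher has already produced a column mapping between two silos; the names `COLUMN_MAPPING`, `augment_features`, `silo_a`, and `silo_b`, as well as all data values, are hypothetical and chosen only for illustration.

```python
# Hypothetical column mapping, standing in for the output of a schema matcher.
# Keys are column names in silo B, values are the matching names in silo A.
COLUMN_MAPPING = {"cust_id": "customer_id"}


def augment_features(base_rows, extra_rows, mapping, key):
    """Left-join extra_rows onto base_rows on `key`, after renaming
    the columns of extra_rows according to `mapping`."""
    # Align silo B's column names with silo A's using the mapping.
    renamed = [{mapping.get(col, col): val for col, val in row.items()}
               for row in extra_rows]
    # Index the renamed rows by the join key for constant-time lookup.
    index = {row[key]: row for row in renamed}
    # Every non-key column in silo B is a candidate new feature.
    extra_cols = sorted({col for row in renamed for col in row if col != key})

    augmented = []
    for row in base_rows:
        match = index.get(row[key], {})
        out = dict(row)
        for col in extra_cols:
            # Keep the base value on a name clash; use None when
            # silo B has no matching entity.
            out.setdefault(col, match.get(col))
        augmented.append(out)
    return augmented


# Made-up example data for two silos that describe the same customers.
silo_a = [{"customer_id": 1, "age": 34}, {"customer_id": 2, "age": 51}]
silo_b = [{"cust_id": 1, "income": 52000}]

augmented = augment_features(silo_a, silo_b, COLUMN_MAPPING, "customer_id")
```

The sketch materializes the joined table in one place, which only works when the data is allowed to leave its silo. Under the privacy constraints the abstract describes, the same mapping metadata would instead be used to align features across sites while training stays decentralized.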
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data · AI in cancer detection
