Table Enrichment System for Machine Learning
Yuyang Dong, Masafumi Oyamada

TL;DR
This paper introduces a system that enhances tabular data by integrating external attributes from data lakes, thereby improving machine learning prediction accuracy through a multi-stage enrichment process.
Contribution
It presents a novel table enrichment system with a four-stage process to efficiently augment query tables for better machine learning performance.
Findings
Improved prediction accuracy with enriched tables
Effective external attribute integration from data lakes
Demonstrated system usability via web interface
Abstract
Data scientists constantly face the problem of how to improve prediction accuracy with insufficient tabular data. We propose a table enrichment system that enriches a query table by adding external attributes (columns) from data lakes and improves the accuracy of machine learning predictive models. Our system has four stages: (1) join row search, (2) task-related table selection, (3) row and column alignment, and (4) feature selection and evaluation. Together, these stages efficiently create an enriched table for a given query table and a specified machine learning task. We demonstrate our system with a web UI to show the use cases of table enrichment.
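To make the four stages concrete, the following is a minimal sketch in plain Python, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`join_row_search`, `align`, `score_columns`, `enrich`), the key-overlap threshold, the toy tables, and above all the use of absolute Pearson correlation with the target as a stand-in for both task-related table selection and feature evaluation. The abstract does not say how the real system scores tables or features.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either side is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def join_row_search(query, key, lake, min_overlap=0.5):
    """Stage 1 (assumed form): keep lake tables covering enough query keys."""
    qkeys = {row[key] for row in query}
    hits = {}
    for name, table in lake.items():
        tkeys = {row[key] for row in table if key in row}
        if len(qkeys & tkeys) / len(qkeys) >= min_overlap:
            hits[name] = table
    return hits

def align(query, key, table):
    """Stage 3 (assumed form): left-join a lake table onto the query by key."""
    lookup = {row[key]: row for row in table}
    out = []
    for row in query:
        extra = lookup.get(row[key], {})
        merged = dict(row)
        # Only add columns the row does not already have, so the key and
        # the target are never overwritten.
        merged.update({c: v for c, v in extra.items() if c not in row})
        out.append(merged)
    return out

def score_columns(rows, target, exclude):
    """Absolute correlation of each numeric non-excluded column with target."""
    cols = {c for r in rows for c in r} - set(exclude) - {target}
    scores = {}
    for c in cols:
        pairs = [(r[c], r[target]) for r in rows
                 if isinstance(r.get(c), (int, float))]
        if len(pairs) >= 2:
            xs, ys = zip(*pairs)
            scores[c] = abs(pearson(xs, ys))
    return scores

def enrich(query, key, target, lake, top_tables=2, min_score=0.5):
    candidates = join_row_search(query, key, lake)          # stage 1
    base_cols = set(query[0])
    ranked = []
    for name, table in candidates.items():                  # stage 2
        scores = score_columns(align(query, key, table), target, base_cols)
        ranked.append((max(scores.values(), default=0.0), name))
    ranked.sort(reverse=True)
    enriched = query
    for _, name in ranked[:top_tables]:                     # stage 3
        enriched = align(enriched, key, candidates[name])
    scores = score_columns(enriched, target, base_cols)     # stage 4
    keep = base_cols | {c for c, s in scores.items() if s >= min_score}
    return [{c: v for c, v in r.items() if c in keep} for r in enriched]

# Hypothetical toy data: a query table and a three-table "data lake".
query = [{"city": c, "price": p}
         for c, p in [("A", 10.0), ("B", 20.0), ("C", 30.0), ("D", 40.0)]]
lake = {
    "population": [{"city": c, "pop": v}
                   for c, v in [("A", 1.0), ("B", 2.1), ("C", 2.9), ("D", 4.2)]],
    "noise": [{"city": c, "code": v}
              for c, v in [("A", 5), ("B", 1), ("C", 4), ("D", 2)]],
    "unrelated": [{"city": "X", "gdp": 1.0}, {"city": "Y", "gdp": 2.0}],
}
enriched = enrich(query, "city", "price", lake)
```

In this toy run, the `unrelated` table is dropped in stage 1 because it shares no keys with the query, and the `code` column is dropped in stage 4 because its correlation with `price` falls below the assumed threshold, leaving `pop` as the only added attribute. A real system would more plausibly evaluate features by training and validating a model on the enriched table rather than by correlation alone.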
Taxonomy
Topics: Machine Learning and Data Classification · Time Series Analysis and Forecasting · Data Stream Mining Techniques
Methods: Feature Selection
