FeatNavigator: Automatic Feature Augmentation on Tabular Data
Jiaming Liang, Chuan Lei, Xiao Qin, Jiani Zhang, Asterios Katsifodimos, Christos Faloutsos, Huzefa Rangwala

TL;DR
FeatNavigator is a framework that intelligently explores and integrates high-quality features from relational tables to enhance machine learning model performance on tabular data, addressing limitations of existing methods.
Contribution
It introduces a novel search algorithm that evaluates feature importance and join path quality, enabling more effective automatic feature augmentation from distant tables.
Findings
Outperforms state-of-the-art solutions by up to 40.1% in model performance
Effectively utilizes distant features through optimized join path selection
Demonstrates significant improvements across five public datasets
Abstract
Data-centric AI focuses on understanding and utilizing high-quality, relevant data in training machine learning (ML) models, thereby increasing the likelihood of producing accurate and useful results. Automatic feature augmentation, aiming to augment the initial base table with useful features from other tables, is critical in data preparation as it improves model performance, robustness, and generalizability. While recent works have investigated automatic feature augmentation, most of them have limited capabilities in utilizing all useful features as many of them are in candidate tables not directly joinable with the base table. Worse yet, with numerous join paths leading to these distant features, existing solutions fail to fully exploit them within a reasonable compute budget. We present FeatNavigator, an effective and efficient framework that explores and integrates high-quality…
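To make the idea of a "distant" feature concrete, here is a minimal sketch, not FeatNavigator's algorithm. It assumes a hypothetical three-table schema (`base`, `orders`, `products`) in which `price` is only reachable from the base table by joining through `orders`. The helper names `left_join` and `augment` are invented for illustration, and aggregating by mean is an arbitrary choice.

```python
from statistics import mean


def left_join(left_rows, right_rows, left_key, right_key):
    """Left-join two lists of dicts; left rows without a match are kept once."""
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for left in left_rows:
        # A left row with no match (or no join key yet) falls back to [None].
        for right in index.get(left.get(left_key), [None]):
            joined.append({**left, **(right or {})})
    return joined


def augment(base, key, join_path, feature):
    """Follow join_path from the base table, then attach mean(feature) per base row."""
    rows = [dict(r) for r in base]
    for table, left_key, right_key in join_path:
        rows = left_join(rows, table, left_key, right_key)
    # One-to-many joins fan out, so collect all feature values per base key.
    values = {}
    for row in rows:
        if row.get(feature) is not None:
            values.setdefault(row[key], []).append(row[feature])
    return [
        {**r, feature: mean(values[r[key]]) if r[key] in values else None}
        for r in base
    ]


# Hypothetical data: `price` is two hops away from the base table.
base = [
    {"customer_id": 1, "churned": 0},
    {"customer_id": 2, "churned": 1},
    {"customer_id": 3, "churned": 0},
]
orders = [
    {"customer_id": 1, "product_id": "a"},
    {"customer_id": 1, "product_id": "b"},
    {"customer_id": 2, "product_id": "b"},
]
products = [
    {"product_id": "a", "price": 10.0},
    {"product_id": "b", "price": 30.0},
]

join_path = [
    (orders, "customer_id", "customer_id"),
    (products, "product_id", "product_id"),
]
augmented = augment(base, "customer_id", join_path, "price")
```

In this toy schema there is only one path to `price`. In a larger schema several join paths can reach the same feature, which is the situation the abstract describes as hard to explore within a reasonable compute budget.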
Taxonomy
Topics: Video Analysis and Summarization · Web Data Mining and Analysis · Image Retrieval and Classification Techniques
Methods: Sparse Evolutionary Training · Balanced Selection
