Hippasus: Effective and Efficient Automatic Feature Augmentation for Machine Learning Tasks on Relational Data
Serafeim Papadias, Kostas Patroumpas, Dimitrios Skoutas

TL;DR
Hippasus is a modular framework that enhances feature augmentation for relational data by combining semantic reasoning and statistical signals, significantly improving accuracy and efficiency in machine learning tasks.
Contribution
Hippasus introduces an approach that integrates LLM-based semantic reasoning with statistical signals to efficiently identify and select high-quality features from complex relational schemas.
Findings
Achieves up to 26.8% accuracy improvement over baselines
Reduces feature augmentation runtime significantly
Balances effectiveness and scalability on complex schemas
Abstract
Machine learning models depend critically on feature quality, yet useful features are often scattered across multiple relational tables. Feature augmentation enriches a base table by discovering and integrating features from related tables through join operations. However, scaling this process to complex schemas with many tables and multi-hop paths remains challenging. Feature augmentation must address three core tasks: identify promising join paths that connect the base table to candidate tables, execute these joins to materialize augmented data, and select the most informative features from the results. Existing approaches face a fundamental tradeoff between effectiveness and efficiency: achieving high accuracy requires exploring many candidate paths, but exhaustive exploration is computationally prohibitive. Some methods compromise by considering only immediate neighbors, limiting…
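To make the first of the three tasks concrete, here is a minimal sketch of naive join-path enumeration, the kind of exhaustive exploration the abstract describes as computationally prohibitive on large schemas. It is not the Hippasus method: the function name `enumerate_join_paths`, the toy schema, and its table and key names are all hypothetical, chosen only for illustration. Executing the joins and selecting features are left out.

```python
from collections import deque


def enumerate_join_paths(schema, base, max_hops):
    """Breadth-first enumeration of acyclic join paths starting at `base`.

    `schema` maps a table name to a list of (neighbor_table, join_key) pairs.
    Returns a list of paths; each path is a list of (table, join_key) hops.
    This is an illustrative baseline, not the paper's algorithm.
    """
    paths = []
    # Each queue entry: (current table, hops taken so far, tables already on the path)
    queue = deque([(base, [], {base})])
    while queue:
        table, path, visited = queue.popleft()
        if len(path) == max_hops:
            continue  # hop budget exhausted, do not expand further
        for neighbor, key in schema.get(table, []):
            if neighbor in visited:
                continue  # skip cycles back to tables already joined
            new_path = path + [(neighbor, key)]
            paths.append(new_path)
            queue.append((neighbor, new_path, visited | {neighbor}))
    return paths


# Hypothetical toy schema with "orders" as the base table.
schema = {
    "orders": [("customers", "customer_id"), ("products", "product_id")],
    "customers": [("orders", "customer_id"), ("regions", "region_id")],
    "products": [("orders", "product_id"), ("suppliers", "supplier_id")],
    "regions": [("customers", "region_id")],
    "suppliers": [("products", "supplier_id")],
}

one_hop = enumerate_join_paths(schema, "orders", max_hops=1)
two_hop = enumerate_join_paths(schema, "orders", max_hops=2)
print(len(one_hop), len(two_hop))
```

Restricting `max_hops` to 1 corresponds to considering only immediate neighbors. Raising it reaches more candidate tables, but the number of paths grows quickly with schema size and depth, which is the effectiveness versus efficiency tradeoff the abstract points to.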
Taxonomy
Topics: Data Quality and Management · Machine Learning in Healthcare · Advanced Graph Neural Networks
