Ember: No-Code Context Enrichment via Similarity-Based Keyless Joins
Sahaana Suri, Ihab F. Ilyas, Christopher Ré, Theodoros Rekatsinas

TL;DR
Ember is a system that automates context enrichment in structured data by using Transformer-based embeddings for keyless joins, enabling no-code ML pipelines across multiple domains and improving recall over alternative approaches.
Contribution
Ember introduces a novel no-code system that automates keyless joins through learned embeddings, which simplifies context enrichment of structured data in ML pipelines.
Findings
Enables no-code ML pipelines for five domains.
Improves recall by up to 39% over alternatives.
Requires minimal configuration, often just one line.
Abstract
Structured data, or data that adheres to a pre-defined schema, can suffer from fragmented context: information describing a single entity can be scattered across multiple datasets or tables tailored for specific business needs, with no explicit linking keys (e.g., primary key-foreign key relationships or heuristic functions). Context enrichment, or rebuilding fragmented context, using keyless joins is an implicit or explicit step in machine learning (ML) pipelines over structured data sources. This process is tedious, domain-specific, and lacks support in now-prevalent no-code ML systems that let users create ML pipelines using just input data and high-level configuration files. In response, we propose Ember, a system that abstracts and automates keyless joins to generalize context enrichment. Our key insight is that Ember can enable a general keyless join operator by constructing an…
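The abstract is cut off before it describes how Ember builds its join operator, so the sketch below does not reproduce Ember's method. It only illustrates the general idea named in the TL;DR: represent each record as an embedding, then pair each left record with its most similar right records instead of matching on a shared key. As an assumption for a dependency-free example, character trigram counts stand in for the Transformer-based embeddings, and `embed`, `cosine`, and `keyless_join` are hypothetical names, not Ember's API.

```python
import math
from collections import Counter


def embed(record: str, n: int = 3) -> Counter:
    """Hypothetical stand-in for a learned encoder: character n-gram counts."""
    text = f" {record.lower()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    # Counter returns 0 for missing keys, so absent n-grams contribute nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keyless_join(left, right, k=1):
    """Pair each left record with its k most similar right records (no key needed)."""
    right_vecs = [embed(r) for r in right]  # embed the right table once
    result = []
    for record in left:
        vec = embed(record)
        # Brute-force scan: score every right record, keep the top k.
        scored = sorted(
            ((cosine(vec, rv), j) for j, rv in enumerate(right_vecs)),
            reverse=True,
        )
        for score, j in scored[:k]:
            result.append((record, right[j], score))
    return result


products = ["Apple iPhone 12 64GB black", "Samsung Galaxy S21 128GB"]
listings = ["galaxy s21 samsung phone 128 GB", "iphone 12 (64 GB) by apple, black"]
for left_rec, right_rec, score in keyless_join(products, listings):
    print(f"{left_rec!r} -> {right_rec!r} ({score:.2f})")
```

Here each product is matched to the listing that describes the same item even though the two tables share no identifier. The pairwise scan costs one similarity computation per left/right pair; a system operating at scale would likely replace it with a nearest-neighbor index over the right-side embeddings.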
Taxonomy
Topics: Data Quality and Management · Topic Modeling · Time Series Analysis and Forecasting
