Gem: Gaussian Mixture Model Embeddings for Numerical Feature Distributions
Hafiz Tayyab Rauf, Alex Bogatu, Norman W. Paton, Andre Freitas

TL;DR
Gem introduces Gaussian Mixture Model embeddings that capture the value distributions of numerical columns, improving semantic type detection and dataset analysis without requiring contextual information such as attribute names.
Contribution
The paper presents a novel GMM-based embedding method for numerical data that outperforms baselines and can additionally integrate attribute names to further improve performance.
Findings
Gem outperforms baseline methods on benchmark datasets.
The method captures distributional, statistical, and contextual properties of numerical columns.
Integration with attribute names improves overall accuracy.
Abstract
Embeddings are now used to underpin a wide variety of data management tasks, including entity resolution, dataset search and semantic type detection. Such applications often involve datasets with numerical columns, but there has been more emphasis placed on the semantics of categorical data in embeddings than on the distinctive features of numerical data. In this paper, we propose a method called Gem (Gaussian mixture model embeddings) that creates embeddings that build on numerical value distributions from columns. The proposed method specializes a Gaussian Mixture Model (GMM) to identify and cluster columns with similar value distributions. We introduce a signature mechanism that generates a probability matrix for each column, indicating its likelihood of belonging to specific Gaussian components, which can be used for different applications, such as to determine semantic types.…
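The signature mechanism described in the abstract can be illustrated with a minimal sketch. It assumes a single one-dimensional GMM with a fixed number of components K, fitted by plain EM to the values pooled from all columns. Each column's signature is then the matrix of per-value component probabilities, and a fixed-length embedding is obtained by averaging the rows of that matrix. The pooling step, the quantile initialization, the averaging, and the names `fit_gmm_1d`, `column_signature` and `column_embedding` are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple

GMM = Tuple[List[float], List[float], List[float]]  # (weights, means, variances)


def log_normal(x: float, mean: float, var: float) -> float:
    """Log-density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def component_probs(x: float, gmm: GMM) -> List[float]:
    """Posterior probability that value x belongs to each Gaussian component."""
    weights, means, variances = gmm
    logs = [math.log(w) + log_normal(x, m, s) for w, m, s in zip(weights, means, variances)]
    top = max(logs)  # log-sum-exp trick avoids underflow for distant values
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fit_gmm_1d(values: List[float], k: int, iters: int = 100) -> GMM:
    """Fit a 1-D Gaussian mixture to pooled values with plain EM (hypothetical helper)."""
    n = len(values)
    ordered = sorted(values)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]  # quantile initialization
    mu = sum(values) / n
    var0 = sum((v - mu) ** 2 for v in values) / n or 1.0
    weights, variances = [1.0 / k] * k, [var0] * k
    for _ in range(iters):
        # E-step: responsibilities of every component for every value
        resp = [component_probs(v, (weights, means, variances)) for v in values]
        # M-step: re-estimate each component from its responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            spread = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            variances[j] = max(spread, 1e-6)  # floor keeps components from collapsing
    return weights, means, variances


def column_signature(column: List[float], gmm: GMM) -> List[List[float]]:
    """Probability matrix: one row per value, one column per Gaussian component."""
    return [component_probs(v, gmm) for v in column]


def column_embedding(column: List[float], gmm: GMM) -> List[float]:
    """Fixed-length vector: mean of the signature rows (a hypothetical pooling choice)."""
    matrix = column_signature(column, gmm)
    return [sum(row[j] for row in matrix) / len(matrix) for j in range(len(matrix[0]))]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Toy columns: two age-like columns and one price-like column.
columns: Dict[str, List[float]] = {
    "age_a": [23, 35, 41, 29, 52, 38, 31, 45],
    "age_b": [27, 33, 48, 36, 24, 40, 55, 30],
    "price": [1200, 980, 1500, 1100, 1340, 1010, 1420, 1250],
}
pooled = [float(v) for col in columns.values() for v in col]
gmm = fit_gmm_1d(pooled, k=2)
emb = {name: column_embedding([float(v) for v in col], gmm) for name, col in columns.items()}
print(cosine(emb["age_a"], emb["age_b"]), cosine(emb["age_a"], emb["price"]))
```

On this toy data the two age-like columns fall almost entirely on one component and the price column on the other, so the first similarity is close to 1 and the second close to 0. Averaging is only one way to pool the probability matrix into a vector, and it discards per-value detail.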
Taxonomy
Topics: Data Management and Algorithms · Time Series Analysis and Forecasting · Bayesian Methods and Mixture Models
