Geodesic Flow Kernels for Semi-Supervised Learning on Mixed-Variable Tabular Dataset
Yoontae Hwang, Yongjae Lee

TL;DR
This paper introduces GFTab, a semi-supervised learning framework for mixed-variable tabular data that uses geodesic flow kernels and variable-specific corruption to improve performance with limited labels.
Contribution
GFTab is the first to combine geodesic flow kernels, variable-specific corruption, and tree-based embeddings for semi-supervised learning on mixed-variable tabular datasets.
Findings
GFTab outperforms existing models on 21 diverse datasets.
GFTab is especially effective with limited labeled data.
The framework captures geometric relationships in corrupted inputs.
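To make the "geometric relationships in corrupted inputs" concrete, here is a minimal NumPy sketch of a geodesic flow kernel between two views of the same data. This summary does not say how GFTab builds its subspaces or evaluates the kernel, so everything here is an assumption: PCA bases via `orthonormal_basis`, a midpoint-rule approximation of the integral instead of a closed form, and all function names. It also assumes `P.T @ Q` is invertible, meaning the two subspaces share no orthogonal directions.

```python
import numpy as np

def orthonormal_basis(X, d):
    """Top-d principal directions of the rows of X, as a (D, d) orthonormal matrix."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T

def geodesic_factors(P, Q):
    """Factors of the Grassmann geodesic from span(P) to span(Q).

    Assumes P.T @ Q is invertible (hypothetical restriction for this sketch).
    """
    A = P.T @ Q
    # Component of Q outside span(P), expressed relative to P.
    M = (Q - P @ A) @ np.linalg.inv(A)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    theta = np.arctan(s)  # principal angles between the two subspaces
    return P @ Vt.T, U, theta

def geodesic_point(PV, U, theta, t):
    """Orthonormal basis of the subspace at position t in [0, 1] on the geodesic."""
    return PV * np.cos(theta * t) + U * np.sin(theta * t)

def gfk_matrix(P, Q, n_steps=32):
    """Approximate G = integral over t in [0, 1] of Phi(t) Phi(t)^T (midpoint rule)."""
    PV, U, theta = geodesic_factors(P, Q)
    D = P.shape[0]
    G = np.zeros((D, D))
    for t in (np.arange(n_steps) + 0.5) / n_steps:
        Phi = geodesic_point(PV, U, theta, t)
        G += Phi @ Phi.T
    return G / n_steps

def gfk_similarity(G, x, z):
    """Kernel value x^T G z between two feature vectors."""
    return float(x @ G @ z)

# Example: two noisy views of the same table, compared through the kernel.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) * np.array([3.0, 2.0, 1.0, 0.5, 0.5, 0.5])
view_a = X + 0.3 * rng.normal(size=X.shape)
view_b = X + 0.3 * rng.normal(size=X.shape)
G = gfk_matrix(orthonormal_basis(view_a, 2), orthonormal_basis(view_b, 2))
print(gfk_similarity(G, view_a[0], view_b[0]))
```

The kernel averages the projections onto every intermediate subspace between the two views, so two rows score as similar when they agree along directions that persist across the whole path rather than in only one view.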
Abstract
Tabular data poses unique challenges due to its heterogeneous nature, combining both continuous and categorical variables. Existing approaches often struggle to effectively capture the underlying structure and relationships within such data. We propose GFTab (Geodesic Flow Kernels for Semi-Supervised Learning on Mixed-Variable Tabular Dataset), a semi-supervised framework specifically designed for tabular datasets. GFTab incorporates three key innovations: 1) variable-specific corruption methods tailored to the distinct properties of continuous and categorical variables, 2) a geodesic flow kernel-based similarity measure to capture geometric changes between corrupted inputs, and 3) tree-based embedding to leverage hierarchical relationships from available labeled data. To rigorously evaluate GFTab, we curate a comprehensive set of 21 tabular datasets spanning various domains, sizes,…
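The first innovation, variable-specific corruption, can be illustrated with a short sketch. The abstract does not state which corruption GFTab applies to each variable type, so the choices below are assumptions for illustration only: scaled Gaussian noise on continuous cells and resampling from the column's empirical marginal on categorical cells. The function name `corrupt_mixed` and the parameters `mask_prob` and `noise_scale` are hypothetical.

```python
import numpy as np

def corrupt_mixed(X_cont, X_cat, rng, mask_prob=0.3, noise_scale=0.1):
    """Return corrupted copies of the continuous and categorical blocks.

    Assumed scheme (not taken from the paper):
      - continuous cells: add Gaussian noise scaled by the column std
      - categorical cells: swap in a value drawn from the same column
    Each cell is corrupted independently with probability mask_prob.
    """
    n = X_cont.shape[0]

    # Continuous block: perturb selected cells, keeping values on the same scale.
    cont_mask = rng.random(X_cont.shape) < mask_prob
    col_std = X_cont.std(axis=0, keepdims=True)
    noise = rng.normal(0.0, 1.0, size=X_cont.shape) * col_std * noise_scale
    X_cont_tilde = np.where(cont_mask, X_cont + noise, X_cont)

    # Categorical block: a per-column row permutation samples from the
    # empirical marginal, so corrupted cells always hold a valid category.
    cat_mask = rng.random(X_cat.shape) < mask_prob
    X_cat_shuffled = np.empty_like(X_cat)
    for j in range(X_cat.shape[1]):
        X_cat_shuffled[:, j] = X_cat[rng.permutation(n), j]
    X_cat_tilde = np.where(cat_mask, X_cat_shuffled, X_cat)

    return X_cont_tilde, X_cat_tilde

# Example: two independently corrupted views of the same rows.
rng = np.random.default_rng(0)
X_cont = rng.normal(size=(100, 3))
X_cat = rng.integers(0, 4, size=(100, 2))
view_a = corrupt_mixed(X_cont, X_cat, rng)
view_b = corrupt_mixed(X_cont, X_cat, rng)
```

Treating the two blocks separately avoids adding numeric noise to category codes, which would produce values that correspond to no real category.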
Taxonomy
Topics: Machine Learning and Data Classification · Image Processing and 3D Reconstruction · Computational Physics and Python Applications
Methods: Sparse Evolutionary Training
