TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting
Andrei Margeloiu, Adrián Bazaga, Nikola Simidjievski, Pietro Liò, Mateja Jamnik

TL;DR
TabMDA is a novel, training-free data augmentation method for tabular data that leverages pre-trained in-context models to improve classifier performance by exploring the embedding space through label-invariant transformations.
Contribution
Introduces TabMDA, a new manifold data augmentation technique for tabular data that utilizes pre-trained in-context models and is applicable to any classifier.
Findings
Significant performance improvements across five classifiers.
Effective augmentation by exploring embedding space.
Applicable to various tabular datasets.
Abstract
Tabular data is prevalent in many critical domains, yet it is often challenging to acquire in large quantities. This scarcity usually results in poor performance of machine learning models on such data. Data augmentation, a common strategy for performance improvement in vision and language tasks, typically underperforms for tabular data due to the lack of explicit symmetries in the input space. To overcome this challenge, we introduce TabMDA, a novel method for manifold data augmentation on tabular data. This method utilises a pre-trained in-context model, such as TabPFN, to map the data into an embedding space. TabMDA performs label-invariant transformations by encoding the data multiple times with varied contexts. This process explores the learned embedding space of the underlying in-context models, thereby enlarging the training dataset. TabMDA is a training-free method, making it…
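The idea of encoding the same rows several times under different contexts can be illustrated with a minimal sketch. It assumes a toy attention-style encoder written in NumPy as a stand-in for a pre-trained in-context model; it is not TabPFN and does not reproduce the authors' implementation. The names `toy_in_context_encoder`, `augment_with_context_subsets`, `n_contexts`, and `context_frac` are hypothetical and chosen only for illustration.

```python
import numpy as np

def toy_in_context_encoder(X_query, X_ctx, y_ctx, n_classes):
    """Hypothetical stand-in for a pre-trained in-context encoder (NOT TabPFN).

    Each query row attends over the context rows. Its embedding is the
    attention-weighted mean of the context features, concatenated with the
    attention-weighted class frequencies of the context labels.
    """
    # Squared Euclidean distances, shape (n_query, n_ctx)
    d2 = ((X_query[:, None, :] - X_ctx[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    one_hot = np.eye(n_classes)[y_ctx]
    return np.hstack([attn @ X_ctx, attn @ one_hot])

def augment_with_context_subsets(X, y, n_contexts=5, context_frac=0.5, seed=0):
    """Encode every training row once per randomly drawn context subset.

    Labels are copied unchanged, so each re-encoding acts as a
    label-invariant transformation in the embedding space.
    """
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    ctx_size = max(1, int(context_frac * len(X)))
    Z_parts, y_parts = [], []
    for _ in range(n_contexts):
        # Draw a different context subset for each pass
        idx = rng.choice(len(X), size=ctx_size, replace=False)
        Z_parts.append(toy_in_context_encoder(X, X[idx], y[idx], n_classes))
        y_parts.append(y)
    return np.vstack(Z_parts), np.concatenate(y_parts)
```

Under these assumptions, a dataset of n rows yields n × `n_contexts` embedded rows, which could then be passed to any downstream classifier in place of the raw features.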
Taxonomy
Topics: Image Processing and 3D Reconstruction · Neural Networks and Applications
