Deep Learning with Tabular Data: A Self-supervised Approach
Tirth Kiranbhai Vyas

TL;DR
This paper introduces a novel self-supervised Transformer-based approach for tabular data, demonstrating its effectiveness in capturing feature relationships and outperforming traditional models like GBDT and MLP.
Contribution
It presents a new self-supervised training method for TabTransformer, including variants like Binned-TT, improving feature representation for tabular data.
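The summary does not spell out how Binned-TT builds its inputs. As a minimal sketch, assuming the name refers to discretising numerical features into bins so they can be embedded like categorical tokens, quantile binning could look like the following. The helper names `quantile_edges` and `to_bin_token`, the sample values, and the choice of four bins are all illustrative and do not come from the paper.

```python
from bisect import bisect_right


def quantile_edges(values, n_bins):
    """Return n_bins - 1 interior cut points taken at evenly spaced ranks."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def to_bin_token(value, edges):
    """Map a number to a bin index in the range 0..len(edges)."""
    return bisect_right(edges, value)


# Hypothetical numerical column (not data from the paper).
ages = [22, 35, 47, 51, 63, 29, 40, 58]
edges = quantile_edges(ages, n_bins=4)
tokens = [to_bin_token(a, edges) for a in ages]
print(edges, tokens)
```

Each resulting integer can then index an embedding table in the same way a categorical value would, which is one plausible way to give numerical and categorical columns a shared token representation.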
Findings
TabTransformer outperforms baseline models in various tasks
Self-supervised learning enhances feature representation
Optimal input construction improves model performance
Abstract
We describe a novel approach to training on tabular data using the TabTransformer model with self-supervised learning. Traditional machine learning models for tabular data, such as GBDT, are widely used; our paper instead examines the effectiveness of the TabTransformer, a Transformer-based model optimised specifically for tabular data. The TabTransformer captures intricate relationships and dependencies among features in tabular data by leveraging the self-attention mechanism of Transformers. We use a self-supervised learning approach in this study, in which the TabTransformer learns from unlabelled data by creating surrogate supervised tasks, eliminating the need for labelled data. The aim is to find the most effective TabTransformer representation of categorical and numerical features. To address the challenges faced during the construction of various…
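The visible part of the abstract does not say which surrogate task is used. One common choice for tabular data is masked-feature prediction, in which some cells of an unlabelled row are hidden and the model is trained to recover them. The sketch below only builds the (input, target) pairs for such a task and assumes nothing about the model itself; `MASK`, `make_masked_example`, the column names, and the masking probability are hypothetical.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token for a hidden cell


def make_masked_example(row, mask_prob, rng):
    """Hide a random subset of a row's features.

    Returns (corrupted_row, targets), where targets maps each hidden
    feature name to its original value.
    """
    corrupted, targets = dict(row), {}
    for name, value in row.items():
        if rng.random() < mask_prob:
            corrupted[name] = MASK
            targets[name] = value
    if not targets:
        # Guarantee at least one prediction target per row.
        name = rng.choice(sorted(row))
        corrupted[name] = MASK
        targets[name] = row[name]
    return corrupted, targets


# Unlabelled rows (illustrative values only).
rows = [
    {"job": "nurse", "city": "Leeds", "age_bin": 2},
    {"job": "clerk", "city": "York", "age_bin": 0},
    {"job": "pilot", "city": "Leeds", "age_bin": 3},
]
rng = random.Random(0)
pairs = [make_masked_example(r, mask_prob=0.3, rng=rng) for r in rows]
for corrupted, targets in pairs:
    print(corrupted, "->", targets)
```

The supervision signal comes entirely from the rows themselves: the hidden values act as labels, so no externally annotated target column is required.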
Taxonomy
Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Computational Physics and Python Applications
Methods: Attention Is All You Need · Linear Layer · Dropout · Layer Normalization · Multi-Head Attention · Byte Pair Encoding · Residual Connection · Adam · Softmax · Dense Connections
