VisTabNet: Adapting Vision Transformers for Tabular Data
Witold Wydmański, Ulvi Movsum-zada, Jacek Tabor, Marek Śmieja

TL;DR
VisTabNet introduces a novel approach to adapt pre-trained Vision Transformers for tabular data by projecting tabular inputs into patch embeddings, enabling effective transfer learning and outperforming traditional methods on small datasets.
Contribution
The paper presents a new transfer learning method that adapts Vision Transformers for tabular data, reducing architecture design effort and improving performance on small datasets.
Findings
Outperforms traditional ensemble and deep learning models on small tabular datasets.
Demonstrates the feasibility of transferring pre-trained image models to tabular data tasks.
Shows that cross-modal transfer learning can extend the applicability of Vision Transformers.
Abstract
Although deep learning models have had great success in natural language processing and computer vision, we do not observe comparable improvements in the case of tabular data, which is still the most common data type used in biological, industrial and financial applications. In particular, it is challenging to transfer large-scale pre-trained models to downstream tasks defined on small tabular datasets. To address this, we propose VisTabNet -- a cross-modal transfer learning method, which allows for adapting Vision Transformer (ViT) with pre-trained weights to process tabular data. By projecting tabular inputs to patch embeddings acceptable by ViT, we can directly apply a pre-trained Transformer Encoder to tabular inputs. This approach eliminates the conceptual cost of designing a suitable architecture for processing tabular data, while reducing the computational cost of training the…
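Illustrative sketch
The projection step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sizes N_FEATURES, N_PATCHES and N_CLASSES are hypothetical, the single linear projection and the mean-pooled classification head are guesses at one reasonable design, and the "encoder" is a randomly initialised single-head self-attention layer standing in for a pre-trained ViT encoder whose weights would normally be loaded from a checkpoint and kept frozen. Only the token width of 768 (ViT-Base) is taken from the standard architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 10   # columns in the tabular dataset (hypothetical)
N_PATCHES = 8     # number of tokens passed to the encoder (hypothetical)
EMBED_DIM = 768   # token width of ViT-Base
N_CLASSES = 3     # hypothetical number of target classes

# Trainable adapter: maps one tabular row to N_PATCHES patch-like tokens.
proj_w = rng.normal(scale=0.02, size=(N_FEATURES, N_PATCHES * EMBED_DIM))
proj_b = np.zeros(N_PATCHES * EMBED_DIM)

# Placeholder for the pre-trained encoder weights (random here, frozen in practice).
w_q, w_k, w_v = (rng.normal(scale=0.02, size=(EMBED_DIM, EMBED_DIM)) for _ in range(3))

# Trainable classification head.
head_w = rng.normal(scale=0.02, size=(EMBED_DIM, N_CLASSES))


def project_to_patches(x):
    """Map rows of shape (batch, N_FEATURES) to tokens (batch, N_PATCHES, EMBED_DIM)."""
    tokens = x @ proj_w + proj_b
    return tokens.reshape(x.shape[0], N_PATCHES, EMBED_DIM)


def frozen_encoder(tokens):
    """Stand-in encoder: one single-head self-attention layer with a residual connection."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(EMBED_DIM)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return tokens + attn @ v


def forward(x):
    """Tabular rows -> patch tokens -> encoder -> mean-pooled class logits."""
    encoded = frozen_encoder(project_to_patches(x))
    return encoded.mean(axis=1) @ head_w


x = rng.normal(size=(4, N_FEATURES))  # a batch of 4 synthetic rows
logits = forward(x)
print(logits.shape)  # (4, 3)
```

In a setup like this, only the projection and the head would be trained on the small tabular dataset, which is where the reduced training cost mentioned in the abstract would come from.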
Taxonomy
Topics: Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques
Methods: Attention Is All You Need · Byte Pair Encoding · Linear Layer · Softmax · Dense Connections · Absolute Position Encodings · Dropout · Adam · Residual Connection · Vision Transformer
