VisTabNet: Adapting Vision Transformers for Tabular Data

Witold Wydmański; Ulvi Movsum-zada; Jacek Tabor; Marek Śmieja

arXiv:2501.00057 · cs.LG · April 28, 2025

TL;DR

VisTabNet adapts pre-trained Vision Transformers (ViTs) to tabular data by projecting tabular inputs into patch embeddings. This enables effective transfer learning and outperforms traditional methods on small datasets.
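The core idea of turning one row of features into the sequence of patch tokens a ViT expects can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation. It assumes the projection is a single dense layer followed by a reshape, and it assumes ViT-B/16 geometry at 224×224 resolution, where (224 / 16)² = 196 patch tokens of width 768. The names `init_projection` and `project_to_patches` are hypothetical.

```python
import numpy as np

# Assumed ViT-B/16 geometry at 224x224 input: (224 / 16) ** 2 = 196 patches of width 768.
NUM_PATCHES = 196
EMBED_DIM = 768


def init_projection(num_features, num_patches=NUM_PATCHES, embed_dim=EMBED_DIM, seed=0):
    """Hypothetical learnable projection: one dense layer R^d -> R^(N*D)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(num_features)
    W = rng.normal(0.0, scale, size=(num_features, num_patches * embed_dim))
    b = np.zeros(num_patches * embed_dim)
    return W, b


def project_to_patches(x, W, b, num_patches=NUM_PATCHES, embed_dim=EMBED_DIM):
    """Map a batch of tabular rows (B, d) to patch embeddings (B, N, D)."""
    flat = x @ W + b  # (B, N * D): every token mixes all d features
    return flat.reshape(x.shape[0], num_patches, embed_dim)


# Example: 4 synthetic rows with 12 numeric features each.
x = np.random.default_rng(1).normal(size=(4, 12))
W, b = init_projection(num_features=12)
tokens = project_to_patches(x, W, b)
print(tokens.shape)  # (4, 196, 768)
```

With a single dense layer, every patch token depends on every input feature. Other tokenizations, such as one token per feature, are equally plausible designs. The wwydmanski/VisTabNet repository is the place to check which projection the authors actually use.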

Contribution

The paper presents a new transfer learning method that adapts Vision Transformers for tabular data, reducing architecture design effort and improving performance on small datasets.

Findings

1. Outperforms traditional ensemble and deep learning models on small tabular datasets.
2. Demonstrates the feasibility of transferring pre-trained image models to tabular data tasks.
3. Shows that cross-modal transfer learning can extend the applicability of Vision Transformers.

Abstract

Although deep learning models have had great success in natural language processing and computer vision, we do not observe comparable improvements on tabular data, which is still the most common data type in biological, industrial and financial applications. In particular, it is challenging to transfer large-scale pre-trained models to downstream tasks defined on small tabular datasets. To address this, we propose VisTabNet, a cross-modal transfer learning method that adapts a Vision Transformer (ViT) with pre-trained weights to process tabular data. By projecting tabular inputs to patch embeddings accepted by ViT, we can directly apply a pre-trained Transformer Encoder to tabular inputs. This approach eliminates the conceptual cost of designing a suitable architecture for processing tabular data, while reducing the computational cost of training the…
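The abstract describes a pipeline that projects tabular inputs to patch embeddings and then runs a pre-trained Transformer Encoder on them. A toy forward pass can illustrate this. The NumPy sketch below rests on several assumptions:

* Random matrices stand in for pre-trained ViT weights.
* The sizes are far smaller than a real ViT.
* Biases and the LayerNorm scale and shift are omitted.
* Only the projection `W_proj` and the head `W_head` are treated as trainable. The encoder, class token and position embeddings stay frozen.

This section does not say which parameters VisTabNet actually fine-tunes. All function and variable names are hypothetical.

```python
import numpy as np

# Toy sizes (assumed); a real ViT-B/16 uses D=768, 12 heads, 12 layers, 196 patches.
B, d, N, D, HEADS, LAYERS, CLASSES = 8, 10, 16, 64, 4, 2, 3
rng = np.random.default_rng(0)


def layer_norm(h, eps=1e-6):
    # Normalizes over the feature axis; learnable scale/shift omitted for brevity.
    mu = h.mean(axis=-1, keepdims=True)
    return (h - mu) / np.sqrt(h.var(axis=-1, keepdims=True) + eps)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def gelu(z):
    # tanh approximation of GELU, the MLP activation used in ViT
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))


def encoder_block(h, p, heads):
    """Pre-norm Transformer block (biases omitted): attention then MLP, each with a residual."""
    b, t, dim = h.shape
    dh = dim // heads
    q, k, v = np.split(layer_norm(h) @ p["w_qkv"], 3, axis=-1)
    # Split the feature axis into heads: (b, heads, t, dh)
    q, k, v = (m.reshape(b, t, heads, dh).transpose(0, 2, 1, 3) for m in (q, k, v))
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(dh))  # (b, heads, t, t)
    ctx = (attn @ v).transpose(0, 2, 1, 3).reshape(b, t, dim)
    h = h + ctx @ p["w_o"]                                     # residual connection
    return h + gelu(layer_norm(h) @ p["w_1"]) @ p["w_2"]       # residual connection


def init_block():
    return {
        "w_qkv": rng.normal(0.0, 0.02, (D, 3 * D)),
        "w_o": rng.normal(0.0, 0.02, (D, D)),
        "w_1": rng.normal(0.0, 0.02, (D, 4 * D)),
        "w_2": rng.normal(0.0, 0.02, (4 * D, D)),
    }


# Random stand-ins for pre-trained weights; kept frozen in this sketch.
frozen = [init_block() for _ in range(LAYERS)]
cls_token = rng.normal(0.0, 0.02, (1, 1, D))
pos_embed = rng.normal(0.0, 0.02, (1, N + 1, D))  # absolute position encodings

# The only parameters this sketch treats as trainable.
W_proj = rng.normal(0.0, 1.0 / np.sqrt(d), (d, N * D))  # tabular row -> N patch tokens
W_head = rng.normal(0.0, 0.02, (D, CLASSES))           # class token -> class logits


def forward(x):
    tokens = (x @ W_proj).reshape(x.shape[0], N, D)
    cls = np.broadcast_to(cls_token, (x.shape[0], 1, D))
    h = np.concatenate([cls, tokens], axis=1) + pos_embed
    for p in frozen:
        h = encoder_block(h, p, HEADS)
    return layer_norm(h[:, 0]) @ W_head  # logits read from the class token


x = rng.normal(size=(B, d))  # synthetic tabular batch
logits = forward(x)
print(logits.shape)  # (8, 3)
```

In training, gradients would update only `W_proj` and `W_head`, for example through an autodiff framework such as PyTorch, which the repository uses. Keeping the encoder frozen is one way to keep the number of updated parameters small. Whether VisTabNet freezes every encoder layer is an assumption of this sketch.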

Code & Models

Repositories

wwydmanski/VisTabNet (PyTorch)

Taxonomy

Topics: Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques

Methods: Attention Is All You Need · Byte Pair Encoding · Linear Layer · Softmax · Dense Connections · Absolute Position Encodings · Dropout · Adam · Residual Connection · Vision Transformer