LLM Embeddings for Deep Learning on Tabular Data
Boshko Koloski, Andrei Margeloiu, Xiangjian Jiang, Blaž Škrlj, Nikola Simidjievski, Mateja Jamnik

TL;DR
This paper introduces a novel method that converts tabular data into text and uses large language models to generate embeddings, enhancing deep learning performance on tabular datasets.
Contribution
The paper proposes transforming tabular data into text and leveraging pre-trained LLMs for embeddings, enabling better transfer and improved accuracy over existing methods.
Findings
Improved accuracy on seven classification datasets
Outperforms models like MLP, ResNet, FT-Transformer
Demonstrates effectiveness of text-based encoding for tabular data
Abstract
Tabular deep-learning methods require embedding numerical and categorical input features into high-dimensional spaces before processing them. Existing methods deal with this heterogeneous nature of tabular data by employing separate type-specific encoding approaches. This limits the cross-table transfer potential and the exploitation of pre-trained knowledge. We propose a novel approach that first transforms tabular data into text, and then leverages pre-trained representations from LLMs to encode this data, resulting in a plug-and-play solution to improving deep-learning tabular methods. We demonstrate that our approach improves accuracy over competitive models, such as MLP, ResNet and FT-Transformer, by validating on seven classification datasets.
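The two steps named in the abstract (serialize the table to text, then encode that text with a pre-trained model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the "name is value" template, the per-feature granularity, and the function names are hypothetical, and `toy_text_embedder` is a hash-based stand-in that only mimics the interface of an LLM encoder so the example runs offline.

```python
import hashlib

import numpy as np


def serialize_feature(name, value):
    """Render one table cell as a short text snippet (assumed template)."""
    return f"{name} is {value}"


def toy_text_embedder(text, dim=16):
    """Hypothetical stand-in for a pre-trained LLM encoder.

    Derives a deterministic pseudo-random vector from a hash of the text.
    A real pipeline would call an actual language model here instead.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "little")
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)


def embed_row(row, dim=16):
    """Return a (num_features, dim) array: one text-derived vector per feature."""
    vectors = [
        toy_text_embedder(serialize_feature(name, value), dim)
        for name, value in row.items()
    ]
    return np.stack(vectors)


# Numerical and categorical columns pass through the same text path,
# so no type-specific encoder is needed in this sketch.
row = {"age": 42, "occupation": "engineer", "hours_per_week": 37.5}
tokens = embed_row(row)
```

In this sketch the resulting array would stand in for the type-specific feature embeddings that a downstream model such as an MLP, ResNet, or FT-Transformer normally learns; how the paper actually wires the embeddings into those models is not specified in the text above.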
Taxonomy
Topics: Computational Physics and Python Applications · Advanced Data Processing Techniques · Neural Networks and Applications
Methods: Average Pooling · Max Pooling · Convolution · Kaiming Initialization · FT-Transformer · Global Average Pooling
