Making Pre-trained Language Models Great on Tabular Prediction

Jiahuan Yan; Bo Zheng; Hongxia Xu; Yiheng Zhu; Danny Z. Chen; Jimeng Sun; Jian Wu; Jintai Chen

arXiv:2403.01841·cs.CL·March 13, 2024·2 cites

TL;DR

This paper introduces TP-BERTa, a pre-trained language model tailored for tabular data prediction, using a novel tokenization method to convert numerical features into discrete tokens, improving transferability and performance.

Contribution

The paper proposes TP-BERTa, a novel pre-trained language model for tabular data that employs relative magnitude tokenization and intra-feature attention to enhance prediction accuracy.
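One way to picture intra-feature attention is as a small attention step confined to the tokens of a single feature, so that the name tokens and the value token are merged before any cross-feature modelling happens. The sketch below is only an illustration under assumptions: it uses a single query vector and plain scaled dot-product attention with no learned projections, it collapses the feature to one vector, and `fuse_feature` is a hypothetical name rather than a function from the official repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_feature(query, token_vectors):
    """Attend from one query vector over the token vectors of a single feature.

    token_vectors holds the feature-name token embeddings followed by the
    value embedding (an assumed layout). Returns (fused_vector, weights).
    """
    dim = len(query)
    # Scaled dot-product score between the query and each token vector.
    scores = [
        sum(q * k for q, k in zip(query, vec)) / math.sqrt(dim)
        for vec in token_vectors
    ]
    weights = softmax(scores)
    # Weighted sum of the token vectors gives one vector per feature.
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, token_vectors))
        for i in range(dim)
    ]
    return fused, weights


# Toy example: two name tokens and one value token, all 4-dimensional.
name_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
value_token = [0.0, 0.0, 1.0, 0.0]
fused, weights = fuse_feature([0.5, 0.5, 0.5, 0.5], name_tokens + [value_token])
print(weights, fused)
```

Restricting attention to one feature at a time keeps a name bound to its own value, which is the property the contribution statement attributes to the module; how the paper parameterizes it is not stated in this summary.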

Findings

01

TP-BERTa outperforms existing tabular DNNs in experiments.

02

TP-BERTa is competitive with Gradient Boosted Decision Trees.

03

The model effectively integrates feature names and values for better predictions.

Abstract

The transferability of deep neural networks (DNNs) has made significant progress in image and language processing. However, due to the heterogeneity among tables, such DNN bonus is still far from being well exploited on tabular data prediction (e.g., regression or classification tasks). Condensing knowledge from diverse domains, language models (LMs) possess the capability to comprehend feature names from various tables, potentially serving as versatile learners in transferring knowledge across distinct tables and diverse prediction tasks, but their discrete text representation space is inherently incompatible with numerical feature values in tables. In this paper, we present TP-BERTa, a specifically pre-trained LM for tabular data prediction. Concretely, a novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an…
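The tokenization idea in the abstract can be pictured with a small sketch: a scalar is first mapped to a discrete bin index (its "magnitude token"), and that index then selects a high-dimensional embedding. This is an illustration under stated assumptions, not the paper's procedure: quantile binning stands in for whatever discretization the authors use, the embedding table is random rather than learned, and `fit_bin_edges`, `magnitude_token`, `make_embedding_table`, and `embed_value` are hypothetical names.

```python
import bisect
import random
import statistics


def fit_bin_edges(values, n_bins):
    """Return n_bins - 1 interior cut points (quantile binning, an assumption)."""
    return statistics.quantiles(values, n=n_bins, method="inclusive")


def magnitude_token(value, edges):
    """Map a scalar to the index of the bin it falls in (0 .. len(edges))."""
    return bisect.bisect_right(edges, value)


def make_embedding_table(n_tokens, dim, seed=0):
    """Random stand-in for a learned token-embedding table."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_tokens)]


def embed_value(value, edges, table):
    """Look up the embedding of the magnitude token for a scalar value."""
    token = magnitude_token(value, edges)
    return token, list(table[token])


# Toy example: an "age" column discretized into 4 relative-magnitude bins.
ages = [18, 22, 25, 31, 38, 44, 52, 60, 67]
edges = fit_bin_edges(ages, n_bins=4)
table = make_embedding_table(n_tokens=4, dim=8)
token, vector = embed_value(40, edges, table)
print(edges, token, len(vector))
```

Because the bins are ordered, larger values always receive an equal or larger token index, which is what lets a discrete vocabulary carry relative magnitude.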

Peer Reviews

Decision·ICLR 2024 spotlight

Reviewer 01 · Rating 8 (accept, good paper) · Confidence 3

Strengths

1. **Originality:**
   - The approach of "relative magnitude tokenization" (RMT) is an inventive technique for adapting pre-trained language models to tabular data. This method of converting scalar values to a tokenized format to be perceived as meaningful words within the language model's vocabulary stands out as a significant contribution.
   - The intra-feature attention (IFA) module to fuse feature name and value embeddings is another commendable addition to the field. This ensures a mor

Weaknesses

1. It would have been beneficial if the paper delved deeper into the limitations and potential pitfalls of the relative magnitude tokenization technique. Understanding how the granularity of this tokenization might impact model performance, especially in cases with intricate numerical nuances, is crucial.
2. Comparisons with Gradient Boosted Decision Trees are noted, but an in-depth discussion regarding scenarios where GBDTs might outshine or underperform against the proposed TP-BERTa would prov

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 2

Strengths

* Paper is mostly clearly written.

Weaknesses

* It is very unclear whether distributional patterns learned from common tabular data generalize to other data with completely different distributions. In text, pretraining can learn generic linguistic features such as the meaning of English words or grammar. In tabular data, this core assumption does not hold, because the data differ drastically between sources.
* The empirical comparison does not essentially provide evidence that pre-training is the main factor to improve the performance of the

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

1. According to Table 1, the performance of TP-BERTa is strong. It consistently outperforms other tabular DL models and is comparable to XGBoost and CatBoost.
2. The idea of applying relative magnitude tokens (RMT) in tabular DL models is novel to my knowledge. As pointed out by the "On Embeddings for Numerical Features in Tabular Deep Learning" paper, appropriately embedding numerical features is important for tabular DL models. RMT can enhance language models in handling the numerical va

Weaknesses

1. Compared with RMT, the intra-feature attention method is only marginally novel and does not show a significant performance boost.
2. The authors have not studied the impact of pretraining data diversity on the performance of TP-BERTa. For example, how good would TP-BERTa be if it were pretrained on only 10 classification datasets and 10 regression datasets?

Code & Models

Repositories

jyansir/tp-berta · PyTorch · Official

Videos

Making Pre-trained Language Models Great on Tabular Prediction· slideslive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling