UniPredict: Large Language Models are Universal Tabular Classifiers
Ruiyu Wang, Zifeng Wang, Jimeng Sun

TL;DR
This paper introduces UniPredict, a large language model-based approach that acts as a universal predictor for tabular data, capable of handling diverse datasets and tasks with superior performance and adaptability.
Contribution
The paper presents a generative modeling approach that uses an LLM for universal tabular data prediction: a single model is trained on an aggregation of 169 tabular datasets with diverse targets and compared against baselines trained on each dataset separately.
Findings
UniPredict outperforms the best tree-boosting and the best neural network baselines by margins ranging from 5.4% to 13.4%.
It shows strong few-shot learning ability, outperforming XGBoost by over 100% in low-resource settings.
The model effectively adapts to new tabular prediction tasks with minimal data.
Abstract
Tabular data prediction is a fundamental machine learning task for many applications. Existing methods predominantly employ discriminative modeling and operate under the assumption of a fixed target column, necessitating re-training for every new predictive task. Inspired by the generative power of large language models (LLMs), this paper exploits the idea of building universal tabular data predictors based on generative modeling, namely UniPredict. Here, we demonstrate the scalability of an LLM to extensive tabular datasets, enabling it to comprehend diverse tabular inputs and predict target variables following the provided instructions. Specifically, we train a single LLM on an aggregation of 169 tabular datasets with diverse targets and compare its performance against baselines that are trained on each dataset separately. We observe this versatile UniPredict model demonstrates an…
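The idea of one model serving many target columns can be illustrated with a minimal sketch of how a table row might be turned into an instruction prompt. This is not the authors' prompt format: the function name `serialize_row`, the prompt wording, and the example columns are all hypothetical, chosen only to show that switching the target column changes the prompt rather than the model.

```python
from typing import Mapping


def serialize_row(row: Mapping[str, object], target: str, classes: list[str]) -> str:
    """Render one table row as an instruction prompt (hypothetical format).

    Every column except `target` becomes a "name is value" feature phrase;
    the instruction then names the column to predict and its candidate classes.
    """
    features = "; ".join(
        f"{col} is {val}" for col, val in row.items() if col != target
    )
    options = ", ".join(classes)
    return (
        f"Features: {features}.\n"
        f"Predict the value of '{target}'. Choose one of: {options}.\n"
        "Answer:"
    )


# Illustrative row; the column names and values are made up for this sketch.
row = {"age": 39, "education": "Bachelors", "hours_per_week": 40, "income": ">50K"}

# The same row yields two different tasks just by changing the target column.
prompt_income = serialize_row(row, target="income", classes=["<=50K", ">50K"])
prompt_edu = serialize_row(
    row, target="education", classes=["HS-grad", "Bachelors", "Masters"]
)
```

In this sketch the target value is withheld from the feature text, so a generative model would be asked to complete the text after `Answer:`. A discriminative model with a fixed output head would instead need re-training for each new target.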
Taxonomy
Topics: Topic Modeling
