Language Modeling on Tabular Data: A Survey of Foundations, Techniques and Evolution
Yucheng Ruan, Xiang Lan, Jingying Ma, Yizhi Dong, Kai He, Mengling Feng

TL;DR
This survey reviews the development of language modeling techniques for tabular data, highlighting recent advances with transformer architectures and large language models, and discusses challenges and future directions.
Contribution
It provides the first comprehensive systematic review of language modeling methods for tabular data, covering data structures, datasets, architectures, and evolution from traditional models to large language models.
Findings
Transformers have become central to tabular data modeling.
Pre-trained language models improve performance with less data.
Large language models enable advanced applications with minimal fine-tuning.
Abstract
Tabular data, a prevalent data type across various domains, presents unique challenges due to its heterogeneous nature and complex structural relationships. Achieving high predictive performance and robustness in tabular data analysis holds significant promise for numerous applications. Influenced by recent advancements in natural language processing, particularly transformer architectures, new methods for tabular data modeling have emerged. Early techniques concentrated on pre-training transformers from scratch, often encountering scalability issues. Subsequently, methods leveraging pre-trained language models like BERT have been developed, which require less data and yield enhanced performance. The recent advent of large language models, such as GPT and LLaMA, has further revolutionized the field, facilitating more advanced and diverse applications with minimal fine-tuning. Despite…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Dropout · WordPiece · Residual Connection · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Multi-Head Attention
