Text Serialization and Their Relationship with the Conventional Paradigms of Tabular Machine Learning

Kyoka Ono; Simon A. Lee

arXiv:2406.13846·cs.CL·June 21, 2024

TL;DR

This paper evaluates the effectiveness of using language models with text serialization for tabular data tasks, finding that current pre-trained models do not outperform traditional methods.

Contribution

The paper provides a comprehensive comparison between emerging LM-based approaches and conventional tabular machine learning paradigms, highlighting the limitations of the LM-based approaches.

Findings

01

Pre-trained LMs do not currently surpass traditional methods.

02

Data representation impacts prediction performance.

03

LM approaches face challenges with class imbalance and distribution shift.

Abstract

Recent research has explored how Language Models (LMs) can be used for feature representation and prediction in tabular machine learning tasks. This involves employing text serialization and supervised fine-tuning (SFT) techniques. Despite the simplicity of these techniques, significant gaps remain in our understanding of the applicability and reliability of LMs in this context. Our study assesses how emerging LM technologies compare with traditional paradigms in tabular machine learning and evaluates the feasibility of adopting similar approaches with these advanced technologies. At the data level, we investigate various methods of data representation and curation of serialized tabular data, exploring their impact on prediction performance. At the classification level, we examine whether text serialization combined with LMs enhances performance on tabular datasets (e.g. class…
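Text serialization, as the abstract uses the term, means rewriting each table row as a string that a language model can consume. The following is a minimal sketch of one common template ("The <column> is <value>."), not necessarily the one the authors use. The function name `serialize_row`, the `missing_token` parameter, and the example column names are all hypothetical and chosen only for illustration.

```python
from typing import Mapping


def serialize_row(row: Mapping[str, object], missing_token: str = "unknown") -> str:
    """Turn one table row into a sentence-like string, one clause per column.

    Assumption: a simple "The <column> is <value>." template; other
    templates (lists, JSON-like strings, free text) are equally possible.
    """
    parts = []
    for column, value in row.items():
        # Treat None as a missing cell and substitute a placeholder token.
        text = missing_token if value is None else str(value)
        parts.append(f"The {column} is {text}.")
    return " ".join(parts)


# Hypothetical row; column names are illustrative, not from the paper.
example_row = {"age": 39, "education": "Bachelors", "hours_per_week": None}
serialized = serialize_row(example_row)
print(serialized)
```

The resulting strings could then be tokenized and passed to a language model for supervised fine-tuning. How missing values and numeric features are written into the text is one of the data-level choices the abstract says can affect prediction performance.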

Taxonomy

Topics: Digital Humanities and Scholarship · Computational Physics and Python Applications · Authorship Attribution and Profiling