Text Serialization and Their Relationship with the Conventional Paradigms of Tabular Machine Learning
Kyoka Ono, Simon A. Lee

TL;DR
This paper evaluates the effectiveness of using language models with text serialization for tabular data tasks, finding that current pre-trained models do not outperform traditional methods.
Contribution
It provides a comprehensive comparison between emerging LM-based approaches and conventional tabular machine learning paradigms, highlighting their limitations.
Findings
Pre-trained LMs do not currently surpass traditional methods.
Data representation impacts prediction performance.
LM approaches face challenges with class imbalance and distribution shift.
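The data-representation point above turns on text serialization: each table row is rewritten as a string before it reaches the LM. The sketch below is only an illustration of that idea, not the authors' pipeline. The `serialize_row` helper, the two templates, and the example column names are all hypothetical.

```python
def serialize_row(row, template="The {column} is {value}."):
    """Turn one tabular record (a dict) into a single text string.

    Hypothetical helper: the template wording is an assumption, not the
    serialization used in the paper.
    """
    parts = []
    for column, value in row.items():
        if value is None:
            # Skip missing cells instead of writing "None" into the text.
            continue
        readable = column.replace("_", " ")
        parts.append(template.format(column=readable, value=value))
    return " ".join(parts)


# Made-up example record; the column names are illustrative only.
row = {"age": 42, "occupation": "nurse", "hours_per_week": 36, "native_country": None}

# Sentence-style serialization.
text = serialize_row(row)
print(text)

# A terser key-value serialization of the same row.
key_value = serialize_row(row, template="{column}: {value};")
print(key_value)
```

Swapping the template changes only how the same row is worded, which is the kind of representation choice the findings say can affect prediction performance.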
Abstract
Recent research has explored how Language Models (LMs) can be used for feature representation and prediction in tabular machine learning tasks. This involves employing text serialization and supervised fine-tuning (SFT) techniques. Despite the simplicity of these techniques, significant gaps remain in our understanding of the applicability and reliability of LMs in this context. Our study assesses how emerging LM technologies compare with traditional paradigms in tabular machine learning and evaluates the feasibility of adopting similar approaches with these advanced technologies. At the data level, we investigate various methods of data representation and curation of serialized tabular data, exploring their impact on prediction performance. At the classification level, we examine whether text serialization combined with LMs enhances performance on tabular datasets (e.g. class…
Taxonomy
Topics: Digital Humanities and Scholarship · Computational Physics and Python Applications · Authorship Attribution and Profiling
