TabText: Language-Based Representations of Tabular Health Data for Predictive Modelling
Kimberly Villalobos Carballo, Liangyuan Na, Yu Ma, Léonard Boussioux, Cynthia Zeng, Luis R. Soenksen, Dimitris Bertsimas

TL;DR
TabText introduces a novel method that converts tabular medical data into contextual language representations using pretrained language models, improving predictive performance and generalization across diverse healthcare datasets.
Contribution
The paper presents TabText, a new approach that leverages language models for feature extraction from tabular health data, reducing manual preprocessing and enhancing model accuracy.
Findings
Achieved AUC of 0.75-0.94 on inpatient tasks across hospitals.
Improved out-of-sample AUC by up to 4 percentage points with embeddings.
Generalized well to unseen hospitals and datasets.
Abstract
Tabular medical records remain the most readily available data format for applying machine learning in healthcare. However, traditional data preprocessing ignores valuable contextual information in tables and requires substantial manual cleaning and harmonisation, creating a bottleneck for model development. We introduce TabText, a preprocessing and feature extraction method that leverages contextual information and streamlines the curation of tabular medical data. This method converts tables into contextual language and applies pretrained large language models (LLMs) to generate task-independent numerical representations. These fixed embeddings are then used as input for various predictive tasks. TabText was evaluated on nine inpatient flow prediction tasks (e.g., ICU admission, discharge, mortality) using electronic medical records across six hospitals from a US health system, and on…
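The table-to-language step described in the abstract can be illustrated with a minimal sketch. The sentence template, the missing-value wording, the function name row_to_text, and the example column names are all assumptions made for illustration; they are not taken from the paper. The embedding step is left out: in a full pipeline the resulting text would be passed to a pretrained language model to obtain a fixed numerical representation.

```python
from typing import Mapping, Optional


def row_to_text(
    row: Mapping[str, Optional[object]],
    descriptions: Mapping[str, str],
) -> str:
    """Turn one tabular record into a short paragraph, one sentence per column.

    Hypothetical template for illustration only, not the paper's exact wording.
    """
    sentences = []
    for column, value in row.items():
        # Fall back to the raw column name when no description is supplied.
        label = descriptions.get(column, column)
        if value is None:
            # State missingness explicitly instead of imputing a value.
            sentences.append(f"{label} is missing.")
        else:
            sentences.append(f"{label} is {value}.")
    return " ".join(sentences)


# Hypothetical patient record and column descriptions.
patient = {"age": 67, "hr": 92, "lactate": None}
descriptions = {"age": "Age in years", "hr": "Heart rate"}

text = row_to_text(patient, descriptions)
print(text)
```

Writing missing values out as text, rather than imputing them, is one way such a representation could reduce manual cleaning; whether TabText handles missing data exactly this way is not stated in the summary above.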
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare · Artificial Intelligence in Healthcare and Education
