Schema-Adaptive Tabular Representation Learning with LLMs for Generalizable Multimodal Clinical Reasoning
Hongxi Mao, Wei Zhou, Mengting Jia, Tao Fang, Huan Gao, Bin Zhang, Shangyang Li

TL;DR
This paper introduces a schema-adaptive learning method using LLMs to create transferable tabular embeddings, enabling zero-shot schema generalization in clinical reasoning tasks with heterogeneous EHR data.
Contribution
The authors propose a novel LLM-based approach for schema-adaptive tabular representation learning that achieves zero-shot transfer across unseen schemas in clinical data.
Findings
Achieves state-of-the-art performance on the NACC and ADNI dementia diagnosis datasets.
Successfully transfers to unseen schemas without manual feature engineering.
Outperforms clinical experts in retrospective diagnostic tasks.
Abstract
Machine learning for tabular data remains constrained by poor schema generalization, a challenge rooted in the lack of semantic understanding of structured variables. This challenge is particularly acute in domains like clinical medicine, where electronic health record (EHR) schemas vary significantly. To solve this problem, we propose Schema-Adaptive Tabular Representation Learning, a novel method that leverages large language models (LLMs) to create transferable tabular embeddings. By transforming structured variables into semantic natural language statements and encoding them with a pretrained LLM, our approach enables zero-shot alignment across unseen schemas without manual feature engineering or retraining. We integrate our encoder into a multimodal framework for dementia diagnosis, combining tabular and MRI data. Experiments on NACC and ADNI datasets demonstrate state-of-the-art…
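The serialization step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the "<description> is <value>." template, the column names, the description dictionaries, and the helpers `row_to_statements`, `toy_encode`, and `cosine` are all assumptions made for this example. In particular, `toy_encode` is a hashed bag-of-words placeholder standing in for the pretrained LLM encoder, which is not specified here.

```python
import hashlib
import math


def row_to_statements(row, descriptions):
    """Turn each (column, value) pair into a short natural-language statement.

    The sentence template is an assumption for illustration only.
    """
    statements = []
    for column, value in row.items():
        if value is None:
            continue  # skip missing entries rather than inventing a value
        name = descriptions.get(column, column)  # fall back to the raw column name
        statements.append(f"{name} is {value}.")
    return statements


def toy_encode(text, dim=64):
    """Placeholder for a pretrained LLM encoder (assumption).

    Builds a hashed bag-of-words vector and L2-normalizes it.
    """
    vec = [0.0] * dim
    for token in text.lower().replace(".", "").split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


# Two hypothetical schemas that name the same clinical variable differently.
schema_a = {"MMSE_TOT": "Mini-Mental State Examination total score"}
schema_b = {
    "mmse": "Mini-Mental State Examination total score",
    "age_yrs": "Age in years",
}

row_a = {"MMSE_TOT": 24}
row_b = {"mmse": 24, "age_yrs": 71}

stmts_a = row_to_statements(row_a, schema_a)
stmts_b = row_to_statements(row_b, schema_b)

# The shared variable yields the same statement under both schemas,
# so its embedding is the same regardless of the original column name.
similarity = cosine(toy_encode(stmts_a[0]), toy_encode(stmts_b[0]))
```

Because the statement text depends on the variable's meaning rather than its column name, a row from a schema the encoder has never seen maps into the same embedding space. With a real LLM encoder, paraphrased descriptions would be expected to land close together rather than match exactly.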
