TIME: TabPFN-Integrated Multimodal Engine for Robust Tabular-Image Learning
Jiaqi Luo, Yuan Yuan, Shixin Xu

TL;DR
This paper introduces TIME, a multimodal framework that combines a pretrained tabular encoder with image features to improve robustness and performance in tabular-image learning tasks, especially with missing data.
Contribution
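The paper presents TIME, a multimodal framework that uses the tabular foundation model TabPFN as a frozen tabular encoder and fuses its embeddings with image features from pretrained vision backbones. It targets two gaps: the lack of a standardized, pretrained representation for tabular data, and the difficulty of handling missing values in the tabular modality.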
Findings
TIME outperforms baselines on natural and medical datasets.
The approach is robust to missing tabular data.
Extensive experiments validate practical effectiveness.
Abstract
Tabular-image multimodal learning, which integrates structured tabular data with imaging data, holds great promise for a variety of tasks, especially in medical applications. Yet, two key challenges remain: (1) the lack of a standardized, pretrained representation for tabular data, as is commonly available in vision and language domains; and (2) the difficulty of handling missing values in the tabular modality, which are common in real-world medical datasets. To address these issues, we propose the TabPFN-Integrated Multimodal Engine (TIME), a novel multimodal framework that builds on the recently introduced tabular foundation model, TabPFN. TIME leverages TabPFN as a frozen tabular encoder to generate robust, strong embeddings that are naturally resilient to missing data, and combines them with image features from pretrained vision backbones. We explore a range of fusion strategies and…
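The abstract describes combining embeddings from a frozen tabular encoder with features from a pretrained vision backbone, but the portion available here does not say which fusion strategies are used. As a rough illustration only, the sketch below shows one common option, concatenation after per-modality L2 normalization. The function names and the placeholder vectors are hypothetical; nothing here calls TabPFN or a vision model, and this should not be read as the authors' implementation.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def concat_fusion(tab_emb: Sequence[float], img_emb: Sequence[float]) -> List[float]:
    """Fuse two modalities by normalizing each one and concatenating the results."""
    return l2_normalize(tab_emb) + l2_normalize(img_emb)


# Placeholder vectors (made-up values) standing in for the output of a frozen
# tabular encoder and a pretrained vision backbone. Real embeddings would be
# much higher-dimensional.
tab_emb = [3.0, 4.0]
img_emb = [0.0, 5.0, 0.0]

fused = concat_fusion(tab_emb, img_emb)
print(len(fused))
```

Normalizing each modality before concatenation keeps one embedding from dominating the other purely because of scale. The fused vector would then typically be passed to a small trainable classification head, while the two encoders stay fixed or are fine-tuned separately.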
Taxonomy
Topics: Image Retrieval and Classification Techniques
