MultiModalPFN: Extending Prior-Data Fitted Networks for Multimodal Tabular Learning

Wall Kim; Chaeyoung Song; Hanul Kim

arXiv:2602.20223 · cs.LG · April 10, 2026

TL;DR

MultiModalPFN extends prior-data fitted networks to effectively integrate heterogeneous modalities like images and text with tabular data, improving performance on multimodal datasets.

Contribution

MMPFN introduces a unified framework, built from modality encoders, modality projectors, and new pooling mechanisms, that handles multimodal data in a scalable way.
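To make the encoder-to-projector data flow concrete, here is a minimal NumPy sketch of one plausible reading: each tabular cell becomes a token, a non-tabular embedding is projected into a few extra tokens of the same width, and both are concatenated so that the backbone sees the non-tabular modality as additional "columns". The token width `D_MODEL`, the helpers `embed_tabular_cells` and `project_embedding`, and the plain linear projector are all hypothetical illustrations, not the MMPFN implementation in the too-z/MultiModalPFN repository.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL = 32  # assumed token width shared by all modalities (hypothetical)

def embed_tabular_cells(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical per-cell embedding: every scalar cell becomes one
    D_MODEL-dim token via a shared affine map. (N, F) -> (N, F, D_MODEL)."""
    return x[..., None] * w + b

def project_embedding(emb: np.ndarray, w: np.ndarray, m: int) -> np.ndarray:
    """Hypothetical linear projector: one encoder embedding per row becomes
    m tokens of width D_MODEL. (N, D_enc) -> (N, m, D_MODEL)."""
    return (emb @ w).reshape(emb.shape[0], m, D_MODEL)

n_rows, n_feats, d_enc, m = 8, 5, 64, 4
tabular = rng.standard_normal((n_rows, n_feats))   # numeric tabular columns
image_emb = rng.standard_normal((n_rows, d_enc))   # stand-in for a frozen image encoder's output

tab_tokens = embed_tabular_cells(
    tabular, rng.standard_normal(D_MODEL), rng.standard_normal(D_MODEL)
)
img_tokens = project_embedding(
    image_emb, rng.standard_normal((d_enc, m * D_MODEL)) / np.sqrt(d_enc), m
)

# Non-tabular tokens are appended along the token axis, so each row is
# presented to the backbone as n_feats + m "columns".
tokens = np.concatenate([tab_tokens, img_tokens], axis=1)
print(tokens.shape)  # (8, 9, 32)
```

The design choice being illustrated is that, once projected, non-tabular information has the same shape as tabular feature tokens, so a tabular backbone can process it without architectural changes.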

Findings

01

Outperforms state-of-the-art methods on medical and general multimodal datasets.

02

Effectively exploits non-tabular modalities alongside tabular features.

03

Demonstrates scalability and robustness in heterogeneous data learning.

Abstract

Recently, TabPFN has gained attention as a foundation model for tabular data. However, it struggles to integrate heterogeneous modalities such as images and text, which are common in domains like healthcare and marketing, and this limits its applicability. To address this, we present the Multi-Modal Prior-data Fitted Network (MMPFN), which extends TabPFN to handle tabular and non-tabular modalities in a unified manner. MMPFN comprises per-modality encoders, modality projectors, and pre-trained foundation models. The modality projectors serve as the critical bridge, transforming non-tabular embeddings into tabular-compatible tokens for unified processing. To this end, we introduce a multi-head gated MLP and a cross-attention pooler that extract richer context from non-tabular inputs while mitigating the attention imbalance issue in multimodal learning. Extensive experiments on medical and…
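The abstract names two projector components without giving their equations. The NumPy sketch below shows generic versions of each under stated assumptions: a multi-head gated MLP in which each of H heads is a GLU-style block (a value branch multiplied by a sigmoid gate) that emits one token, and a single-head cross-attention pooler in which M learnable queries attend over a sample's patch embeddings to produce a fixed number of tokens. The class names, the GLU-style gate, the single attention head, and the use of mean-pooled patches as the gated MLP's input are all hypothetical choices. This does not reproduce how MMPFN composes these parts or how it mitigates attention imbalance.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class MultiHeadGatedMLP:
    """Hypothetical projector: H independent GLU-style heads, each mapping a
    D_in embedding to one D_out token. (N, D_in) -> (N, H, D_out)."""
    def __init__(self, d_in, d_hidden, d_out, n_heads):
        s = 1.0 / np.sqrt(d_in)
        self.w_val = rng.standard_normal((n_heads, d_in, d_hidden)) * s
        self.w_gate = rng.standard_normal((n_heads, d_in, d_hidden)) * s
        self.w_out = rng.standard_normal((n_heads, d_hidden, d_out)) / np.sqrt(d_hidden)

    def __call__(self, x):
        # Per-head value and gate branches, each of shape (N, H, d_hidden).
        val = np.einsum("nd,hdk->nhk", x, self.w_val)
        gate = sigmoid(np.einsum("nd,hdk->nhk", x, self.w_gate))
        # Gated hidden units are projected to one output token per head.
        return np.einsum("nhk,hko->nho", val * gate, self.w_out)

class CrossAttentionPooler:
    """Hypothetical pooler: M learnable queries attend over the P patch
    embeddings of each sample (single head). (N, P, D) -> (N, M, D)."""
    def __init__(self, d, n_queries):
        s = 1.0 / np.sqrt(d)
        self.queries = rng.standard_normal((n_queries, d)) * s
        self.w_k = rng.standard_normal((d, d)) * s
        self.w_v = rng.standard_normal((d, d)) * s

    def __call__(self, patches):
        k = patches @ self.w_k                                  # (N, P, D)
        v = patches @ self.w_v                                  # (N, P, D)
        scores = np.einsum("md,npd->nmp", self.queries, k) / np.sqrt(k.shape[-1])
        attn = softmax(scores, axis=-1)                         # weights over P sum to 1
        return np.einsum("nmp,npd->nmd", attn, v), attn         # (N, M, D), (N, M, P)

n, p, d = 4, 10, 16
patches = rng.standard_normal((n, p, d))   # stand-in for encoder patch embeddings
pooled, attn = CrossAttentionPooler(d, n_queries=3)(patches)
gated_tokens = MultiHeadGatedMLP(d, 32, 8, n_heads=4)(patches.mean(axis=1))
print(pooled.shape, gated_tokens.shape)  # (4, 3, 16) (4, 4, 8)
```

Both components turn a variable-sized or single-vector modality representation into a fixed number of tokens. Compared with a single linear map, this gives the projector several separate "views" of the non-tabular input.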

Code & Models

Repositories

too-z/MultiModalPFN (GitHub)