VT-Bench: A Unified Benchmark for Visual-Tabular Multi-Modal Learning

Zi-Yi Jia; Zi-Jian Cheng; Xin-Yue Zhang; Kun-Yang Yu; Zhi Zhou; Yu-Feng Li; Lan-Zhe Guo

arXiv:2605.08146·cs.CV·May 20, 2026

TL;DR

VT-Bench is the first comprehensive benchmark for visual-tabular multi-modal learning, integrating diverse datasets and models to advance research in high-stakes domains like healthcare and industry.

Contribution

The paper introduces a unified benchmark of 14 datasets and evaluates 23 models on it, highlighting open challenges and fostering progress in visual-tabular multi-modal learning.

Findings

01 Substantial challenges identified in visual-tabular learning.

02 Evaluation of diverse models reveals performance gaps.

03 Benchmark promotes development of more powerful models.

Abstract

Multi-modal learning has attracted great attention in visual-text tasks. However, visual-tabular data, which plays a pivotal role in high-stakes domains like healthcare and industry, remains underexplored. In this paper, we introduce VT-Bench, the first unified benchmark for standardizing vision-tabular discriminative prediction and generative reasoning tasks. VT-Bench aggregates 14 datasets across 9 domains (medical-centric, while also covering pets, media, and transportation) with over 756K samples. We evaluate 23 representative models, including unimodal experts, specialized visual-tabular models, general-purpose vision-language models (VLMs), and tool-augmented methods, highlighting the substantial challenges of visual-tabular learning. We believe VT-Bench will stimulate the community to build more powerful multi-modal vision-tabular foundation models. Benchmark:…

Code & Models

Repositories

Ziyi-Jia990/VT-Bench (GitHub)
