Table Understanding and (Multimodal) LLMs: A Cross-Domain Case Study on Scientific vs. Non-Scientific Data
Ekaterina Borisova, Fabio Barth, Nils Feldhus, Raia Abu Ahmad, Malte Ostendorff, Pedro Ortiz Suarez, Georg Rehm, Sebastian M\"oller

TL;DR
This study evaluates the performance of text-based and multimodal LLMs on table understanding across scientific and non-scientific domains, revealing robustness across formats but challenges with scientific tables.
Contribution
Introduces the TableEval benchmark with diverse table formats and domains, and provides a comprehensive cross-domain, cross-modality analysis of LLMs' table understanding capabilities.
Findings
LLMs are robust across different table formats.
Significant challenges remain in understanding scientific tables.
Multimodal LLMs show potential but need improvement for scientific data.
Abstract
Tables are among the most widely used tools for representing structured data in research, business, medicine, and education. Although LLMs demonstrate strong performance in downstream tasks, their efficiency in processing tabular data remains underexplored. In this paper, we investigate the effectiveness of both text-based and multimodal LLMs on table understanding tasks through a cross-domain and cross-modality evaluation. Specifically, we compare their performance on tables from scientific vs. non-scientific contexts and examine their robustness on tables represented as images vs. text. Additionally, we conduct an interpretability analysis to measure context usage and input relevance. We also introduce the TableEval benchmark, comprising 3017 tables from scholarly publications, Wikipedia, and financial reports, where each table is provided in five different formats: Image, Dictionary,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
