TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
Weichao Zhao, Hao Feng, Qi Liu, Jingqun Tang, Shu Wei, Binghong Wu, Lei Liao, Yongjie Ye, Hao Liu, Wengang Zhou, Houqiang Li, Can Huang

TL;DR
TabPedia introduces a unified vision-language model with a concept synergy mechanism that integrates multiple visual table understanding tasks, enhancing comprehension and perception through large language models, and establishes a new comprehensive table VQA benchmark.
Contribution
The paper proposes TabPedia, a novel large vision-language model with a concept synergy mechanism that unifies diverse visual table understanding (VTU) tasks, and introduces a new benchmark, ComTQA, for real-world evaluation.
Findings
TabPedia achieves superior performance on various VTU benchmarks.
The concept synergy mechanism effectively integrates perception and comprehension tasks.
ComTQA provides a comprehensive dataset for real-world table VQA evaluation.
Abstract
Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in modal isolation and intricate workflows. In this paper, we present a novel large vision-language model, TabPedia, equipped with a concept synergy mechanism. In this mechanism, all the involved diverse visual table understanding (VTU) tasks and multi-source visual embeddings are abstracted as concepts. This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering, by leveraging the capabilities of large language models (LLMs). Moreover, the concept synergy mechanism enables table perception-related and comprehension-related tasks…
Taxonomy
Topics: Data Visualization and Analytics · Video Analysis and Summarization · Time Series Analysis and Forecasting
