TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

Weichao Zhao; Hao Feng; Qi Liu; Jingqun Tang; Shu Wei; Binghong Wu; Lei Liao; Yongjie Ye; Hao Liu; Wengang Zhou; Houqiang Li; Can Huang

arXiv:2406.01326·cs.CV·October 14, 2024·1 cites

TL;DR

TabPedia introduces a unified vision-language model with a concept synergy mechanism that integrates multiple visual table understanding tasks, enhancing comprehension and perception through large language models, and establishes a new comprehensive table VQA benchmark.

Contribution

The paper proposes a novel large vision-language model, TabPedia, with a concept synergy mechanism that unifies diverse VTU tasks and introduces a new benchmark, ComTQA, for real-world evaluation.

Findings

01. TabPedia achieves superior performance on various VTU benchmarks.

02. The concept synergy mechanism effectively integrates perception and comprehension tasks.

03. ComTQA provides a comprehensive dataset for real-world table VQA evaluation.

Abstract

Tables contain factual and quantitative data accompanied by various structures and contents that pose challenges for machine comprehension. Previous methods generally design task-specific architectures and objectives for individual tasks, resulting in modal isolation and intricate workflows. In this paper, we present a novel large vision-language model, TabPedia, equipped with a concept synergy mechanism. In this mechanism, all the involved diverse visual table understanding (VTU) tasks and multi-source visual embeddings are abstracted as concepts. This unified framework allows TabPedia to seamlessly integrate VTU tasks, such as table detection, table structure recognition, table querying, and table question answering, by leveraging the capabilities of large language models (LLMs). Moreover, the concept synergy mechanism enables table perception-related and comprehension-related tasks…

Code & Models

Repositories

zhaowc-ustc/tabpedia · PyTorch · Official

Models

🤗 Zhaowc/TabPedia_v1.0 · model · ♡ 3

Datasets

katebor/TableEval · dataset · 102 dl

Videos

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy · SlidesLive

Taxonomy

Topics: Data Visualization and Analytics · Video Analysis and Summarization · Time Series Analysis and Forecasting