TableGPT2: A Large Multimodal Model with Tabular Data Integration
Aofeng Su, Aowen Wang, Chao Ye, Chen Zhou, Ga Zhang, Gang Chen, Guangcheng Zhu, Haobo Wang, Haokai Xu, Hao Chen, Haoze Li, Haoxuan Lan, Jiaming Tian, Jing Yuan, Junbo Zhao, Junlin Zhou, Kaizhe Shou, Liangyu Zha, Lin Long, Liyao Li, Pengzuo Wu, Qi Zhang, Qingyi Huang

TL;DR
TableGPT2 is a large multimodal model that significantly advances the integration of tabular data into language models, enabling better handling of real-world, schema-rich, and ambiguous table queries with extensive training and a novel table encoder.
Contribution
The paper introduces TableGPT2, a large multimodal model with a novel table encoder, trained on an unprecedented scale of tabular data. It improves performance on table-centric tasks while preserving general language abilities.
Findings
Achieves over 35% performance improvement on benchmark metrics.
Successfully handles ambiguous and irregular tables.
Maintains strong general language and coding skills.
Abstract
The emergence of models like GPTs, Claude, LLaMA, and Qwen has reshaped AI applications, presenting vast new opportunities across industries. Yet, the integration of tabular data remains notably underdeveloped, despite its foundational role in numerous real-world domains. This gap is critical for three main reasons. First, database or data warehouse data integration is essential for advanced applications; second, the vast and largely untapped resource of tabular data offers immense potential for analysis; and third, the business intelligence domain specifically demands adaptable, precise solutions that many current LLMs may struggle to provide. In response, we introduce TableGPT2, a model rigorously pre-trained and fine-tuned with over 593.8K tables and 2.36M high-quality query-table-output tuples, a scale of table-related data unprecedented in prior research. This extensive…
Taxonomy
Topics: Time Series Analysis and Forecasting
Methods: LLaMA
