PubTables-v2: A new large-scale dataset for full-page and multi-page table extraction

Brandon Smock; Valerie Faucon-Morin; Max Sokolov; Libin Liang; Tayyibah Khanam; Amrit Ramesh; Maury Courtland

arXiv:2512.10888·cs.CV·March 19, 2026

TL;DR

PubTables-v2 introduces a large-scale dataset for full-page and multi-page table extraction, enabling progress in visual document understanding by providing a benchmark for complex table recognition tasks.

Contribution

The paper presents PubTables-v2, the first large-scale benchmark for multi-page table structure recognition, supporting the development and evaluation of methods that extract tables in their full-page or document context.

Findings

1. Multi-page table recognition remains a significant challenge.
2. Introducing an image classifier for table merging improves extraction performance.
3. Baseline evaluations highlight current models' limitations in multi-page table tasks.
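
To make the second finding concrete, here is a minimal sketch of how a merge decision could be used to stitch page-level tables into multi-page tables. It is not the authors' implementation: `PageTable`, `merge_page_tables`, and `toy_should_merge` are hypothetical names, and the toy rule (consecutive pages with matching column counts) is only a stand-in for the image classifier the paper describes.

```python
from dataclasses import dataclass
from typing import Callable, List

Row = List[str]


@dataclass
class PageTable:
    """A table fragment extracted from a single page (hypothetical structure)."""
    page: int
    rows: List[Row]


def toy_should_merge(prev: PageTable, nxt: PageTable) -> bool:
    """Stand-in for a learned merge classifier (assumption, not the paper's model).

    Says "merge" when the fragments sit on consecutive pages and the last row
    of the first fragment has as many cells as the first row of the second.
    """
    if not prev.rows or not nxt.rows:
        return False
    consecutive = nxt.page == prev.page + 1
    same_width = len(prev.rows[-1]) == len(nxt.rows[0])
    return consecutive and same_width


def merge_page_tables(
    tables: List[PageTable],
    should_merge: Callable[[PageTable, PageTable], bool],
) -> List[List[Row]]:
    """Greedily join each fragment to the previous one when should_merge agrees."""
    groups: List[List[PageTable]] = []
    for table in tables:
        if groups and should_merge(groups[-1][-1], table):
            groups[-1].append(table)   # continuation of the previous table
        else:
            groups.append([table])     # start of a new table
    # Flatten each group of fragments into one list of rows.
    return [[row for part in group for row in part.rows] for group in groups]


fragments = [
    PageTable(page=1, rows=[["a", "b"], ["1", "2"]]),
    PageTable(page=2, rows=[["3", "4"]]),
    PageTable(page=5, rows=[["x", "y", "z"]]),
]
merged = merge_page_tables(fragments, toy_should_merge)
```

In this sketch the first two fragments are joined into one three-row table and the third stays separate. Swapping `toy_should_merge` for a trained classifier over page images would leave the merging loop unchanged.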

Abstract

Table extraction (TE) is a key challenge in visual document understanding. Traditional approaches detect tables first, then recognize their structure. Recently, interest has surged in developing methods, such as vision-language models (VLMs), that can extract tables directly in their full page or document context. However, progress has been difficult to demonstrate due to a lack of annotated data. To address this, we create a new large-scale dataset, PubTables-v2. PubTables-v2 supports a number of challenging table extraction tasks. Notably, it is the first large-scale benchmark for multi-page table structure recognition. We evaluate several smaller specialized VLMs to establish baseline performance on these tasks. As we show, multi-page table recognition is a key gap in current models' capabilities. Interestingly, we show that introducing an image classifier that predicts when to merge…

Code & Models

Datasets

kensho/PubTables-v2 (dataset, 709 downloads)

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Text and Document Classification Technologies