Benchmarking PDF Parsers on Table Extraction with LLM-based Semantic Evaluation

Pius Horn; Janis Keuper

arXiv:2603.18652 · cs.CV · March 20, 2026

TL;DR

This paper introduces a benchmarking framework for PDF table extraction that uses synthetic PDFs with LaTeX ground truth and LLM-based semantic evaluation, which correlates more closely with human judgment than traditional metrics do.

Contribution

The paper presents a novel semantic evaluation method using LLMs for PDF table extraction, along with a synthetic dataset and a scalable benchmarking pipeline.

Findings

01. LLM-based evaluation correlates highly with human judgment (Pearson r=0.93).

02. Performance differs significantly across the 21 PDF parsers evaluated.

03. The paper provides a reproducible framework for evaluating table extraction accuracy.

Abstract

Reliably extracting tables from PDFs is essential for large-scale scientific data mining and knowledge base construction, yet existing evaluation approaches rely on rule-based metrics that fail to capture semantic equivalence of table content. We present a benchmarking framework based on synthetically generated PDFs with precise LaTeX ground truth, using tables sourced from arXiv to ensure realistic complexity and diversity. As our central methodological contribution, we apply LLM-as-a-judge for semantic table evaluation, integrated into a matching pipeline that accommodates inconsistencies in parser outputs. Through a human validation study comprising over 1,500 quality judgments on extracted table pairs, we show that LLM-based evaluation achieves substantially higher correlation with human judgment (Pearson r=0.93) than Tree Edit Distance-based Similarity (TEDS, r=0.68) and…
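The comparison in the abstract rests on the Pearson correlation between an automatic metric's scores and human quality judgments for the same table pairs. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name `pearson_r` and all score values are hypothetical and invented purely for illustration; they are not data from the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five extracted tables (illustrative only).
human = [1.0, 3.0, 5.0, 7.0, 9.0]      # human quality judgments
metric_a = [2.0, 4.0, 5.0, 8.0, 9.0]   # a metric that tracks the human scores
metric_b = [5.0, 2.0, 6.0, 4.0, 7.0]   # a metric that tracks them loosely

print(f"metric_a vs human: r = {pearson_r(human, metric_a):.2f}")
print(f"metric_b vs human: r = {pearson_r(human, metric_b):.2f}")
```

A metric whose scores rise and fall with the human judgments yields r close to 1, which is the sense in which r=0.93 is reported as stronger agreement than r=0.68.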

Taxonomy

Topics: Handwritten Text Recognition Techniques · Data Quality and Management · Web Data Mining and Analysis