Benchmarking PDF Parsers on Table Extraction with LLM-based Semantic Evaluation
Pius Horn, Janis Keuper

TL;DR
This paper introduces a benchmarking framework for PDF table extraction that generates synthetic PDFs with LaTeX ground truth and uses LLMs for semantic evaluation. The LLM-based evaluation correlates better with human judgment than traditional rule-based metrics.
Contribution
It presents a novel semantic evaluation method using LLMs for PDF table extraction, along with a synthetic dataset and a scalable benchmarking pipeline.
Findings
LLM-based evaluation correlates highly with human judgment (Pearson r=0.93), compared with r=0.68 for TEDS.
Performance differs significantly across the 21 PDF parsers benchmarked.
The framework gives a reproducible way to evaluate table extraction accuracy.
Abstract
Reliably extracting tables from PDFs is essential for large-scale scientific data mining and knowledge base construction, yet existing evaluation approaches rely on rule-based metrics that fail to capture semantic equivalence of table content. We present a benchmarking framework based on synthetically generated PDFs with precise LaTeX ground truth, using tables sourced from arXiv to ensure realistic complexity and diversity. As our central methodological contribution, we apply LLM-as-a-judge for semantic table evaluation, integrated into a matching pipeline that accommodates inconsistencies in parser outputs. Through a human validation study comprising over 1,500 quality judgments on extracted table pairs, we show that LLM-based evaluation achieves substantially higher correlation with human judgment (Pearson r=0.93) compared to Tree Edit Distance-based Similarity (TEDS, r=0.68) and…
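The abstract's comparison rests on the Pearson correlation between each metric's scores and human quality judgments. The following is a minimal sketch of that computation, not the authors' code: the function name `pearson_r` and all score lists are hypothetical, made-up values chosen only to show how a metric that tracks human ratings closely yields a higher r than one that does not.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five extracted tables (illustrative only, not paper data).
human = [1.0, 0.8, 0.5, 0.2, 0.9]       # human quality judgments
metric_a = [0.95, 0.85, 0.45, 0.25, 0.9]  # a metric that tracks the human ratings
metric_b = [0.7, 0.9, 0.6, 0.5, 0.6]      # a metric that tracks them less well

print(f"metric_a vs human: r = {pearson_r(human, metric_a):.2f}")
print(f"metric_b vs human: r = {pearson_r(human, metric_b):.2f}")
```

In practice the same comparison would be run over all judged table pairs, with one score list per metric under test.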
Taxonomy
Topics: Handwritten Text Recognition Techniques · Data Quality and Management · Web Data Mining and Analysis
