Lightweight and Production-Ready PDF Visual Element Parsing

Meizhu Liu; Yassi Abbasi; Matthew Rowe; Michael Avendi; Paul Li

arXiv:2604.23276 · cs.CV · April 28, 2026


TL;DR

This paper introduces a lightweight PDF parsing framework that accurately detects visual elements and associates captions, significantly improving downstream document understanding tasks while being suitable for production deployment.

Contribution

The authors develop a PDF parsing system that combines spatial heuristics, layout analysis, and semantic similarity to detect visual elements and associate captions, with accuracy and efficiency suited to production environments.

Findings

01 Achieves ≥96% visual element detection accuracy

02 Attains 93% caption association accuracy

03 Outperforms state-of-the-art parsers and models in retrieval and QA tasks

Abstract

PDF documents contain critical visual elements such as figures, tables, and forms whose accurate extraction is essential for document understanding and multimodal retrieval-augmented generation (RAG). Existing PDF parsers often miss complex visuals, extract non-informative artifacts (e.g., watermarks, logos), produce fragmented elements, and fail to reliably associate captions with their corresponding elements, which degrades downstream retrieval and question answering. We present a lightweight, production-level PDF parsing framework that accurately detects visual elements and associates captions using a combination of spatial heuristics, layout analysis, and semantic similarity. On popular benchmark datasets and internal product data, the proposed solution achieves $\geq 96\%$ visual element detection accuracy and $93\%$ caption association accuracy. When used as a preprocessing…
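To make the idea of mixing spatial and semantic cues concrete, here is a minimal sketch of caption association. It is not the authors' implementation: the `Box` type, the `caption_score` and `best_caption` helpers, the weights, and the thresholds are all hypothetical, and token-level Jaccard overlap stands in for whatever semantic similarity measure the paper actually uses.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Box:
    """Axis-aligned box in page coordinates (origin top-left, y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float


def vertical_gap(a: Box, b: Box) -> float:
    """Vertical distance between two boxes; 0 if they overlap vertically."""
    return max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))


def horizontal_overlap(a: Box, b: Box) -> float:
    """Shared width as a fraction of the narrower box's width."""
    shared = min(a.x1, b.x1) - max(a.x0, b.x0)
    narrower = min(a.x1 - a.x0, b.x1 - b.x0)
    return max(0.0, shared) / narrower if narrower > 0 else 0.0


def jaccard(a: str, b: str) -> float:
    """Token-set overlap; an assumed stand-in for semantic similarity."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def caption_score(element: Box, element_text: str,
                  caption: Box, caption_text: str,
                  max_gap: float = 50.0, w_spatial: float = 0.7) -> float:
    """Blend a spatial score with a text-similarity score (assumed weights)."""
    gap = vertical_gap(element, caption)
    if gap > max_gap:
        return 0.0  # spatial heuristic: distant captions are never candidates
    spatial = (1.0 - gap / max_gap) * horizontal_overlap(element, caption)
    semantic = jaccard(element_text, caption_text)
    return w_spatial * spatial + (1.0 - w_spatial) * semantic


def best_caption(element: Box, element_text: str,
                 captions: List[Tuple[Box, str]],
                 min_score: float = 0.3) -> Optional[int]:
    """Return the index of the highest-scoring caption, or None if none qualifies."""
    best_idx, best = None, min_score
    for i, (box, text) in enumerate(captions):
        score = caption_score(element, element_text, box, text)
        if score > best:
            best_idx, best = i, score
    return best_idx


# Hypothetical page: one figure with embedded text, two candidate captions.
figure = Box(100, 100, 400, 300)
captions = [
    (Box(100, 310, 400, 330), "Figure 1: Revenue by quarter"),
    (Box(100, 600, 400, 620), "Table 2: Headcount by region"),
]
match = best_caption(figure, "revenue by quarter", captions)
```

In this sketch the distance cutoff acts as a hard filter while the text overlap only reorders nearby candidates, which is one plausible way to keep the matching cheap; a production system would likely need tuned thresholds and a stronger similarity model.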
