Benchmarking Vision-Language Models for French PDF-to-Markdown Conversion
Bruno Rigal, Victor Dupriez, Alexis Mignon, Ronan Le Hy, Nicolas Mery

TL;DR
This paper introduces a French-specific benchmark for evaluating vision-language models on PDF-to-Markdown conversion, emphasizing challenging documents and concrete failure modes to improve downstream retrieval and grounding tasks.
Contribution
It presents a new French-focused benchmark with difficult pages and targeted evaluation metrics, addressing limitations of existing English-centric benchmarks.
Findings
Proprietary models excel in handwriting and forms.
Open models perform well on printed layouts.
Robustness varies significantly across models.
Abstract
This report evaluates PDF-to-Markdown conversion using recent Vision-Language Models (VLMs) on challenging French documents. Document parsing is a critical step for Retrieval-Augmented Generation (RAG) pipelines, where transcription and layout errors propagate to downstream retrieval and grounding. Existing benchmarks often emphasize English or Chinese and can over-penalize benign formatting and linearization choices (e.g., line breaks, list segmentation, alternative table renderings) that are largely irrelevant for downstream use. We introduce a French-focused benchmark of difficult pages selected via model-disagreement sampling from a corpus of 60{,}000 documents, covering handwritten forms, complex layouts, dense tables, and graphics-rich pages. Evaluation is performed with unit-test-style checks that target concrete failure modes (text presence, reading order, and local table…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHandwritten Text Recognition Techniques · Digital Humanities and Scholarship · Mathematics, Computing, and Information Processing
