Benchmarking Vision-Language Models for French PDF-to-Markdown Conversion

Bruno Rigal; Victor Dupriez; Alexis Mignon; Ronan Le Hy; Nicolas Mery

arXiv:2602.11960 · cs.CV · February 13, 2026

TL;DR

This paper introduces a French-specific benchmark for evaluating vision-language models on PDF-to-Markdown conversion, emphasizing challenging documents and concrete failure modes to improve downstream retrieval and grounding tasks.

Contribution

It presents a new French-focused benchmark with difficult pages and targeted evaluation metrics, addressing limitations of existing English-centric benchmarks.

Findings

1. Proprietary models excel in handwriting and forms.
2. Open models perform well on printed layouts.
3. Robustness varies significantly across models.

Abstract

This report evaluates PDF-to-Markdown conversion using recent Vision-Language Models (VLMs) on challenging French documents. Document parsing is a critical step for Retrieval-Augmented Generation (RAG) pipelines, where transcription and layout errors propagate to downstream retrieval and grounding. Existing benchmarks often emphasize English or Chinese and can over-penalize benign formatting and linearization choices (e.g., line breaks, list segmentation, alternative table renderings) that are largely irrelevant for downstream use. We introduce a French-focused benchmark of difficult pages selected via model-disagreement sampling from a corpus of 60,000 documents, covering handwritten forms, complex layouts, dense tables, and graphics-rich pages. Evaluation is performed with unit-test-style checks that target concrete failure modes (text presence, reading order, and local table…
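As an illustration of what unit-test-style checks for text presence and reading order might look like, here is a minimal Python sketch. It is not the benchmark's actual implementation: the function names (`normalize`, `text_present`, `in_reading_order`), the normalization rules, and the sample page are all assumptions made for this example. The idea is to normalize away benign formatting (line breaks, Markdown markup, non-breaking spaces before French punctuation) before testing a concrete property of the output.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Collapse formatting differences that should not fail a check (assumed rules)."""
    # NFKC folds compatibility characters, e.g. a non-breaking space becomes a plain space.
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace common Markdown markup characters with spaces.
    text = re.sub(r"[*_#|>`]", " ", text)
    # Collapse runs of whitespace, including line breaks, into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def text_present(markdown: str, expected: str) -> bool:
    """Text-presence check: the expected snippet occurs somewhere in the output."""
    return normalize(expected) in normalize(markdown)


def in_reading_order(markdown: str, first: str, second: str) -> bool:
    """Reading-order check: both snippets occur, and `first` starts before `second`."""
    doc = normalize(markdown)
    i = doc.find(normalize(first))
    j = doc.find(normalize(second))
    return i != -1 and j != -1 and i < j


# Hypothetical model output for a French form page.
page = (
    "# Formulaire de demande\n\n"
    "**Nom\u00a0:** Dupont\n\n"
    "- Date de\n  naissance : 12/03/1985\n"
)

checks = {
    "name transcribed": text_present(page, "Nom : Dupont"),
    "date survives a line break": text_present(page, "Date de naissance : 12/03/1985"),
    "title precedes name": in_reading_order(page, "Formulaire de demande", "Dupont"),
}
```

Because each check tests one property after normalization, a line break inside "Date de naissance" or bold markup around "Nom" does not cause a failure, while a missing name or a title emitted after the form fields would.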


Taxonomy

Topics: Handwritten Text Recognition Techniques · Digital Humanities and Scholarship · Mathematics, Computing, and Information Processing