When Tables Go Crazy: Evaluating Multimodal Models on French Financial Documents

Virginie Mouilleron; Théo Lasnier; Anna Mosolova; Djamé Seddah

arXiv:2602.10384·cs.CL·March 17, 2026

Open Access

TL;DR

This paper introduces a new benchmark for evaluating multimodal vision-language models on French financial documents, revealing strengths in extraction tasks but significant challenges in multi-turn reasoning and chart interpretation.

Contribution

The paper presents Multimodal Finance Eval, the first multimodal benchmark for French financial document understanding, and evaluates six open-weight VLMs (8B-124B parameters), highlighting their limitations in multi-turn and visual reasoning tasks.

Findings

01. Models achieve 85-90% accuracy on text and table tasks.

02. Models struggle with chart interpretation, achieving only 34-62% accuracy.

03. Multi-turn dialogue tasks cause accuracy to drop to around 50% due to error propagation.

Abstract

Vision-language models (VLMs) perform well on many document understanding tasks, yet their reliability in specialized, non-English domains remains underexplored. This gap is especially critical in finance, where documents mix dense regulatory text, numerical tables, and visual charts, and where extraction errors can have real-world consequences. We introduce Multimodal Finance Eval, the first multimodal benchmark for evaluating French financial document understanding. The dataset contains 1,204 expert-validated questions spanning text extraction, table comprehension, chart interpretation, and multi-turn conversational reasoning, drawn from real investment prospectuses, KIDs, and PRIIPs. We evaluate six open-weight VLMs (8B-124B parameters) using an LLM-as-judge protocol. While models achieve strong performance on text and table tasks (85-90% accuracy), they struggle with chart…
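The per-task accuracies quoted above come from judge verdicts aggregated by task type. As a minimal sketch of that aggregation step only, assuming each judged answer reduces to a (task, correct) pair: the record format, the `accuracy_by_task` helper, and the sample verdicts below are hypothetical illustrations, not the authors' code or data.

```python
from collections import defaultdict

# Hypothetical judge verdicts: (task category, judged correct?).
# The categories mirror the abstract; the records are made up for
# illustration and are NOT results from the paper.
verdicts = [
    ("text", True),
    ("text", True),
    ("table", True),
    ("table", False),
    ("chart", False),
    ("chart", True),
    ("multi_turn", False),
]


def accuracy_by_task(records):
    """Return {task: fraction of answers the judge marked correct}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, is_correct in records:
        total[task] += 1
        correct[task] += int(is_correct)
    return {task: correct[task] / total[task] for task in total}


scores = accuracy_by_task(verdicts)
for task, acc in scores.items():
    print(f"{task}: {acc:.0%}")
```

Reporting accuracy per task rather than as one pooled number is what lets a gap such as the one between table and chart questions show up at all; a pooled score would be dominated by whichever category has the most questions.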

Taxonomy

Topics: Handwritten Text Recognition Techniques · Advanced Text Analysis Techniques · Topic Modeling