When Tables Go Crazy: Evaluating Multimodal Models on French Financial Documents
Virginie Mouilleron, Théo Lasnier, Anna Mosolova, Djamé Seddah

TL;DR
This paper introduces a new benchmark for evaluating multimodal vision-language models on French financial documents, revealing strengths in extraction tasks but significant challenges in multi-turn reasoning and chart interpretation.
Contribution
The paper presents Multimodal Finance Eval, the first multimodal benchmark for French financial document understanding, and evaluates six open-weight VLMs (8B-124B parameters) on it, highlighting their limitations in chart interpretation and multi-turn reasoning.
Findings
Models achieve 85-90% accuracy on text and table tasks.
Models struggle with chart interpretation, achieving only 34-62% accuracy.
Multi-turn dialogue tasks cause accuracy to drop to around 50% due to error propagation.
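To make the error-propagation finding concrete, here is a minimal sketch of how per-turn mistakes can compound in a dependent dialogue. It rests on simplifying assumptions that are not taken from the paper: each turn is answered correctly with the same probability, errors are independent, and a conversation counts as correct only if every turn is correct. The function name `chain_accuracy` and the turn counts are hypothetical.

```python
def chain_accuracy(per_turn_accuracy: float, turns: int) -> float:
    """Probability that all turns in a dependent chain are correct.

    Illustrative model only: assumes a constant per-turn accuracy and
    independent errors, which real multi-turn dialogues need not satisfy.
    """
    if not 0.0 <= per_turn_accuracy <= 1.0:
        raise ValueError("per_turn_accuracy must lie in [0, 1]")
    if turns < 1:
        raise ValueError("turns must be at least 1")
    return per_turn_accuracy ** turns


if __name__ == "__main__":
    # Hypothetical turn counts; the per-turn values mirror the 85-90%
    # single-turn range reported for text and table tasks.
    for p in (0.85, 0.90):
        for n in (1, 2, 4):
            print(f"per-turn={p:.2f} turns={n} chain={chain_accuracy(p, n):.3f}")
```

Under these assumptions, a per-turn accuracy of 0.85 over four dependent turns gives a chain accuracy of about 0.52, which shows how strong single-turn performance can still fall to roughly half on multi-turn tasks.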
Abstract
Vision-language models (VLMs) perform well on many document understanding tasks, yet their reliability in specialized, non-English domains remains underexplored. This gap is especially critical in finance, where documents mix dense regulatory text, numerical tables, and visual charts, and where extraction errors can have real-world consequences. We introduce Multimodal Finance Eval, the first multimodal benchmark for evaluating French financial document understanding. The dataset contains 1,204 expert-validated questions spanning text extraction, table comprehension, chart interpretation, and multi-turn conversational reasoning, drawn from real investment prospectuses, KIDs, and PRIIPs. We evaluate six open-weight VLMs (8B-124B parameters) using an LLM-as-judge protocol. While models achieve strong performance on text and table tasks (85-90% accuracy), they struggle with chart…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Text Analysis Techniques · Topic Modeling
