VAREX: A Benchmark for Multi-Modal Structured Extraction from Documents

Udi Barzelay; Ophir Azulai; Inbar Shapira; Idan Friedman; Foad Abo Dahood; Madison Lee; Abraham Daniels

arXiv:2603.15118·cs.CV·April 10, 2026

TL;DR

VAREX is a benchmark for evaluating multimodal foundation models on structured data extraction from government forms, with a focus on how input modality and model scale affect extraction accuracy.

Contribution

The paper introduces a benchmark of synthetically filled government forms, each provided in four input modalities, enabling controlled comparison of input formats and of model performance, particularly for small models.

Findings

01

Models under 4B parameters struggle with schema compliance, which lowers their scores by 45-65 percentage points (pp).

02

Fine-tuning at 2B parameters significantly improves extraction accuracy (+81 pp).

03

Layout-preserving text enhances accuracy more than visual cues, with gains of 3-18 pp.
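
To make the first finding concrete, here is a minimal sketch of one way a schema-compliance check could work: parse the model output as JSON and compare its nested key structure with the expected schema. This is an illustration only and is not the VAREX scoring procedure, which this page does not describe. The function names `key_paths` and `is_schema_compliant` and the example field names are hypothetical.

```python
import json
from typing import Any, Set


def key_paths(obj: Any, prefix: str = "") -> Set[str]:
    """Collect dotted key paths from nested dicts.

    Items of a list share one path segment, written as "[]".
    """
    paths: Set[str] = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(obj, list):
        for item in obj:
            paths |= key_paths(item, prefix + "[]")
    return paths


def is_schema_compliant(raw_output: str, expected: dict) -> bool:
    """Return True if raw_output is a JSON object with exactly the expected key paths.

    Assumption: compliance means "same key structure"; values are not checked.
    """
    try:
        predicted = json.loads(raw_output)
    except json.JSONDecodeError:
        return False  # output is not valid JSON at all
    if not isinstance(predicted, dict):
        return False
    return key_paths(predicted) == key_paths(expected)


# Hypothetical schema for a small form with one nested object and one list.
expected_schema = {"applicant": {"name": "", "dob": ""}, "items": [{"qty": 0}]}

nested = '{"applicant": {"name": "Jane", "dob": "1990-01-01"}, "items": [{"qty": 2}]}'
flattened = '{"name": "Jane", "dob": "1990-01-01"}'

print(is_schema_compliant(nested, expected_schema))     # structure matches
print(is_schema_compliant(flattened, expected_schema))  # nesting was lost
```

Under this strict definition, an output that extracts every value correctly but flattens the nesting still fails, which is one way a compliance problem could depress scores independently of reading ability.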

Abstract

We introduce VAREX (VARied-schema EXtraction), a benchmark for evaluating multimodal foundation models on structured data extraction from government forms. VAREX employs a Reverse Annotation pipeline that programmatically fills PDF templates with synthetic values, producing deterministic ground truth validated through three-phase quality assurance. The benchmark comprises 1,777 documents with 1,771 unique schemas across three structural categories, each provided in four input modalities: plain text, layout-preserving text (whitespace-aligned to approximate column positions), document image, and both text and image combined. Unlike existing benchmarks that evaluate from a single input representation, VAREX provides four controlled modalities per document, enabling systematic ablation of how input format affects extraction accuracy -- a capability absent from prior benchmarks. We evaluate…
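
The abstract describes layout-preserving text as whitespace-aligned to approximate column positions. The sketch below shows one simple way such a rendering could be produced from word positions: snap each word to a character grid and pad with spaces. It is an assumption-laden illustration, not the pipeline used to build VAREX. The `Word` tuple format, the function `layout_preserving_text`, and the `char_width` and `line_height` defaults are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical word record: (text, x_left, y_top), both coordinates in points.
Word = Tuple[str, float, float]


def layout_preserving_text(
    words: List[Word],
    char_width: float = 6.0,    # assumed average glyph width in points
    line_height: float = 12.0,  # assumed line pitch in points
) -> str:
    """Place words on a character grid so that columns roughly line up."""
    # Group words into rows by snapping their y coordinate to the line pitch.
    rows: Dict[int, List[Tuple[float, str]]] = {}
    for text, x, y in words:
        row = int(round(y / line_height))
        rows.setdefault(row, []).append((x, text))

    lines = []
    for row in sorted(rows):
        line = ""
        for x, text in sorted(rows[row]):
            col = int(round(x / char_width))
            if line and col <= len(line):
                col = len(line) + 1  # keep at least one space between words
            line += " " * (col - len(line)) + text
        lines.append(line)
    # Note: empty rows are skipped, so vertical gaps are not preserved.
    return "\n".join(lines)


# Two label/value pairs whose values start at the same x position.
form_words = [
    ("Name:", 0.0, 0.0), ("Jane", 60.0, 0.0),
    ("Date:", 0.0, 12.0), ("2024-01-01", 60.0, 12.0),
]
print(layout_preserving_text(form_words))
```

In this rendering both values start at the same character column, so a text-only model can still see which value belongs to which label column. A plain-text rendering would instead join the words in reading order with single spaces and lose that alignment.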

Code & Models

Repositories

udibarzi/varex-bench
github

Datasets

ibm-research/VAREX
dataset · 2.6k downloads
