FCMBench-Video: Benchmarking Document Video Intelligence

Runze Cui; Fangxin Shang; Yehui Yang; Qing Yang; Yanwu Xu; Tao Chen

arXiv:2604.25186 · cs.CV · May 1, 2026

TL;DR

FCMBench-Video is a comprehensive benchmark designed to evaluate document-video understanding capabilities in financial contexts, focusing on perception, reasoning, and robustness across diverse document types and conditions.

Contribution

The paper introduces a large-scale, realistic dataset and evaluation framework for assessing Video-MLLMs (video multimodal large language models) on document perception, reasoning, and robustness in authenticity-sensitive applications.

Findings

01. Evaluations reveal that counting tasks are highly duration-sensitive.

02. Cross-Document Validation and Evidence-Grounded Selection assess higher-level evidence integration.

03. The benchmark effectively differentiates system capabilities and robustness.

Abstract

Document understanding is a critical capability in financial credit review, onboarding, and remote verification, where both decision accuracy and evidence traceability matter. Compared with static document images, document videos present a temporally redundant and sequentially unfolding evidence stream, require evidence integration across frames, and preserve acquisition-process cues relevant to authenticity-sensitive and anti-fraud review. We introduce FCMBench-Video, a benchmark for document-video intelligence that evaluates document perception, temporal grounding, and evidence-grounded reasoning under realistic capture conditions. For privacy-compliant yet realistic data at scale, we organize construction as an atomic-acquisition and composition workflow that records reusable single-document clips, applies controlled degradations, and assembles long-form multi-document videos with…
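The atomic-acquisition and composition workflow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' pipeline: the `Clip` and `Segment` types, the document-type and degradation labels, and the `compose` and `count_question` helpers are all hypothetical. The sketch only shows the general idea of tagging reusable single-document clips with a degradation, laying them end to end into a longer video, and keeping the per-clip time spans so that counting and temporal-grounding labels could be derived from them.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class Clip:
    """A reusable single-document clip (hypothetical structure)."""
    doc_type: str            # illustrative label, e.g. "id_card"
    duration_s: float        # clip length in seconds
    degradation: str = "none"


@dataclass(frozen=True)
class Segment:
    """A clip placed on the composed video's timeline."""
    clip: Clip
    start_s: float
    end_s: float


def degrade(clip: Clip, kind: str) -> Clip:
    """Return a copy of the clip tagged with a controlled degradation."""
    return Clip(clip.doc_type, clip.duration_s, kind)


def compose(clips: list[Clip], rng: random.Random) -> list[Segment]:
    """Shuffle the clips and lay them end to end, recording each time span."""
    order = list(clips)
    rng.shuffle(order)
    timeline, t = [], 0.0
    for clip in order:
        timeline.append(Segment(clip, t, t + clip.duration_s))
        t += clip.duration_s
    return timeline


def count_question(timeline: list[Segment], doc_type: str) -> tuple[str, int]:
    """Derive a counting question and its ground truth from the timeline."""
    answer = sum(seg.clip.doc_type == doc_type for seg in timeline)
    return f"How many {doc_type} documents appear in the video?", answer


# Toy data: three atomic clips, two of them degraded (labels are made up).
atomic = [Clip("id_card", 4.0), Clip("bank_statement", 6.5), Clip("id_card", 3.5)]
degraded = [degrade(atomic[0], "blur"), atomic[1], degrade(atomic[2], "glare")]

timeline = compose(degraded, random.Random(0))
question, answer = count_question(timeline, "id_card")
total_duration = timeline[-1].end_s
```

Because each segment carries its start and end time, the same timeline could also supply temporal-grounding targets. Whether the benchmark builds its labels this way is not stated in the text above.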

Code & Models

Datasets

QFIN/FCMBench-Data
dataset · 119 downloads
