A Multistage Extraction Pipeline for Long Scanned Financial Documents: An Empirical Study in Industrial KYC Workflows

Yuxuan Han; Yuanxing Zhang; Yushuo Wang; Yichao Jin; Kenneth Zhu Ke; Jingyuan Zhao

arXiv:2604.26462 · cs.CV · April 30, 2026

TL;DR

This paper introduces a multistage extraction pipeline combining image preprocessing, multilingual OCR, and vision-language models to improve structured information extraction from complex, multilingual financial documents in industrial KYC workflows.

Contribution

The proposed framework separates page localization from multimodal reasoning, significantly enhancing extraction accuracy on long, noisy, multilingual scanned documents compared to end-to-end models.

Findings

01  The pipeline outperforms direct PDF-to-VLM baselines by up to 31.9 percentage points.

02  The best configuration achieves 87.27% accuracy in field extraction.

03  Page-level retrieval is the key factor in the performance gains.

Abstract

Structured information extraction from long, multilingual scanned financial documents is a core requirement in industrial KYC and compliance workflows. These documents are typically not machine-readable, noisy, and visually heterogeneous. They usually span dozens of pages while containing only sparse task-relevant information. Although recent vision-language models achieve strong benchmark performance, applying them directly, end to end, to full financial reports often leads to unreliable extraction under real-world conditions. We present a multistage extraction framework that integrates image preprocessing, multilingual OCR, hybrid page-level retrieval, and compact VLM-based structured extraction. The design separates page localization from multimodal reasoning, enabling more accurate extraction from complex multipage documents. We evaluated the framework on 120 production KYC documents…
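The separation of page localization from multimodal reasoning can be illustrated with a small sketch. The abstract does not say how the hybrid retrieval is built, so everything below is an assumption made for illustration: "hybrid" is taken to mean a weighted mix of a lexical similarity score and a keyword-hit score over per-page OCR text, and the names `rank_pages`, `alpha`, and `top_k` are hypothetical. Only the selected pages would then be passed on to a VLM, a step not shown here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a stand-in for real multilingual tokenization."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_pages(page_texts, query, keywords, alpha=0.5, top_k=2):
    """Return indices of the top_k pages by a hybrid score.

    Assumed scoring (not taken from the paper):
      score = alpha * lexical_similarity + (1 - alpha) * keyword_hit_rate
    """
    query_tf = Counter(tokenize(query))
    scored = []
    for idx, text in enumerate(page_texts):
        tokens = tokenize(text)
        lexical = cosine(Counter(tokens), query_tf)
        token_set = set(tokens)
        hits = sum(1 for k in keywords if k.lower() in token_set)
        keyword_rate = hits / len(keywords) if keywords else 0.0
        scored.append((alpha * lexical + (1 - alpha) * keyword_rate, idx))
    # Highest score first; ties fall back to page order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scored[:top_k]]


# Hypothetical OCR output, one string per page.
pages = [
    "Annual report cover page",
    "Register of shareholders and ownership percentages",
    "Notes to the financial statements",
    "Independent auditor report",
]
selected = rank_pages(pages, "shareholders ownership",
                      ["shareholders", "ownership"], top_k=1)
```

In this toy input, only the second page mentions the queried terms, so it is the one selected. A production system would presumably replace the term-frequency cosine with a stronger lexical or embedding-based scorer, but the shape of the step (score every page, keep a few, reason only over those) is the same.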
