A Hybrid Architecture for Multi-Stage Claim Document Understanding: Combining Vision-Language Models and Machine Learning for Real-Time Processing

Lilu Cheng; Jingjun Lu; Yi Xuan Chan; Quoc Khai Nguyen; John Bi; Sean Ho

arXiv:2601.01897 · cs.IR · January 6, 2026

Open Access

TL;DR

This paper introduces a multi-stage system combining OCR, traditional ML, and vision-language models to automate claim document processing, achieving high accuracy and speed in real-world healthcare and insurance applications.

Contribution

The paper presents a novel hybrid architecture integrating OCR, logistic regression, and a vision-language model for efficient multi-language claim document understanding.
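The three components named above can be pictured as a staged pipeline. The following is a minimal sketch only, not the authors' implementation: the stage order, what each stage consumes, and every name in it (`process_claim`, `ClaimResult`, the `fake_*` stand-ins, the field schema) are assumptions made for illustration. A real system would plug in an OCR engine, a trained logistic-regression classifier, and a vision-language model in place of the stubs.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ClaimResult:
    doc_type: str            # label produced by the classification stage
    fields: Dict[str, str]   # structured fields produced by the extraction stage
    seconds: float           # wall-clock time for the whole pipeline


def process_claim(
    image: bytes,
    ocr: Callable[[bytes], str],
    classify: Callable[[str], str],
    extract: Callable[[bytes, str], Dict[str, str]],
) -> ClaimResult:
    """Run the three stages in sequence (assumed order: OCR, classify, extract)."""
    start = time.perf_counter()
    text = ocr(image)                   # stage 1: image -> raw text
    doc_type = classify(text)           # stage 2: lightweight ML on the OCR text
    fields = extract(image, doc_type)   # stage 3: VLM-style extraction per document type
    return ClaimResult(doc_type, fields, time.perf_counter() - start)


# Hypothetical stand-ins so the sketch runs without any external model.
def fake_ocr(image: bytes) -> str:
    return image.decode("utf-8", errors="ignore")


def fake_classify(text: str) -> str:
    # A keyword rule standing in for a trained logistic-regression model.
    return "invoice" if "total" in text.lower() else "medical_report"


def fake_extract(image: bytes, doc_type: str) -> Dict[str, str]:
    # Returns an empty template per document type; a real VLM would fill values.
    schema = {"invoice": ["total"], "medical_report": ["diagnosis"]}
    return {name: "" for name in schema[doc_type]}


result = process_claim(b"Invoice TOTAL: 120.00", fake_ocr, fake_classify, fake_extract)
print(result.doc_type, result.fields)
```

Passing the stages in as callables keeps the cheap classifier and the expensive extractor independently swappable, and recording `seconds` per document gives a direct way to check a latency budget such as the one reported in the findings.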

Findings

01. Over 95% document classification accuracy
02. Approximately 87% field extraction accuracy
03. Under 2 seconds processing time per document

Abstract

Claims documents are fundamental to healthcare and insurance operations, serving as the basis for reimbursement, auditing, and compliance. However, these documents are typically not born digital; they often exist as scanned PDFs or photographs captured under uncontrolled conditions. Consequently, they exhibit significant content heterogeneity, ranging from typed invoices to handwritten medical reports, as well as linguistic diversity. This challenge is exemplified by operations at Fullerton Health, which handles tens of millions of claims annually across nine markets, including Singapore, the Philippines, Indonesia, Malaysia, Mainland China, Hong Kong, Vietnam, Papua New Guinea, and Cambodia. Such variability, coupled with inconsistent image quality and diverse layouts, poses a significant obstacle to automated parsing and structured information extraction. This paper presents a…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Text and Document Classification Technologies