A Hybrid Architecture for Multi-Stage Claim Document Understanding: Combining Vision-Language Models and Machine Learning for Real-Time Processing
Lilu Cheng, Jingjun Lu, Yi Xuan Chan, Quoc Khai Nguyen, John Bi, Sean Ho

TL;DR
This paper introduces a multi-stage system combining OCR, traditional ML, and vision-language models to automate claim document processing, achieving high accuracy and speed in real-world healthcare and insurance applications.
Contribution
The paper presents a novel hybrid architecture integrating OCR, logistic regression, and a vision-language model for efficient multi-language claim document understanding.
Findings
Over 95% document classification accuracy
Approximately 87% field extraction accuracy
Under 2 seconds processing time per document
Abstract
Claims documents are fundamental to healthcare and insurance operations, serving as the basis for reimbursement, auditing, and compliance. However, these documents are typically not born digital; they often exist as scanned PDFs or photographs captured under uncontrolled conditions. Consequently, they exhibit significant content heterogeneity, ranging from typed invoices to handwritten medical reports, as well as linguistic diversity. This challenge is exemplified by operations at Fullerton Health, which handles tens of millions of claims annually across nine markets, including Singapore, the Philippines, Indonesia, Malaysia, Mainland China, Hong Kong, Vietnam, Papua New Guinea, and Cambodia. Such variability, coupled with inconsistent image quality and diverse layouts, poses a significant obstacle to automated parsing and structured information extraction. This paper presents a…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHandwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Text and Document Classification Technologies
