Patchfinder: Leveraging Visual Language Models for Accurate Information Retrieval using Model Uncertainty
Roman Colman, Minh Vu, Manish Bhattarai, Martin Ma, Hari Viswanathan, Daniel O'Malley, Javier E. Santos

TL;DR
PatchFinder is a novel algorithm that enhances information retrieval from noisy scanned documents by leveraging vision language models and a confidence-based patching approach, significantly improving accuracy over existing methods.
Contribution
It introduces a confidence-based patching method for VLMs to improve information extraction from noisy documents, reducing reliance on expensive language models.
Findings
Achieves 94% accuracy on noisy scanned documents.
Outperforms ChatGPT-4o by 18.5 percentage points.
Uses a confidence metric to optimize patch size.
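The confidence-driven patching idea in the findings above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not specify the confidence metric, so the mean of the per-token maximum softmax probability is assumed here, and the names `grid_patches`, `sequence_confidence`, `best_patch`, and the `score_patch` callback (a stand-in for a VLM call that returns per-token logits for one image crop) are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sequence_confidence(token_logits: Sequence[Sequence[float]]) -> float:
    """Assumed metric: mean over tokens of the maximum softmax probability."""
    if not token_logits:
        return 0.0
    return sum(max(softmax(row)) for row in token_logits) / len(token_logits)


def grid_patches(width: int, height: int, rows: int, cols: int) -> List[Box]:
    """Split a width x height page into a rows x cols grid of boxes."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            boxes.append((
                c * width // cols,
                r * height // rows,
                (c + 1) * width // cols,
                (r + 1) * height // rows,
            ))
    return boxes


def best_patch(
    boxes: Sequence[Box],
    score_patch: Callable[[Box], Sequence[Sequence[float]]],
) -> Tuple[Box, float]:
    """Return the box whose model output has the highest confidence.

    score_patch is a hypothetical hook: in practice it would crop the page,
    run the VLM on the crop, and return the logits of the generated tokens.
    """
    scored = [(box, sequence_confidence(score_patch(box))) for box in boxes]
    return max(scored, key=lambda pair: pair[1])
```

Under these assumptions, calling `best_patch` for several grid shapes (for example 1x1, 2x2, 3x3) and keeping the answer with the highest confidence gives a crude search over patch size.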
Abstract
For decades, corporations and governments have relied on scanned documents to record vast amounts of information. However, extracting this information is a slow and tedious process due to the sheer volume and complexity of these records. The rise of Vision Language Models (VLMs) presents a way to efficiently and accurately extract the information out of these documents. The current automated workflow often requires a two-step approach involving the extraction of information using optical character recognition software and subsequent usage of large language models for processing this information. Unfortunately, these methods encounter significant challenges when dealing with noisy scanned documents, often requiring computationally expensive language models to handle high information density effectively. In this study, we propose PatchFinder, an algorithm that builds upon VLMs to improve…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling
Methods: Softmax
