Robustness of Structured Data Extraction from In-plane Rotated Documents using Multi-Modal Large Language Models (LLM)
Anjanava Biswas, Wrick Talukdar

TL;DR
This paper examines how in-plane document rotation affects data extraction accuracy of multi-modal LLMs, identifies safe rotation angles, and proposes methods to improve robustness against skew in real-world document processing.
Contribution
It provides a comprehensive analysis of skew impact on multi-modal LLMs and introduces new approaches to enhance their robustness to document rotation.
Findings
Skew significantly reduces extraction accuracy across models
Safe in-plane rotation angles vary per model
Proposed architectures improve skew robustness
Abstract
Multi-modal large language models (LLMs) have shown remarkable performance in various natural language processing tasks, including data extraction from documents. However, the accuracy of these models can be significantly affected by document in-plane rotation, also known as skew, a common issue in real-world scenarios for scanned documents. This study investigates the impact of document skew on the data extraction accuracy of three state-of-the-art multi-modal LLMs: Anthropic Claude V3 Sonnet, GPT-4-Turbo, and Llava:v1.6. We focus on extracting specific entities from synthetically generated sample documents with varying degrees of skewness. The results demonstrate that document skew adversely affects the data extraction accuracy of all the tested LLMs, with the severity of the impact varying across models. We identify the safe in-plane rotation angles (SIPRA) for each model and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques
MethodsFocus
