Robustness of Structured Data Extraction from Perspectively Distorted Documents
Hyakka Nakada, Yoshiyasu Tanaka

TL;DR
This paper investigates how perspective distortions in document images affect the data extraction accuracy of multi-modal LLMs, showing that such distortions significantly degrade performance and that a simple rotational correction can mitigate the degradation.
Contribution
The paper introduces a simplified, two-parameter model of perspective distortion and uses it to evaluate how such distortions affect LLM-based OCR, highlighting the importance of correction methods.
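To make the idea of a two-parameter distortion model concrete, here is a minimal sketch. The summary does not say which two parameters the authors use, so the choice below is a hypothetical illustration only: an in-plane rotation angle `theta_deg` and an out-of-plane tilt `phi_deg` about the page's horizontal axis. The page is then projected through an assumed pinhole camera whose `focal` and `distance` values are arbitrary. The function name `distort_corners` and all defaults are illustrative, not taken from the paper.

```python
import math


def distort_corners(width, height, theta_deg, phi_deg, focal=1000.0, distance=1500.0):
    """Project the four corners of a flat width x height page through a pinhole camera.

    Hypothetical two-parameter model (not necessarily the authors'):
      phi_deg   -- out-of-plane tilt of the page about its horizontal (x) axis
      theta_deg -- in-plane rotation about the camera's optical (z) axis
    Corners are returned in the order TL, TR, BR, BL (image y pointing down).
    """
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    corners = [(-width / 2, -height / 2), (width / 2, -height / 2),
               (width / 2, height / 2), (-width / 2, height / 2)]
    projected = []
    for x, y in corners:
        # Tilt about the x-axis: (x, y, 0) -> (x, y cos(phi), y sin(phi)).
        y_t = y * math.cos(phi)
        z_t = y * math.sin(phi)
        # In-plane rotation about the optical axis.
        x_r = x * math.cos(theta) - y_t * math.sin(theta)
        y_r = x * math.sin(theta) + y_t * math.cos(theta)
        # Place the page in front of the camera and apply the pinhole projection.
        z = distance + z_t
        projected.append((focal * x_r / z, focal * y_r / z))
    return projected
```

With `phi_deg=0` the page is only rotated and uniformly scaled; a nonzero tilt brings one edge closer to the camera than the other, producing the familiar keystone (trapezoid) shape of a document photographed at an angle.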
Findings
Structure-recognition accuracy is significantly affected by perspective distortions.
Rotational correction improves data extraction accuracy.
Distortion modeling with two parameters effectively captures real-world distortions.
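A rotational correction can be sketched as follows. This is an assumed procedure, not necessarily the one evaluated in the paper: it presumes the four page corners have already been detected and are ordered top-left, top-right, bottom-right, bottom-left, estimates the in-plane angle from the top edge, and rotates the corners back around their centroid. The helper names are hypothetical.

```python
import math


def estimate_rotation_deg(top_left, top_right):
    """Angle of the page's top edge relative to the image x-axis, in degrees."""
    dx = top_right[0] - top_left[0]
    dy = top_right[1] - top_left[1]
    return math.degrees(math.atan2(dy, dx))


def rotate_points(points, angle_deg, center=(0.0, 0.0)):
    """Rotate points by angle_deg around center using the standard 2D rotation matrix."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    cx, cy = center
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + c * dx - s * dy, cy + s * dx + c * dy))
    return rotated


def correct_rotation(corners):
    """Undo the in-plane rotation implied by the top edge (corners[0] -> corners[1]).

    Returns the corrected corners and the estimated angle in degrees.
    """
    angle = estimate_rotation_deg(corners[0], corners[1])
    cx = sum(p[0] for p in corners) / len(corners)
    cy = sum(p[1] for p in corners) / len(corners)
    return rotate_points(corners, -angle, (cx, cy)), angle
```

For example, rotating a 600 x 800 rectangle by 20 degrees and passing it to `correct_rotation` recovers an angle of about 20 and a horizontal top edge. Note that this only removes the in-plane component; any keystone caused by out-of-plane tilt remains and would need a full perspective (homography) rectification.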
Abstract
Optical Character Recognition (OCR) for data extraction from documents is essential to intelligent informatics, such as digitizing medical records and recognizing road signs. Multi-modal Large Language Models (LLMs) can solve this task and have shown remarkable performance. Recently, it has been noticed that the accuracy of data extraction by multi-modal LLMs can be affected when in-plane rotations are present in the documents. However, real-world document images are usually not only in-plane rotated but also perspectively distorted. This study investigates the impacts of such perturbations on the data extraction accuracy for the state-of-the-art model, Gemini-1.5-pro. Because perspective distortions have a high degree of freedom, designing experiments in the same manner as single-parametric rotations is difficult. We observed typical distortions of document images and showed that most…
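The abstract's point that perspective distortions have many degrees of freedom can be made precise: an in-plane rotation is described by a single angle, whereas a general perspective map between two planes is a 3x3 homography defined up to scale, i.e. 8 free parameters. The sketch below, which is an illustration and not the authors' code, estimates such a homography from four corner correspondences with the direct linear transform (fixing `H[2][2] = 1`, which assumes that entry is nonzero) and solves the resulting 8x8 system by Gaussian elimination. Function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_corners(src, dst):
    """Estimate H (with H[2][2] fixed to 1, leaving 8 unknowns) mapping 4 src points to 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, point):
    """Map a 2D point through H, including the perspective division."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

Estimating the homography in the opposite direction (distorted corners to an upright rectangle) gives a rectifying map; this plain DLT omits coordinate normalization, which is usually added for numerical stability with larger point sets.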
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Image and Object Detection Techniques
