Notes on Applicability of GPT-4 to Document Understanding

{\L}ukasz Borchmann

arXiv:2405.18433·cs.CL·May 29, 2024·1 cites

Notes on Applicability of GPT-4 to Document Understanding

{\L}ukasz Borchmann

PDF

Open Access

TL;DR

This paper evaluates GPT-4 models on document understanding tasks, highlighting their strengths with multimodal inputs and identifying limitations such as performance drops on lengthy documents and potential model contamination.

Contribution

It provides a reproducible benchmark of GPT-4 models for document understanding, emphasizing the importance of multimodal inputs and analyzing model limitations.

Findings

01

GPT-4 Vision Turbo performs well with OCR and images.

02

Text-only GPT-4 models face challenges in document comprehension.

03

Performance drops significantly on lengthy documents.

Abstract

We perform a missing, reproducible evaluation of all publicly available GPT-4 family models concerning the Document Understanding field, where it is frequently required to comprehend text spacial arrangement and visual clues in addition to textual semantics. Benchmark results indicate that though it is hard to achieve satisfactory results with text-only models, GPT-4 Vision Turbo performs well when one provides both text recognized by an external OCR engine and document images on the input. Evaluation is followed by analyses that suggest possible contamination of textual GPT-4 models and indicate the significant performance drop for lengthy documents.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMathematics, Computing, and Information Processing

MethodsAttention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections