Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs
Gaye Colakoglu, Gürkan Solmaz, Jonathan Fürst

TL;DR
This paper explores how to effectively use large language models for extracting information from layout-rich documents, proposing a new benchmark and methods to optimize their performance without fine-tuning.
Contribution
It introduces LayIE-LLM, an open-source test suite for layout-aware IE, and develops a simple one-factor-at-a-time (OFAT) method for finding a well-performing LLM configuration without exploring every combination of design choices.
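To make the OFAT idea concrete, here is a minimal sketch of a generic, sequential one-factor-at-a-time search. It is not the authors' implementation: this summary does not describe their exact procedure, so the factor names, option names, and the `toy_score` function below are hypothetical stand-ins for a real evaluation run that would return an F1 score.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, str]


def ofat_search(
    factors: Dict[str, List[str]],
    baseline: Config,
    score: Callable[[Config], float],
) -> Tuple[Config, float, int]:
    """Sequential OFAT: vary one factor at a time, keep its best option, move on.

    Returns (best configuration, its score, number of score() calls).
    """
    best = dict(baseline)
    best_score = score(best)
    evaluations = 1
    for name, options in factors.items():
        already_scored = best[name]  # this option was scored with the current best
        for option in options:
            if option == already_scored:
                continue
            candidate = {**best, name: option}
            candidate_score = score(candidate)
            evaluations += 1
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
    return best, best_score, evaluations


# Hypothetical design space; names are illustrative, not taken from LayIE-LLM.
FACTORS = {
    "input_format": ["plain_text", "markdown", "text_with_bboxes"],
    "chunk_size": ["small", "medium", "large"],
    "prompt": ["zero_shot", "few_shot", "chain_of_thought"],
}
BASELINE = {"input_format": "plain_text", "chunk_size": "small", "prompt": "zero_shot"}

# Made-up additive gains, used only so the sketch runs without an LLM.
TOY_GAINS = {
    "input_format": {"plain_text": 0, "markdown": 3, "text_with_bboxes": 5},
    "chunk_size": {"small": 0, "medium": 4, "large": 2},
    "prompt": {"zero_shot": 0, "few_shot": 6, "chain_of_thought": 1},
}


def toy_score(config: Config) -> float:
    """Stand-in for an evaluation run; a real scorer would compute F1 on a dataset."""
    return float(sum(TOY_GAINS[name][option] for name, option in config.items()))


if __name__ == "__main__":
    best, best_score, evaluations = ofat_search(FACTORS, BASELINE, toy_score)
    full_factorial = 1
    for options in FACTORS.values():
        full_factorial *= len(options)
    print(best, best_score, f"{evaluations} of {full_factorial} configurations scored")
```

With three factors of three options each, the search scores 7 configurations where a full factorial sweep would score 27. The toy scorer is additive, so the greedy search reaches the global optimum here. When factors interact, OFAT can miss the best combination, which is why a "near-optimal" result is the realistic expectation.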
Findings
LLMs require specific pipeline adjustments for layout-rich IE
Optimized configurations significantly outperform baseline setups
Near-optimal results are achieved at low computational cost
Abstract
This paper defines and explores the design space for information extraction (IE) from layout-rich documents using large language models (LLMs). The three core challenges of layout-aware IE with LLMs are 1) data structuring, 2) model engagement, and 3) output refinement. Our study investigates the sub-problems and methods within these core challenges, such as input representation, chunking, prompting, selection of LLMs, and multimodal models. It examines the effect of different design choices through LayIE-LLM, a new, open-source, layout-aware IE test suite, benchmarking against traditional, fine-tuned IE models. The results on two IE datasets show that LLMs require adjustment of the IE pipeline to achieve competitive performance: the optimized configuration found with LayIE-LLM achieves 13.3 to 37.5 F1 points more than a general-practice baseline configuration using the same LLM. To find…
Taxonomy
Topics: Semantic Web and Ontologies
