LMDX: Language Model-based Document Information Extraction and Localization
Vincent Perot, Kai Kang, Florian Luisier, Guolong Su, Xiaoyu Sun, Ramya Sree Boppana, Zilong Wang, Zifeng Wang, Jiaqi Mu, Hao Zhang, Chen-Yu Lee, Nan Hua

TL;DR
LMDX introduces a novel approach leveraging large language models for extracting and localizing key information from visually rich, semi-structured documents, achieving state-of-the-art results and enabling data-efficient parsing.
Contribution
The paper presents LMDX, a new methodology that incorporates layout encoding and grounding mechanisms into LLMs for document information extraction and localization.
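To make the two ideas concrete, here is a minimal sketch of one way layout could be encoded as text and later used for grounding. It is an illustration under stated assumptions, not the paper's implementation: the `Segment` type, the `text XX|YY` line format, the 100-bucket quantization, and the helper names `quantize`, `encode_segments`, and `ground` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One OCR text segment with a normalized center position (assumed schema)."""
    text: str
    x_center: float  # in [0, 1], left to right
    y_center: float  # in [0, 1], top to bottom


def quantize(value: float, buckets: int = 100) -> int:
    """Map a normalized coordinate in [0, 1] to an integer bucket index."""
    return min(buckets - 1, max(0, int(value * buckets)))


def encode_segments(segments: List[Segment], buckets: int = 100) -> str:
    """Layout encoding (assumed format): append quantized coordinates to each line."""
    lines = []
    for seg in segments:
        x = quantize(seg.x_center, buckets)
        y = quantize(seg.y_center, buckets)
        lines.append(f"{seg.text} {x:02d}|{y:02d}")
    return "\n".join(lines)


def ground(prediction: str, segments: List[Segment], buckets: int = 100) -> Optional[Segment]:
    """Grounding (assumed format): map a predicted 'value XX|YY' string back to
    the segment it came from, or return None if no segment supports it."""
    text, _, coords = prediction.rpartition(" ")
    x_str, _, y_str = coords.partition("|")
    if not (x_str.isdigit() and y_str.isdigit()):
        return None
    key = (int(x_str), int(y_str))
    for seg in segments:
        seg_key = (quantize(seg.x_center, buckets), quantize(seg.y_center, buckets))
        if seg_key == key and text in seg.text:
            return seg
    return None


segments = [
    Segment("ACME Corp", 0.25, 0.125),
    Segment("Total: $42.00", 0.75, 0.5),
]
prompt_document = encode_segments(segments)
grounded = ground("$42.00 75|50", segments)
ungrounded = ground("$99.99 10|10", segments)
```

In this sketch, a prediction whose coordinates and text do not match any input segment comes back as `None`, which is one simple way a grounding step could be used to discard unsupported values.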
Findings
LMDX achieves a new state of the art on the VRDU and CORD benchmarks.
LMDX enables extraction of hierarchical and repeated entities.
LMDX works with minimal or no training data.
Abstract
Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP), improving the state of the art and exhibiting emergent capabilities across various tasks. However, their application to extracting information from visually rich documents, which is at the core of many document processing workflows and involves the extraction of key entities from semi-structured documents, has not yet been successful. The main obstacles to adopting LLMs for this task are the absence of layout encoding within LLMs, which is critical for high-quality extraction, and the lack of a grounding mechanism to localize the predicted entities within the document. In this paper, we introduce Language Model-based Document Information Extraction and Localization (LMDX), a methodology to reframe the document information extraction task for an LLM. LMDX enables extraction of singular, repeated, and…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Handwritten Text Recognition Techniques
Methods: Pathways Language Model
