LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou

TL;DR
LayoutLM introduces a novel pre-training framework that jointly models text, layout, and visual information in scanned document images, significantly improving performance on various document understanding tasks.
Contribution
To the authors' knowledge, it is the first framework to jointly learn text and layout in document-level pre-training. It also incorporates image features for words and achieves state-of-the-art results on several downstream tasks.
Findings
Improved form understanding F1 from 70.72 to 79.27
Enhanced receipt understanding F1 from 94.02 to 95.24
Boosted document classification accuracy from 93.07 to 94.42
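
For a quick sense of scale, the gains listed above can be expressed as absolute and relative improvements. The snippet below is only an arithmetic sketch over the three before/after pairs reported in this section; the dictionary keys and the helper name `improvement` are illustrative labels, not names from the paper.

```python
# Before/after scores copied from the findings above.
# The task labels are illustrative keys, not official benchmark names.
RESULTS = {
    "form understanding": (70.72, 79.27),
    "receipt understanding": (94.02, 95.24),
    "document classification": (93.07, 94.42),
}


def improvement(before, after):
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 2), round(relative, 2)


if __name__ == "__main__":
    for task, (before, after) in RESULTS.items():
        absolute, relative = improvement(before, after)
        print(f"{task}: +{absolute} points ({relative}% relative)")
```

The largest gain by far is on form understanding, where layout carries much of the signal.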
Abstract
Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pre-training models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training. It achieves new state-of-the-art…
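
To make "jointly model text and layout" concrete, here is a minimal sketch of one way a token's word embedding could be combined with embeddings of its bounding box. It is not the authors' implementation. The assumptions are: boxes are rescaled to an integer 0 to 1000 grid, the two x coordinates share one lookup table and the two y coordinates share another, and all embeddings are summed. The names `normalize_box`, `make_table`, `embed_token`, and the tiny table sizes are hypothetical and chosen for illustration only.

```python
import random

MAX_COORD = 1000  # assumed size of the normalized coordinate grid


def normalize_box(box, page_width, page_height):
    """Rescale a pixel box (x0, y0, x1, y1) to the 0..MAX_COORD integer grid."""
    x0, y0, x1, y1 = box
    return [
        int(MAX_COORD * x0 / page_width),
        int(MAX_COORD * y0 / page_height),
        int(MAX_COORD * x1 / page_width),
        int(MAX_COORD * y1 / page_height),
    ]


def make_table(rows, dim, rng):
    """Build a randomly initialized embedding table (rows x dim)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def embed_token(token_id, box, tables):
    """Sum the word embedding with the four coordinate embeddings."""
    x0, y0, x1, y1 = box
    parts = [
        tables["word"][token_id],
        tables["x"][x0],  # left edge
        tables["y"][y0],  # top edge
        tables["x"][x1],  # right edge (shares the x table)
        tables["y"][y1],  # bottom edge (shares the y table)
    ]
    return [sum(values) for values in zip(*parts)]


rng = random.Random(0)
DIM = 8  # toy hidden size
TABLES = {
    "word": make_table(100, DIM, rng),          # toy vocabulary of 100 ids
    "x": make_table(MAX_COORD + 1, DIM, rng),   # one row per x position
    "y": make_table(MAX_COORD + 1, DIM, rng),   # one row per y position
}

if __name__ == "__main__":
    # A word occupying pixels (50, 100)-(150, 200) on a 500 x 1000 page.
    box = normalize_box((50, 100, 150, 200), page_width=500, page_height=1000)
    vector = embed_token(token_id=42, box=box, tables=TABLES)
    print(box, len(vector))
```

Because position enters as an additive embedding, two occurrences of the same word at different places on the page receive different input vectors, which is what lets a text encoder pick up on layout.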
Code & Models
- atahmasb/tf-layoutlm-base-uncased (Hugging Face model, 3 downloads)
- atahmasb/tf-layoutlm-large-uncased (Hugging Face model, 1 download)
- microsoft/layoutlm-base-cased (Hugging Face model, 12k downloads, 19 likes)
- microsoft/layoutlm-base-uncased (Hugging Face model, 132k downloads, 61 likes)
- microsoft/layoutlm-large-uncased (Hugging Face model, 8.6k downloads, 10 likes)
- impira/layoutlm-document-classifier (Hugging Face model, 91 downloads, 14 likes)
- gurvgupta/LayoutLM_rvl-cdip (Hugging Face model, 1 like)
