TL;DR
This paper enhances transformer-based models for business document information extraction by introducing two specialized pre-training tasks focused on layout understanding and numeric values, leading to improved extraction accuracy.
Contribution
It proposes two novel pre-training tasks for LayoutLM that better capture document layout and numeric information, and introduces a new decoding algorithm for complex entities.
Findings
F1 score improves by 1.62 points on public datasets (from 93.88 to 95.50)
F1 score improves by a smaller 0.49 points on private datasets (from 84.35 to 84.84)
Enhanced understanding of complex document structures
Abstract
Transformer-based Language Models are widely used in Natural Language Processing related tasks. Thanks to their pre-training, they have been successfully adapted to Information Extraction in business documents. However, most pre-training tasks proposed in the literature for business documents are too generic and not sufficient to learn more complex structures. In this paper, we use LayoutLM, a language model pre-trained on a collection of business documents, and introduce two new pre-training tasks that further improve its capacity to extract relevant information. The first is aimed at better understanding the complex layout of documents, and the second focuses on numeric values and their order of magnitude. These tasks force the model to learn better-contextualized representations of the scanned documents. We further introduce a new post-processing algorithm to decode BIESO tags in…
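For readers unfamiliar with the BIESO tagging scheme named in the abstract (Begin, Inside, End, Single, Outside), the following is a minimal sketch of a plain, strict decoder that turns a tag sequence into entity spans. It is not the paper's post-processing algorithm, whose details are not given here. The function name `decode_bieso` and the labels `TOTAL`, `DATE`, and `ID` are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def decode_bieso(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) spans with inclusive token indices.

    Strict baseline: any sequence that does not follow B (I)* E or a
    lone S is dropped. This is an illustrative assumption, not the
    decoding strategy proposed in the paper.
    """
    spans: List[Tuple[str, int, int]] = []
    start: Optional[int] = None   # index of the currently open B- tag
    label: Optional[str] = None   # label of the currently open entity

    for i, tag in enumerate(tags):
        prefix, _, lab = tag.partition("-")
        if prefix == "S":
            # Single-token entity; any open entity is discarded.
            spans.append((lab, i, i))
            start, label = None, None
        elif prefix == "B":
            # Open a new entity (an unfinished one is discarded).
            start, label = i, lab
        elif prefix == "I" and start is not None and lab == label:
            # Valid continuation of the open entity.
            continue
        elif prefix == "E" and start is not None and lab == label:
            # Close the open entity.
            spans.append((lab, start, i))
            start, label = None, None
        else:
            # "O" or a malformed transition: reset the state.
            start, label = None, None
    return spans


example = ["O", "B-TOTAL", "I-TOTAL", "E-TOTAL", "S-DATE", "I-DATE", "B-ID", "O"]
print(decode_bieso(example))
```

In this example the stray `I-DATE` and the unterminated `B-ID` are discarded. A strict decoder like this loses entities whenever the model emits an inconsistent tag sequence, which is the kind of case a dedicated post-processing step would aim to handle.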
