Text Classification Models for Form Entity Linking
María Villota, César Domínguez, Jónathan Heras, Eloy Mata, and Vico Pascual

TL;DR
This paper introduces a novel approach combining image processing and BERT-based text classification for entity linking in scanned forms, achieving state-of-the-art results on the FUNSD dataset.
Contribution
It presents a new method that integrates image and text analysis to improve entity linking accuracy in diverse, scanned form documents.
Findings
Achieved an F1-score of 0.80 on the FUNSD dataset.
Improved performance by 5% over previous best methods.
Demonstrated effectiveness of combining image and text techniques.
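For readers unfamiliar with the metric, the following is a minimal sketch of how an F1-score over entity links can be computed, assuming each link is represented as a (key_id, value_id) pair. The function name `link_f1` and the toy entity ids are hypothetical and only for illustration; this is not the authors' evaluation code, and the exact FUNSD evaluation protocol may differ.

```python
def link_f1(predicted, gold):
    """Return (precision, recall, F1) over sets of (key_id, value_id) links."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)  # links present in both sets
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy form: the entity ids below are made up for illustration.
gold_links = [(0, 1), (2, 3), (4, 5), (6, 7)]
predicted_links = [(0, 1), (2, 3), (4, 7)]

precision, recall, f1 = link_f1(predicted_links, gold_links)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case two of the three predicted links are correct and two of the four gold links are recovered, so F1 is the harmonic mean of 2/3 and 1/2.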
Abstract
Forms are a widespread type of template-based document used in a great variety of fields including, among others, administration, medicine, finance, and insurance. Automatic extraction of the information contained in these documents is in high demand because of the increasing volume of forms generated on a daily basis. However, this is not a straightforward task when working with scanned forms, because of the great diversity of templates, which place form entities in different locations, and the quality of the scanned documents. In this context, there is a feature that is shared by all forms: they contain a collection of interlinked entities built as key-value (or label-value) pairs, together with other entities such as headers or images. In this work, we have tackled the problem of entity linking in forms by combining image processing techniques and a text classification model based on…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Image Processing and 3D Reconstruction
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Residual Connection · Layer Normalization · Attention Dropout · Linear Warmup With Linear Decay · Dense Connections
