ViBERTgrid BiLSTM-CRF: Multimodal Key Information Extraction from Unstructured Financial Documents

Furkan Pala; Mehmet Yasin Akpınar; Onur Deniz; Gülşen Eryiğit

arXiv:2409.15004·cs.AI·September 24, 2024

Open Access

TL;DR

This paper introduces ViBERTgrid BiLSTM-CRF, a multimodal model that improves named entity recognition on unstructured financial documents by up to 2 percentage points while maintaining key information extraction performance on semi-structured documents.

Contribution

It adapts the ViBERTgrid transformer with a BiLSTM-CRF layer for unstructured documents and releases new token-level annotations for the SROIE dataset.

Findings

01

Up to 2 percentage points improvement in named entity recognition on unstructured financial documents

02

Maintains KIE performance on semi-structured documents

03

Public release of token-level annotations for SROIE
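
To illustrate what token-level annotations look like for a sequence labeling model, here is a minimal sketch that turns entity-level key-value labels into per-token BIO tags by exact matching. This is not the authors' annotation procedure; the function name `bio_tags`, the label names, and the sample receipt tokens are all hypothetical and chosen only for illustration.

```python
def bio_tags(tokens, entities):
    """Assign BIO tags by exact-matching each entity value inside the token list.

    tokens:   list of OCR tokens in reading order
    entities: dict mapping a label (e.g. "DATE") to its value string
    """
    tags = ["O"] * len(tokens)
    for label, value in entities.items():
        target = value.split()
        n = len(target)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            window_free = all(t == "O" for t in tags[start:start + n])
            if tokens[start:start + n] == target and window_free:
                tags[start] = "B-" + label
                for i in range(start + 1, start + n):
                    tags[i] = "I-" + label
                break  # tag only the first match of each entity
    return tags


# Hypothetical receipt tokens and entity-level labels.
tokens = ["ACME", "SDN", "BHD", "Date:", "01/02/2018", "TOTAL", "12.50"]
entities = {"COMPANY": "ACME SDN BHD", "DATE": "01/02/2018", "TOTAL": "12.50"}
tags = bio_tags(tokens, entities)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

Exact matching is the simplest possible choice; real OCR output usually needs fuzzy matching and manual correction, which is one reason released token-level annotations are useful.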

Abstract

Multimodal key information extraction (KIE) models have been studied extensively on semi-structured documents. However, their investigation on unstructured documents is an emerging research topic. The paper presents an approach to adapt a multimodal transformer (i.e., ViBERTgrid, previously explored on semi-structured documents) for unstructured financial documents by incorporating a BiLSTM-CRF layer. The proposed ViBERTgrid BiLSTM-CRF model demonstrates a significant improvement in performance (up to 2 percentage points) on named entity recognition from unstructured documents in the financial domain, while maintaining its KIE performance on semi-structured documents. As an additional contribution, we publicly released token-level annotations for the SROIE dataset in order to pave the way for its use in multimodal sequence labeling models.
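
As a rough illustration of what a CRF layer adds on top of per-token scores, the sketch below runs Viterbi decoding over emission scores (which a BiLSTM would normally produce) and tag-to-tag transition scores. It is a generic textbook decoder, not the paper's implementation; the three-tag set and every score are made up for the example.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag index sequence.

    emissions[t][j]:   score of tag j at position t
    transitions[i][j]: score of moving from tag i to tag j
    """
    num_tags = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(pointers)
    best = max(range(num_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


TAGS = ["O", "B", "I"]  # illustrative tag set
# Made-up scores: taking the per-token argmax would give O, I, O.
emissions = [[1.0, 0.8, 0.0], [0.0, 0.2, 1.5], [2.0, 0.0, 0.0]]
# O -> I is heavily penalised because an entity cannot start with an I tag.
transitions = [[0.0, 0.0, -100.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

greedy = [max(range(len(TAGS)), key=lambda j: row[j]) for row in emissions]
path = viterbi(emissions, transitions)
print([TAGS[i] for i in greedy])
print([TAGS[i] for i in path])
```

The greedy per-token choice produces an invalid O to I transition, whereas the decoded path respects the transition scores. Enforcing this kind of sequence-level consistency is the usual motivation for placing a CRF after a BiLSTM encoder.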

Taxonomy

Topics: Advanced Text Analysis Techniques