Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents

Yanfei Dong; Lambert Deng; Jiazheng Zhang; Xiaodong Yu; Ting Lin; Francesco Gelli; Soujanya Poria; Wee Sun Lee

arXiv:2405.06701·cs.CL·May 14, 2024


TL;DR

This paper introduces KNN-former, a parameter-efficient spatial attention model leveraging local KNN graphs and combinatorial matching to improve document entity classification across diverse templates and languages.

Contribution

The paper proposes KNN-former, a model that adds a novel KNN-graph-based spatial bias to attention for document understanding, and provides a new dataset to facilitate research on combinatorial document properties.

Findings

1. KNN-former outperforms baselines on most entity types across various datasets.

2. The method is highly parameter-efficient, with fewer trainable parameters than existing approaches.

3. New datasets support research on diverse document templates.

Abstract

Documents that consist of diverse templates and exhibit complex spatial structures pose a challenge for document entity classification. We propose KNN-former, which incorporates a new kind of spatial bias in attention calculation based on the K-nearest-neighbor (KNN) graph of document entities. We limit entities' attention only to their local radius defined by the KNN graph. We also use combinatorial matching to address the one-to-one mapping property that exists in many documents, where one field has only one corresponding entity. Moreover, our method is highly parameter-efficient compared to existing approaches in terms of the number of trainable parameters. Despite this, experiments across various datasets show our method outperforms baselines in most entity types. Many real-world documents exhibit combinatorial properties which can be leveraged as inductive biases to improve…
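
The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes entities are represented by 2D center points, that each entity may attend to itself plus its K nearest neighbors, and that the one-to-one constraint is enforced by picking the highest-scoring assignment of fields to distinct entities. The function names (`knn_attention_mask`, `masked_softmax`, `one_to_one_assignment`) are hypothetical, and the brute-force search stands in for a polynomial-time assignment solver such as the Hungarian algorithm.

```python
import itertools
import math


def knn_neighbors(centers, k):
    """For each entity, return the indices of its k nearest other entities."""
    neighbors = []
    for i, (xi, yi) in enumerate(centers):
        # Sort the other entities by Euclidean distance (ties broken by index).
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def knn_attention_mask(centers, k):
    """mask[i][j] is True when entity i may attend to entity j.

    Assumption: an entity attends to itself and to its KNN neighbors only.
    """
    n = len(centers)
    nbrs = knn_neighbors(centers, k)
    return [[j == i or j in nbrs[i] for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores, restricted to allowed entries."""
    top = max(s for s, m in zip(scores, mask_row) if m)
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


def one_to_one_assignment(score):
    """Assign each field to a distinct entity, maximizing the total score.

    score[f][e] is a hypothetical model score for field f and entity e.
    Brute force, so only suitable for tiny examples; needs fields <= entities.
    """
    n_fields, n_entities = len(score), len(score[0])
    best_total, best_perm = -math.inf, None
    for perm in itertools.permutations(range(n_entities), n_fields):
        total = sum(score[f][e] for f, e in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return list(best_perm)


# Two spatial clusters of entities: {0, 1} on the left, {2, 3} on the right.
centers = [(0, 0), (1, 0), (10, 0), (11, 0)]
mask = knn_attention_mask(centers, k=1)
weights = masked_softmax([1.0, 1.0, 5.0, 5.0], mask[0])

# Both fields score highest on entity 0; matching resolves the conflict.
field_scores = [[0.9, 0.8, 0.1], [0.95, 0.2, 0.1]]
assignment = one_to_one_assignment(field_scores)
```

In this toy input, entity 0 ignores the high raw scores of the distant entities 2 and 3 because the mask excludes them. A per-field argmax would give entity 0 to both fields, whereas the one-to-one assignment gives field 0 to entity 1 and field 1 to entity 0.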
