Lightweight Spatial Modeling for Combinatorial Information Extraction From Documents
Yanfei Dong, Lambert Deng, Jiazheng Zhang, Xiaodong Yu, Ting Lin, Francesco Gelli, Soujanya Poria, Wee Sun Lee

TL;DR
This paper introduces KNN-former, a parameter-efficient spatial attention model leveraging local KNN graphs and combinatorial matching to improve document entity classification across diverse templates and languages.
Contribution
The paper proposes KNN-former, a document understanding model that adds a novel KNN-graph-based spatial bias to attention, and provides a new dataset to facilitate research on combinatorial document properties.
Findings
KNN-former outperforms baselines on most entity types across various datasets.
The method is highly parameter-efficient, with fewer trainable parameters than existing approaches.
New datasets support research on diverse document templates.
Abstract
Documents that consist of diverse templates and exhibit complex spatial structures pose a challenge for document entity classification. We propose KNN-former, which incorporates a new kind of spatial bias in attention calculation based on the K-nearest-neighbor (KNN) graph of document entities. We limit entities' attention only to their local radius defined by the KNN graph. We also use combinatorial matching to address the one-to-one mapping property that exists in many documents, where one field has only one corresponding entity. Moreover, our method is highly parameter-efficient compared to existing approaches in terms of the number of trainable parameters. Despite this, experiments across various datasets show our method outperforms baselines in most entity types. Many real-world documents exhibit combinatorial properties which can be leveraged as inductive biases to improve…
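The two mechanisms named in the abstract, attention limited to a KNN neighborhood and one-to-one matching between fields and entities, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the Euclidean distance on entity center points, the choice of k, and the brute-force search are all assumptions made for illustration. In particular, the brute-force search stands in for a proper assignment solver (such as the Hungarian algorithm) and is only practical for a handful of fields.

```python
import math
from itertools import permutations


def knn_mask(centers, k):
    """Hypothetical helper: mask[i][j] is True when entity j is entity i
    itself or one of its k nearest neighbors (Euclidean distance on centers)."""
    n = len(centers)
    mask = [[False] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(centers):
        # Rank every other entity by distance to entity i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.hypot(centers[j][0] - xi, centers[j][1] - yi),
        )
        mask[i][i] = True
        for j in others[:k]:
            mask[i][j] = True
    return mask


def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores; masked-out entities get weight 0."""
    kept = [s for s, m in zip(scores, mask_row) if m]
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


def one_to_one_assignment(scores):
    """Assumed stand-in for combinatorial matching: brute-force search for the
    assignment of each field to a distinct entity that maximizes total score.
    scores[f][e] is the score of field f for entity e (fields <= entities)."""
    n_fields, n_entities = len(scores), len(scores[0])
    best = max(
        permutations(range(n_entities), n_fields),
        key=lambda perm: sum(scores[f][e] for f, e in enumerate(perm)),
    )
    return list(best)


# Four entities in two spatial clusters; with k=1 each attends only within its cluster.
centers = [(0, 0), (1, 0), (10, 0), (11, 0)]
mask = knn_mask(centers, k=1)
weights = masked_softmax([1.0, 1.0, 5.0, 5.0], mask[0])

# Both fields prefer entity 0 when scored independently; matching forces distinct entities.
field_scores = [[0.9, 0.8, 0.1], [0.95, 0.2, 0.1]]
assignment = one_to_one_assignment(field_scores)
```

In this toy example, a per-field argmax would assign both fields to entity 0, while the one-to-one constraint resolves the conflict by giving field 0 its second choice. The masking step shows why distant entities, even with high raw scores, contribute nothing to an entity's attention output.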
