UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents
Yi Tu, Chong Zhang, Ya Guo, Huan Chen, Jinyang Tang, Huijia Zhu, Qi Zhang

TL;DR
This paper introduces UNER, a unified prediction head for VrD-NER that improves entity recognition by addressing complex layouts, reading order errors, and task formulation issues, leveraging pre-training and universal layout understanding.
Contribution
The paper presents UNER, a novel query-aware entity extraction head that enhances multi-modal document transformers for VrD-NER, with effective pre-training and cross-linguistic capabilities.
Findings
UNER improves entity extraction accuracy across datasets.
Pre-training with UNER enhances model performance in few-shot scenarios.
UNER enables zero-shot entity extraction and cross-linguistic transfer.
Abstract
The recognition of named entities in visually-rich documents (VrD-NER) plays a critical role in various real-world scenarios and applications. However, the research in VrD-NER faces three major challenges: complex document layouts, incorrect reading orders, and unsuitable task formulations. To address these challenges, we propose a query-aware entity extraction head, namely UNER, to collaborate with existing multi-modal document transformers to develop more robust VrD-NER models. The UNER head considers the VrD-NER task as a combination of sequence labeling and reading order prediction, effectively addressing the issues of discontinuous entities in documents. Experimental evaluations on diverse datasets demonstrate the effectiveness of UNER in improving entity extraction performance. Moreover, the UNER head enables a supervised pre-training stage on various VrD-NER datasets to enhance…
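To illustrate how combining sequence labeling with reading order prediction can recover discontinuous entities, here is a toy sketch. It is not the authors' implementation: the function `decode_entities`, the `start_labels` and `successor` inputs, and the example tokens are all hypothetical, and it assumes a model has already predicted, for each token, an entity type if the token begins an entity and the index of the token that follows it within the same entity.

```python
from typing import List, Optional, Tuple


def decode_entities(
    tokens: List[str],
    start_labels: List[Optional[str]],
    successor: List[Optional[int]],
) -> List[Tuple[str, List[str]]]:
    """Assemble entities by following successor links from each start token.

    start_labels[i] is an entity type if token i begins an entity, else None.
    successor[i] is the index of the next token in the same entity, else None.
    Both inputs are assumed model predictions (hypothetical format).
    """
    entities = []
    for i, label in enumerate(start_labels):
        if label is None:
            continue  # token i does not begin an entity
        chain, seen, j = [], set(), i
        # Follow predicted links; the `seen` set guards against cycles.
        while j is not None and j not in seen:
            seen.add(j)
            chain.append(tokens[j])
            j = successor[j]
        entities.append((label, chain))
    return entities


# Toy input whose serialized order interleaves two layout columns,
# so the person name is split across non-adjacent positions.
tokens = ["Name:", "Date:", "John", "2024-01-05", "Smith"]
start_labels = [None, None, "PERSON", "DATE", None]
successor = [None, None, 4, None, None]
entities = decode_entities(tokens, start_labels, successor)
```

In this example the tokens "John" and "Smith" are not adjacent in the input order, yet the successor link from index 2 to index 4 lets the decoder join them into one entity, which a purely contiguous tagging scheme could not express.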
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Web Data Mining and Analysis
