Towards Few-shot Entity Recognition in Document Images: A Label-aware Sequence-to-Sequence Framework
Zilong Wang, Jingbo Shang

TL;DR
This paper introduces LASER, a label-aware sequence-to-sequence framework for few-shot entity recognition in document images, leveraging label semantics and spatial information to improve recognition with limited data.
Contribution
The paper proposes a novel label-aware seq2seq model that explicitly generates label names and embeds labels into spatial space, enhancing few-shot entity recognition in document images.
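The summary does not say how LASER places labels in the spatial embedding space. The Python sketch below shows one possible reading and should not be taken as the paper's method. It assumes a LayoutLM-style 2D position embedding, in which a token's vector is the sum of table lookups for its box coordinates (x0, y0, x1, y1), normalized to 0..1000. It also assumes that every label is given its own box, which would be learnable in a real model. The names `SpatialEmbedding`, `embed_label_words`, and `label_boxes` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), coordinates normalized to 0..1000


class SpatialEmbedding:
    """LayoutLM-style 2D position embedding (assumed): the vector for a box is
    the element-wise sum of table rows for its x0, y0, x1 and y1 coordinates."""

    def __init__(self, dim: int, max_coord: int = 1000, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Randomly initialized tables stand in for trained parameters.
        self.x_table = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(max_coord + 1)]
        self.y_table = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(max_coord + 1)]

    def __call__(self, box: Box) -> List[float]:
        x0, y0, x1, y1 = box
        rows = (self.x_table[x0], self.y_table[y0], self.x_table[x1], self.y_table[y1])
        return [sum(vals) for vals in zip(*rows)]


def embed_label_words(emb: SpatialEmbedding, label: str,
                      label_boxes: Dict[str, Box]) -> List[List[float]]:
    """Give every word of a label's surface name the box assigned to that label,
    so label tokens share one spatial embedding space with document words.
    `label_boxes` is a hypothetical per-label box (learnable in a real model)."""
    box = label_boxes[label]
    return [emb(box) for _ in label.split()]


emb = SpatialEmbedding(dim=8)
label_boxes = {"total price": (0, 0, 1000, 1000)}  # hypothetical box for one label
vecs = embed_label_words(emb, "total price", label_boxes)
```

Because label words and document words go through the same coordinate tables, training can move a label's box toward the page regions where that entity type tends to appear. That is one way to realize the "spatial correspondence between regions and labels" the abstract mentions.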
Findings
LASER outperforms existing methods in few-shot settings.
The model captures both the semantics of entity types (through their label surface names) and the spatial correspondence between document regions and labels.
Extensive experiments validate LASER's superiority on benchmark datasets.
Abstract
Entity recognition is a fundamental task in understanding document images. Traditional sequence labeling frameworks treat entity types as class IDs and rely on extensive data and high-quality annotations to learn their semantics, both of which are typically expensive to obtain in practice. In this paper, we aim to build an entity recognition model that requires only a few shots of annotated document images. To overcome the data limitation, we propose to leverage the label surface names to better inform the model of the target entity type semantics, and to embed the labels into the spatial embedding space to capture the spatial correspondence between regions and labels. Specifically, we go beyond sequence labeling and develop a novel label-aware seq2seq framework, LASER. The proposed model follows a new labeling scheme that explicitly generates the label surface names word-by-word after generating the entities.…
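To make the labeling scheme concrete, the minimal Python sketch below turns a gold annotation into a label-aware target sequence. It rests on two assumptions that the summary does not confirm. First, the decoder reproduces the input words in reading order. Second, right after each entity it emits the label surface name word-by-word between two marker tokens. The marker strings `[B]`/`[E]` and the function name `build_target` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical marker tokens; the paper's actual special tokens may differ.
LABEL_START = "[B]"
LABEL_END = "[E]"


def build_target(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Build a label-aware target sequence.

    tokens: document words in reading order.
    spans:  non-overlapping (start, end, label_name) triples, end exclusive.
    Words are copied in order; after each entity, its label surface name is
    emitted one word at a time between LABEL_START and LABEL_END.
    """
    target: List[str] = []
    i = 0
    for start, end, label in sorted(spans):
        target.extend(tokens[i:start])    # words outside any entity
        target.extend(tokens[start:end])  # the entity's own words
        target.append(LABEL_START)
        target.extend(label.split())      # label name, generated word-by-word
        target.append(LABEL_END)
        i = end
    target.extend(tokens[i:])             # trailing non-entity words
    return target


print(build_target(["Total", ":", "$", "12.50"], [(2, 4, "total price")]))
# → ['Total', ':', '$', '12.50', '[B]', 'total', 'price', '[E]']
```

Because the entity type is produced as ordinary words ("total price") rather than as a class ID, the decoder can draw on what it already knows about those words. The abstract gives this as the reason label names help when annotations are scarce.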
Taxonomy
Topics: Handwritten Text Recognition Techniques · Topic Modeling · Text and Document Classification Technologies
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Sequence to Sequence
