On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval
Jiayi Chen, Hanjun Dai, Bo Dai, Aidong Zhang, Wei Wei

TL;DR
This paper introduces a novel task-level few-shot learning framework for visually-rich document entity retrieval, emphasizing entity personalization and out-of-distribution handling, supported by a new dataset and improved meta-learning methods.
Contribution
It proposes a task-aware meta-learning framework with hierarchical decoding and contrastive learning for entity-level few-shot VDER, addressing personalization and OOD challenges.
Findings
Significant improvements over baseline meta-learning models on few-shot VDER.
More robust entity retrieval in the few-shot setting.
Introduction of the FewVEX dataset for future research.
Abstract
Visually-rich document entity retrieval (VDER), which extracts key information (e.g., date, address) from document images such as invoices and receipts, has become an important topic in industrial NLP applications. New document types emerge at a constant pace, each with its own entity types, which presents a unique challenge: many documents contain unseen entity types that occur only a couple of times. Addressing this challenge requires models that can learn entities in a few-shot manner. However, prior work on few-shot VDER mainly addresses the problem at the document level with a predefined global entity space, which does not account for the entity-level few-shot scenario, in which target entity types are locally personalized by each task and entity occurrences vary significantly among documents. To address this unexplored scenario, this paper studies a novel entity-level…
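To make the "locally personalized" label space concrete, the following is a minimal sketch of how an entity-level few-shot task could be built from a pool of labeled documents. It is not the authors' pipeline: the function names (`sample_task_types`, `personalize`, `build_episode`), the toy document format, and the choice of label `0` for background tokens are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy format: a document is a list of (token, global entity type),
# where "O" marks a token that belongs to no entity.
Document = List[Tuple[str, str]]


def sample_task_types(global_types: List[str], n_way: int,
                      rng: random.Random) -> Dict[str, int]:
    """Pick n_way entity types and assign task-local label ids 1..n_way."""
    chosen = rng.sample(global_types, n_way)
    return {etype: i + 1 for i, etype in enumerate(chosen)}


def personalize(doc: Document, task_types: Dict[str, int]) -> List[int]:
    """Relabel a document for one task.

    Entity types selected by the task get their local id; every other token,
    including entities the task does not care about, becomes 0 (background).
    """
    return [task_types.get(etype, 0) for _, etype in doc]


def build_episode(docs: List[Document], global_types: List[str],
                  n_way: int, n_support: int, n_query: int, seed: int = 0):
    """Build one task: a local label map plus support and query documents.

    Simplification (assumption): the support size is counted in documents,
    so the number of occurrences per entity type is not balanced.
    """
    rng = random.Random(seed)
    task_types = sample_task_types(global_types, n_way, rng)
    # Keep only documents that mention at least one of the task's entity types.
    relevant = [d for d in docs if any(e in task_types for _, e in d)]
    rng.shuffle(relevant)
    support = [(d, personalize(d, task_types)) for d in relevant[:n_support]]
    query = [(d, personalize(d, task_types))
             for d in relevant[n_support:n_support + n_query]]
    return task_types, support, query
```

Because each task draws its own label map, the same global entity type can receive a different local id, or fall into the background class, from one task to the next. A model trained on such episodes cannot rely on a fixed global entity space.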
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques
Methods: Contrastive Learning · Focus
