On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval

Jiayi Chen; Hanjun Dai; Bo Dai; Aidong Zhang; Wei Wei

arXiv:2311.00693 · cs.AI · December 12, 2023 · 1 citation

TL;DR

This paper introduces a novel entity-level few-shot learning framework for visually-rich document entity retrieval (VDER), emphasizing task-personalized entity types and out-of-distribution (OOD) handling, supported by a new dataset (FewVEX) and a task-aware meta-learning approach.

Contribution

The paper proposes a task-aware meta-learning framework that combines hierarchical decoding with contrastive learning for entity-level few-shot VDER, addressing the challenges of task personalization and out-of-distribution (OOD) entities.
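To make the contrastive component more concrete, here is a minimal Python sketch of a generic prototype-based contrastive objective for few-shot token labeling. Class prototypes are averaged from support-token embeddings, and each query token is pulled toward its own prototype and pushed away from the others through a temperature-scaled softmax over cosine similarities. This is a standard stand-in, not the paper's loss or its hierarchical decoder. The function names, the temperature `tau=0.1`, and the toy 2-D embeddings are hypothetical. A real model would obtain embeddings from a multimodal document encoder.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def prototypes(support):
    """support: list of (embedding, label). Returns {label: mean embedding}."""
    by_label = {}
    for emb, label in support:
        by_label.setdefault(label, []).append(emb)
    return {label: mean_vector(embs) for label, embs in by_label.items()}


def proto_contrastive_loss(query, protos, tau=0.1):
    """Hypothetical objective: mean of -log softmax(cos(q, p_c) / tau) at the true label."""
    total = 0.0
    for emb, label in query:
        logits = {c: cosine(emb, p) / tau for c, p in protos.items()}
        m = max(logits.values())  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += log_z - logits[label]
    return total / len(query)


def predict(emb, protos):
    """Assign the label of the most similar prototype."""
    return max(protos, key=lambda c: cosine(emb, protos[c]))
```

With support embeddings `[1.0, 0.0]` and `[0.9, 0.1]` for label 0 and `[0.0, 1.0]` and `[0.1, 0.9]` for label 1, the prototypes are `[0.95, 0.05]` and `[0.05, 0.95]`. A query at `[0.8, 0.2]` is therefore predicted as label 0, and its loss is lower when it is labeled 0 than when it is labeled 1.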

Findings

1. Significant improvements over baseline meta-learning models.
2. Enhanced robustness in few-shot entity retrieval.
3. Introduction of the FewVEX dataset for future research.

Abstract

Visually-rich document entity retrieval (VDER), which extracts key information (e.g., date, address) from document images such as invoices and receipts, has become an important topic in industrial NLP applications. New document types emerge at a constant pace, each with its own entity types, and this presents a unique challenge: many documents contain unseen entity types that occur only a couple of times. Addressing this challenge requires models that can learn entities in a few-shot manner. However, prior works on few-shot VDER mainly address the problem at the document level with a predefined global entity space, which does not account for the entity-level few-shot scenario: target entity types are locally personalized by each task, and entity occurrences vary significantly among documents. To address this unexplored scenario, this paper studies a novel entity-level…
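The entity-level scenario described above, in which each task selects its own target entity types, can be illustrated with a minimal Python sketch of episode construction. The sketch assumes each document is a list of `(token, entity_type)` pairs, with `None` marking non-entity tokens. It samples `n_way` target types and draws `k_shot` support documents containing each type. It then remaps the selected types to task-local ids, so every other type collapses to 0 ("not a target entity"). This is an illustrative assumption, not the FewVEX construction procedure. All function names, parameters, and toy documents are hypothetical.

```python
import random


def relabel_for_task(doc, task_types):
    """Map global entity types to task-local ids; 0 means 'not a target entity'."""
    local_id = {t: i + 1 for i, t in enumerate(task_types)}
    return [(tok, local_id.get(etype, 0)) for tok, etype in doc]


def sample_task(documents, n_way, k_shot, n_query, rng):
    """Sample one hypothetical N-way K-shot task with a task-personalized label space."""
    # Set of entity types present in each document (None marks non-entity tokens).
    types_per_doc = [{e for _, e in doc if e is not None} for doc in documents]
    all_types = sorted(set().union(*types_per_doc))
    # A type is eligible only if at least k_shot documents contain it.
    eligible = [t for t in all_types if sum(t in s for s in types_per_doc) >= k_shot]
    task_types = rng.sample(eligible, n_way)  # raises ValueError if too few types

    # Draw k_shot support documents per target type (one document may cover several types).
    support_idx = set()
    for t in task_types:
        candidates = [i for i, s in enumerate(types_per_doc) if t in s]
        support_idx.update(rng.sample(candidates, k_shot))

    # Query documents come from the remaining pool and may contain no target type at all.
    rest = [i for i in range(len(documents)) if i not in support_idx]
    query_idx = rng.sample(rest, min(n_query, len(rest)))

    support = [relabel_for_task(documents[i], task_types) for i in sorted(support_idx)]
    query = [relabel_for_task(documents[i], task_types) for i in query_idx]
    return task_types, support, query


if __name__ == "__main__":
    toy_docs = [
        [("Invoice", None), ("2023-12-12", "date"), ("Main St", "address")],
        [("12 Oak Ave", "address"), ("$9", "total")],
        [("2022-05-05", "date"), ("ACME", "vendor")],
    ]
    print(sample_task(toy_docs, n_way=2, k_shot=1, n_query=1, rng=random.Random(0)))
```

Because label ids are local, `date` may be id 1 in one task and id 2 in another. A query document may also contain none of the task's target types, which mirrors the variable entity occurrences the abstract highlights.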

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Learning · Focus