QueryForm: A Simple Zero-shot Form Entity Query Framework
Zifeng Wang, Zizhao Zhang, Jacob Devlin, Chen-Yu Lee, Guolong Su, Hao Zhang, Jennifer Dy, Vincent Perot, Tomas Pfister

TL;DR
QueryForm is a novel zero-shot framework for extracting entities from form-like documents using a query-based approach with a dual prompting mechanism, achieving state-of-the-art results without target-specific training.
Contribution
It introduces a unified query-based framework with pre-training on large-scale weakly annotated data, enabling effective zero-shot document entity extraction across diverse layouts.
Findings
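Improves F1 by 4.6% to 10.1% over the prior state of the art on the XFUND zero-shot benchmark
Improves F1 by 3.2% to 9.5% over the prior state of the art on the Payment zero-shot benchmark
Uses a smaller model and needs no additional image input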
Abstract
Zero-shot transfer learning for document understanding is a crucial yet under-investigated scenario to help reduce the high cost involved in annotating document entities. We present a novel query-based framework, QueryForm, that extracts entity values from form-like documents in a zero-shot fashion. QueryForm contains a dual prompting mechanism that composes both the document schema and a specific entity type into a query, which is used to prompt a Transformer model to perform a single entity extraction task. Furthermore, we propose to leverage large-scale query-entity pairs generated from form-like webpages with weak HTML annotations to pre-train QueryForm. By unifying pre-training and fine-tuning into the same query-based framework, QueryForm enables models to learn from structured documents containing various entities and layouts, leading to better generalization to target document…
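The dual prompting idea in the abstract (a document schema plus one entity type composed into a single query) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Query` class, the `" | "` separator, the `[SEP]` token, whitespace tokenization, and the example schema and entity names are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Query:
    """A hypothetical query made of two prompts (names are illustrative)."""
    schema_prompt: str  # describes the document schema, e.g. "invoice"
    entity_prompt: str  # names the single entity type to extract

    def compose(self, sep: str = " | ") -> str:
        # Join the two prompts into one query string.
        # The separator is an assumption, not taken from the paper.
        return f"{self.schema_prompt}{sep}{self.entity_prompt}"


def build_model_input(
    query: Query, document_tokens: List[str], sep_token: str = "[SEP]"
) -> List[str]:
    # Prepend the whitespace-tokenized query to the document tokens so a
    # sequence model could condition on it. One query targets one entity type.
    return query.compose().split() + [sep_token] + document_tokens


def queries_for_schema(schema_prompt: str, entity_types: List[str]) -> List[Query]:
    # Extracting several entity types means issuing one query per type.
    return [Query(schema_prompt, entity) for entity in entity_types]
```

Under these assumptions, extracting several fields from one document would mean calling `build_model_input` once per entry returned by `queries_for_schema`, which mirrors the "single entity extraction task" per query described in the abstract.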
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Handwritten Text Recognition Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Layer Normalization · Softmax · Adam · Absolute Position Encodings
