Zero-shot Task Transfer for Invoice Extraction via Class-aware QA Ensemble
Prithiviraj Damodaran, Prabhkaran Singh, Josemon Achankuju

TL;DR
VESPA is a simple zero-shot document extraction system that recasts invoice extraction as a question-answering (QA) task, outperforming four commercial invoice solutions without requiring labeled training data.
Contribution
The paper introduces VESPA, a novel zero-shot approach that leverages QA for invoice extraction, eliminating the need for task-specific architectures or labeled datasets.
Findings
Outperforms 4 commercial invoice solutions
Achieves an average F1 score of 87.50 for 6 fields
Works across diverse layouts, domains, and geographies
Abstract
We present VESPA, an intentionally simple yet novel zero-shot system for layout, locale, and domain agnostic document extraction. In spite of the availability of large corpora of documents, the lack of labeled and validated datasets makes it a challenge to discriminatively train document extraction models for enterprises. We show that this problem can be addressed by simply transferring the information extraction (IE) task to a natural language Question-Answering (QA) task without engineering task-specific architectures. We demonstrate the effectiveness of our system by evaluating on a closed corpus of real-world retail and tax invoices with multiple complex layouts, domains, and geographies. The empirical evaluation shows that our system outperforms 4 prominent commercial invoice solutions that use discriminatively trained models with architectures specifically crafted for invoice…
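The idea of transferring field extraction to question answering can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the field names, the question wording, the `regex_qa` stand-in (a toy regex matcher used in place of a real extractive QA model), and the "keep the highest-confidence answer" rule used to combine models. The paper's class-aware ensemble may work differently.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical field-to-question mapping; the field names and wording are
# illustrative assumptions, not the paper's actual questions.
FIELD_QUESTIONS: Dict[str, str] = {
    "invoice_number": "What is the invoice number?",
    "invoice_date": "What is the invoice date?",
    "total_amount": "What is the total amount?",
}

# A QA model is any callable: (question, context) -> (answer, confidence).
QAModel = Callable[[str, str], Tuple[str, float]]


def regex_qa(question: str, context: str) -> Tuple[str, float]:
    """Toy stand-in for an extractive QA model (assumption for this sketch)."""
    patterns = {
        "invoice number": r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)",
        "invoice date": r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})",
        "total amount": r"Total\s*:?\s*([\d.,]+)",
    }
    for key, pattern in patterns.items():
        if key in question.lower():
            match = re.search(pattern, context, flags=re.IGNORECASE)
            if match:
                return match.group(1), 0.9
    return "", 0.0


def extract_fields(text: str, models: List[QAModel]) -> Dict[str, str]:
    """Ask one question per field and keep the most confident answer."""
    results: Dict[str, str] = {}
    for field, question in FIELD_QUESTIONS.items():
        candidates = [model(question, text) for model in models]
        answer, _confidence = max(candidates, key=lambda c: c[1])
        results[field] = answer
    return results


invoice_text = "Invoice No: INV-1042\nDate: 2021-03-15\nTotal: 1,250.00"
fields = extract_fields(invoice_text, [regex_qa])
print(fields)
```

Because each field is just a question, adding a new field in this sketch means adding one entry to `FIELD_QUESTIONS` instead of retraining a model, which is the appeal of the zero-shot framing. In practice `regex_qa` would be replaced by one or more pretrained QA models run over the OCR text of the invoice.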
Taxonomy
Topics: Topic Modeling · Handwritten Text Recognition Techniques · Natural Language Processing Techniques
