TL;DR
CloudScan is an invoice analysis system that uses recurrent neural networks to accurately extract information from invoices without requiring templates or manual annotations, generalizing well to unseen layouts.
Contribution
It introduces a template-free, globally trained neural network model for invoice data extraction. The model learns from training data automatically extracted from end-user feedback, so users do not need to annotate invoices precisely.
Findings
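The recurrent neural network outperforms the logistic regression baseline on unseen invoice layouts.
The recurrent neural network and the baseline achieve average F1 scores of 0.891 and 0.887, respectively, on 8 fields, using a dataset of 326,471 invoices.
A single global model generalizes to new invoice layouts without per-layout templates.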
Abstract
We present CloudScan, an invoice analysis system that requires zero configuration or upfront annotation. In contrast to previous work, CloudScan does not rely on templates of invoice layout; instead, it learns a single global model of invoices that naturally generalizes to unseen invoice layouts. The model is trained using data automatically extracted from end-user provided feedback. This automatic training data extraction removes the requirement for users to annotate the data precisely. We describe a recurrent neural network model that can capture long range context and compare it to a baseline logistic regression model corresponding to the current CloudScan production system. We train and evaluate the system on 8 important fields using a dataset of 326,471 invoices. The recurrent neural network and baseline model achieve 0.891 and 0.887 average F1 scores respectively on seen invoice…
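The abstract reports average F1 scores over 8 fields. As a rough illustration of how such a number could be computed, the sketch below takes per-field true-positive, false-positive, and false-negative counts, computes F1 for each field, and averages the results with equal weight. The unweighted mean is an assumption, since the abstract does not say how the average is formed. The function names, field names, and counts are hypothetical and are not taken from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_f1(counts_by_field):
    """Return (unweighted mean F1, per-field F1) for a dict of (tp, fp, fn) counts.

    Assumption: every field contributes equally to the average.
    """
    per_field = {name: f1_score(*c) for name, c in counts_by_field.items()}
    return sum(per_field.values()) / len(per_field), per_field


# Made-up counts for two hypothetical fields, for illustration only.
counts = {
    "field_a": (90, 10, 10),  # (tp, fp, fn)
    "field_b": (80, 20, 20),
}
avg, per_field = average_f1(counts)
print(per_field, avg)
```

With these made-up counts, each field has equal precision and recall (0.9 and 0.8), so the mean comes out at roughly 0.85.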
