CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor

Xiaohui Zhao; Endi Niu; Zhuo Wu; and Xiaoguang Wang

arXiv:1903.12363 · cs.CV · June 21, 2019 · 46 cites

TL;DR

This paper introduces CUTIE, a convolutional neural network model that effectively extracts structured information from documents by leveraging both semantic and spatial text features, outperforming previous NER-based methods.

Contribution

The paper proposes a novel CNN-based model, CUTIE, that captures both semantic and spatial information from document texts, achieving state-of-the-art results without pre-training or post-processing.

Findings

01 Achieves superior accuracy and speed over NER-based methods.

02 Performs well with limited training data.

03 Does not require pre-training or post-processing.

Abstract

Extracting key information from documents such as receipts or invoices, and saving the texts of interest as structured data, is crucial in document-intensive office automation workflows in areas that include, but are not limited to, accounting, finance, and taxation. To avoid designing expert rules for each specific type of document, some published works tackle the problem by learning a model that explores the semantic context in text sequences, based on the Named Entity Recognition (NER) method from the NLP field. In this paper, we propose to harness the effective information from both the semantic meaning and the spatial distribution of texts in documents. Specifically, our proposed model, Convolutional Universal Text Information Extractor (CUTIE), applies convolutional neural networks to gridded texts, where texts are embedded as features with semantical…
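To make the idea of "gridded texts" concrete, here is a minimal Python sketch of one way OCR tokens could be placed on a grid according to their position on the page, so that a convolutional network can see both what a token says and where it sits. The abstract is truncated and does not specify the mapping, so everything here is an assumption for illustration: the `Token` class, the `build_grid` function, the use of box centers, and the shift-right rule for collisions are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One OCR'd text fragment (hypothetical structure for this sketch)."""
    text: str
    x: float  # horizontal center of the token's box, in page units
    y: float  # vertical center of the token's box, in page units


def build_grid(tokens, page_w, page_h, rows, cols, empty=""):
    """Place each token in the grid cell that matches its page position.

    Assumption: if the target cell is taken, the token shifts right to the
    next free cell in the same row; if the row is full, it is dropped.
    """
    grid = [[empty] * cols for _ in range(rows)]
    for tok in tokens:
        # Scale page coordinates to grid indices, clamping at the far edge.
        r = min(int(tok.y / page_h * rows), rows - 1)
        c = min(int(tok.x / page_w * cols), cols - 1)
        while c < cols and grid[r][c] != empty:
            c += 1
        if c < cols:
            grid[r][c] = tok.text
    return grid


# Toy receipt on a 100 x 100 page, mapped to a 4 x 4 grid.
receipt = [
    Token("ACME", 50, 10),
    Token("TOTAL", 20, 90),
    Token("12.50", 80, 90),
]
grid = build_grid(receipt, page_w=100, page_h=100, rows=4, cols=4)
for row in grid:
    print(row)
```

In this sketch "TOTAL" and "12.50" land in the same grid row, which is the kind of spatial relationship a purely sequential NER model does not see directly. In a full pipeline each cell's text would then be replaced by an embedding vector before the convolutional layers are applied.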

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Text and Document Classification Technologies

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings