CUTIE: Learning to Understand Documents with Convolutional Universal Text Information Extractor
Xiaohui Zhao, Endi Niu, Zhuo Wu, and Xiaoguang Wang

TL;DR
This paper introduces CUTIE, a convolutional neural network model that effectively extracts structured information from documents by leveraging both semantic and spatial text features, outperforming previous NER-based methods.
Contribution
The paper proposes a novel CNN-based model, CUTIE, that captures both semantic and spatial information from document texts, achieving state-of-the-art results without pre-training or post-processing.
Findings
Achieves superior accuracy and speed over NER-based methods.
Performs well with limited training data.
Does not require pre-training or post-processing.
Abstract
Extracting key information from documents such as receipts or invoices, and converting the texts of interest into structured data, is crucial in document-intensive office automation workflows in areas including, but not limited to, accounting, finance, and taxation. To avoid designing expert rules for each specific type of document, some published works tackle the problem by learning a model that explores the semantic context in text sequences, based on the Named Entity Recognition (NER) method from the NLP field. In this paper, we propose to harness the effective information from both the semantic meaning and the spatial distribution of texts in documents. Specifically, our proposed model, Convolutional Universal Text Information Extractor (CUTIE), applies convolutional neural networks on gridded texts where texts are embedded as features with semantical…
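The "gridded texts" idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name build_text_grid, the (token, x_center, y_center) input format, the grid size, and the shift-right rule for colliding tokens are all assumptions made for illustration. The sketch only shows how tokens with page coordinates might be placed on a 2D grid that keeps their relative layout, so that a convolutional network could later operate on embedded grid cells.

```python
from typing import List, Optional, Tuple

# Hypothetical input format: (token, x_center, y_center), coordinates in pixels.
Box = Tuple[str, float, float]


def build_text_grid(
    boxes: List[Box], page_w: float, page_h: float, rows: int, cols: int
) -> List[List[Optional[str]]]:
    """Place each token in the grid cell that contains its center point.

    Empty cells hold None. If a cell is already taken, the token is shifted
    right to the next free cell in the same row (an assumed collision rule);
    a token that runs off the end of the row is dropped.
    """
    grid: List[List[Optional[str]]] = [[None] * cols for _ in range(rows)]
    for token, x, y in boxes:
        # Scale page coordinates to grid indices, clamping to the last cell.
        r = min(int(y / page_h * rows), rows - 1)
        c = min(int(x / page_w * cols), cols - 1)
        while c < cols and grid[r][c] is not None:
            c += 1
        if c < cols:
            grid[r][c] = token
    return grid


# Toy receipt on a 600 x 800 pixel page, mapped to an 8 x 6 grid.
boxes = [("ACME", 300.0, 50.0), ("TOTAL", 150.0, 700.0), ("12.50", 450.0, 700.0)]
grid = build_text_grid(boxes, page_w=600.0, page_h=800.0, rows=8, cols=6)
for row in grid:
    print(" ".join(f"{(cell or '.'):>6}" for cell in row))
```

In a full pipeline, each non-empty cell would be replaced by a learned embedding vector before the convolutional layers; that step is omitted here because the truncated abstract does not specify it.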
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Text and Document Classification Technologies
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
