Combining Deep Learning and Reasoning for Address Detection in Unstructured Text Documents
Matthias Engelbach, Dennis Klau, Jens Drawehn, Maximilien Kintz

TL;DR
This paper presents a hybrid method combining deep learning and rule-based reasoning to detect and extract addresses from unstructured, multi-layout text documents, improving automation in document processing.
Contribution
It introduces a novel hybrid approach that integrates visual deep learning detection with domain knowledge reasoning for address extraction in complex documents.
Findings
Effective detection of address regions in scanned documents
Improved accuracy over purely visual or text-based methods
Applicable to multi-column and nested table layouts
Abstract
Extracting information from unstructured text documents is a demanding task, since these documents can have a broad variety of layouts and a non-trivial reading order, as is the case for multi-column documents or nested tables. Additionally, many business documents are received in paper form, meaning that the textual contents need to be digitized before further analysis. Nonetheless, automatic detection and capturing of crucial document information like the sender address would boost many companies' processing efficiency. In this work, we propose a hybrid approach that combines deep learning with reasoning for finding and extracting addresses from unstructured text documents. We use a visual deep learning model to detect the boundaries of possible address regions on the scanned document images and validate these results by analyzing the contained text using domain…
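The two-stage idea in the abstract (visual detection of candidate regions, then validation of the text inside each region) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the `Candidate` structure, the `min_score` threshold, and the two regular expressions, which assume a five-digit postal code followed by a city name and a street name followed by a house number. The detector and the OCR step are taken as given; the sketch only shows the rule-based filtering of their output.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """Hypothetical output of a visual detector plus OCR for one region."""
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
    score: float                    # detector confidence in [0, 1]
    text: str                       # OCR text found inside the box


# Assumed rules (illustrative only): a five-digit postal code followed by
# a city name, and a street name followed by a house number.
LETTERS = "A-Za-zÄÖÜäöüß"
POSTAL_CITY = re.compile(rf"\b\d{{5}}\s+[{LETTERS}][{LETTERS} .\-]+")
STREET_NUMBER = re.compile(rf"[{LETTERS}.\-]+\s+\d+\s*[a-zA-Z]?\b")


def looks_like_address(text: str) -> bool:
    """Return True if the text has both a postal/city line and a street line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    has_postal = any(POSTAL_CITY.search(ln) for ln in lines)
    has_street = any(
        STREET_NUMBER.search(ln) and not POSTAL_CITY.search(ln) for ln in lines
    )
    return has_postal and has_street


def validate(candidates: List[Candidate], min_score: float = 0.5) -> List[Candidate]:
    """Keep regions that the detector trusts and whose text passes the rules."""
    return [
        c for c in candidates
        if c.score >= min_score and looks_like_address(c.text)
    ]


if __name__ == "__main__":
    regions = [
        Candidate((10, 10, 200, 80), 0.90,
                  "Example GmbH\nMusterstraße 12\n70569 Stuttgart"),
        Candidate((10, 300, 200, 340), 0.95,
                  "Invoice No. 12345\nTotal 99 EUR"),
    ]
    for region in validate(regions):
        print(region.box, region.text.replace("\n", " | "))
```

In this sketch, a region with a high detector score is still rejected when its text fails the rules, which is the sense in which reasoning over the text validates the visual result.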
Taxonomy
Topics: Web Data Mining and Analysis · Text and Document Classification Technologies · Handwritten Text Recognition Techniques
