Page Stream Segmentation with Convolutional Neural Nets Combining Textual and Visual Features
Gregor Wiedemann, Gerhard Heyer

TL;DR
This paper presents a novel CNN-based method combining visual and textual features for page stream segmentation, achieving state-of-the-art accuracy in separating scanned document streams into individual documents.
Contribution
Introduces a new CNN architecture that integrates image and text features for improved page stream segmentation accuracy.
Findings
Achieves up to 93% accuracy in page stream segmentation.
Outperforms previous methods, setting a new state-of-the-art.
Effective combination of visual and textual features enhances segmentation results.
Abstract
In recent years, (retro-)digitizing paper-based files has become a major undertaking for private and public archives as well as an important task in electronic mailroom applications. As a first step, the workflow involves scanning and Optical Character Recognition (OCR) of documents. Preserving the document context of single-page scans is a major requirement here. To facilitate workflows involving very large amounts of paper scans, page stream segmentation (PSS) is the task of automatically separating a stream of scanned images into multi-page documents. In a digitization project together with a German federal archive, we developed a novel approach based on convolutional neural networks (CNN) combining image and text features to achieve optimal document separation results. Evaluation shows that our PSS architecture achieves an accuracy of up to 93%, which can be regarded as a new…
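The separation task described in the abstract can be illustrated with a minimal sketch. It assumes that PSS is framed as a per-page binary decision (does this page start a new document, or continue the previous one?) and that accuracy is measured over those per-page decisions; the truncated abstract does not confirm either detail. The classifier itself is not shown, and the names segment_stream and page_accuracy as well as the sample labels are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def segment_stream(is_first_page: Sequence[bool]) -> List[List[int]]:
    """Group page indices into documents from per-page boundary decisions.

    Assumption: each entry says whether that page starts a new document.
    """
    documents: List[List[int]] = []
    for index, starts_new in enumerate(is_first_page):
        # The very first page always opens a document, whatever was predicted.
        if starts_new or not documents:
            documents.append([index])
        else:
            documents[-1].append(index)
    return documents


def page_accuracy(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of pages whose boundary decision matches the reference label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty stream")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for a six-page stream (not data from the paper).
predicted = [True, False, False, True, False, True]
actual = [True, False, True, True, False, True]

documents = segment_stream(predicted)
accuracy = page_accuracy(predicted, actual)
print(documents)
print(round(accuracy, 3))
```

In this framing, one wrong per-page decision either merges two documents or splits one, so per-page accuracy and document-level correctness are related but not identical measures.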
Taxonomy
Topics: Digital Media Forensic Detection · Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques
