TL;DR
This paper introduces PDNet, a deep neural network that combines semantic segmentation with a primal-dual approach to improve document binarization, especially for degraded historical documents, achieving state-of-the-art results on four out of seven datasets.
Contribution
The novel integration of a fully convolutional network with an unrolled primal-dual network for end-to-end training in document binarization.
Findings
Achieves state-of-the-art binarization on four out of seven datasets
Pre-training on synthetic data improves performance
Handles numerical instabilities in primal-dual training
Abstract
Binarization of digital documents is the task of classifying each pixel in an image of the document as belonging to the background (parchment/paper) or foreground (text/ink). Historical documents are often subject to degradations that make the task challenging. In the current work, a deep neural network architecture is proposed that combines a fully convolutional network with an unrolled primal-dual network that can be trained end-to-end to achieve state-of-the-art binarization on four out of seven datasets. Document binarization is formulated as an energy minimization problem. A fully convolutional neural network is trained for semantic segmentation of pixels and provides the labeling cost associated with each pixel. This cost estimate is refined along the edges, using a primal-dual approach, to compensate for any over- or underestimation of the foreground class. We provide necessary…
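To make the energy-minimization view in the abstract more concrete, here is a minimal NumPy sketch of a generic primal-dual (Chambolle-Pock style) solver for a total-variation-regularized labeling problem. It is an illustration under stated assumptions, not PDNet itself: the function names, the parameter `lam`, the uniform (not edge-weighted) smoothness term, and the sign convention of the cost map (negative means "foreground is cheaper") are all hypothetical choices. In the paper the per-pixel cost comes from the fully convolutional network and the iterations are unrolled inside the network; here the cost is simply passed in as an array.

```python
import numpy as np

def grad(u):
    """Forward differences with zero (Neumann) boundary on the last row/column."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    """Discrete divergence, defined as the negative adjoint of grad."""
    d = np.zeros_like(px)
    d[:, :-1] += px[:, :-1]
    d[:, 1:] -= px[:, :-1]
    d[:-1, :] += py[:-1, :]
    d[1:, :] -= py[:-1, :]
    return d

def primal_dual_binarize(cost, lam=1.0, n_iter=300):
    """Approximately minimize sum(cost * u) + lam * TV(u) over u in [0, 1].

    cost : 2-D array; assumed convention: negative where foreground is preferred.
    lam  : assumed smoothness weight (hypothetical parameter, not from the paper).
    Returns a boolean foreground mask obtained by thresholding u at 0.5.
    """
    cost = np.asarray(cost, dtype=float)
    # Step sizes satisfying tau * sigma * ||grad||^2 <= 1 (||grad||^2 <= 8 in 2-D).
    tau = sigma = 1.0 / np.sqrt(8.0)
    u = (cost < 0).astype(float)      # start from the thresholded unary estimate
    u_bar = u.copy()
    px = np.zeros_like(u)
    py = np.zeros_like(u)
    for _ in range(n_iter):
        # Dual ascent step, then projection onto the ball of radius lam.
        gx, gy = grad(u_bar)
        px += sigma * gx
        py += sigma * gy
        scale = np.maximum(1.0, np.sqrt(px**2 + py**2) / lam)
        px /= scale
        py /= scale
        # Primal descent step, then projection onto the box [0, 1].
        u_new = np.clip(u + tau * (div(px, py) - cost), 0.0, 1.0)
        # Over-relaxation of the primal variable.
        u_bar = 2.0 * u_new - u
        u = u_new
    return u > 0.5

# Toy cost map: a square "ink" region containing one mislabeled pixel.
cost = np.ones((16, 16))
cost[4:12, 4:12] = -1.0
cost[8, 8] = 1.0
mask = primal_dual_binarize(cost, lam=1.0)
```

In this toy setup the smoothness term should outweigh the single misleading cost value, so the refined mask is expected to treat the noisy pixel as part of the surrounding ink region. An edge-aware variant would replace the scalar `lam` with a per-pixel weight, which is closer in spirit to the refinement "along the edges" that the abstract describes.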
