LS-HDIB: A Large Scale Handwritten Document Image Binarization Dataset
Kaustubh Sadekar, Ashish Tiwari, Prajwal Singh, Shanmuganathan Raman

TL;DR
This paper introduces LS-HDIB, a large-scale dataset with over a million handwritten document images, designed to improve deep learning models for binarization by providing diverse real-world scenarios and accurate ground truths.
Contribution
The creation of LS-HDIB, the largest and most diverse handwritten document binarization dataset to date, along with a novel ground truth generation technique.
Findings
Models trained on LS-HDIB perform better on unseen data.
The dataset improves the generalization of deep learning models.
Significant performance improvements are observed across multiple models.
Abstract
Handwritten document image binarization is challenging due to high variability in the written content and complex background attributes such as page style, paper quality, stains, shadow gradients, and non-uniform illumination. While the traditional thresholding methods do not effectively generalize on such challenging real-world scenarios, deep learning-based methods have performed relatively well when provided with sufficient training data. However, the existing datasets are limited in size and diversity. This work proposes LS-HDIB - a large-scale handwritten document image binarization dataset containing over a million document images that span numerous real-world scenarios. Additionally, we introduce a novel technique that uses a combination of adaptive thresholding and seamless cloning methods to create the dataset with accurate ground truths. Through an extensive quantitative and…
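The abstract names adaptive thresholding as one ingredient of the ground-truth pipeline but does not spell it out here. As a rough illustration only, and not the authors' implementation, the sketch below shows a plain local-mean adaptive threshold in pure Python. The function name `adaptive_mean_threshold`, the parameters `block` and `c`, and the toy `page` image are all assumptions made for this example. The seamless cloning step is not covered.

```python
def adaptive_mean_threshold(gray, block=3, c=15):
    """Binarize a grayscale image given as a list of rows (values 0-255).

    A pixel is marked as ink (0) when it is darker than the mean of its
    block x block neighbourhood minus the offset c; otherwise it is
    marked as background (255). Hypothetical helper, for illustration.
    """
    h, w = len(gray), len(gray[0])
    r = block // 2
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clip the neighbourhood window at the image border.
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            vals = [gray[j][i] for j in ys for i in xs]
            local_mean = sum(vals) / len(vals)
            if gray[y][x] < local_mean - c:
                out[y][x] = 0
    return out


# Toy page: background brightens from left to right (a shadow gradient),
# with one dark vertical pen stroke in the middle column.
page = [[100 + 20 * x for x in range(5)] for _ in range(5)]
for row in page:
    row[2] = 20

mask = adaptive_mean_threshold(page, block=3, c=15)
```

Because the threshold is computed per neighbourhood, the darker left side of the toy page is not mistaken for ink, whereas a single global cutoff at 128 would mark the two leftmost columns as ink. The values of `block` and `c` here are tuned to this small example and would need adjusting for real scans.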
Taxonomy
Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Vehicle License Plate Recognition
