BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset
Md. Istiak Hossain Shihab, Md. Rakibul Hasan, Mahfuzur Rahman Emon, Syed Mobassir Hossen, Md. Nazmuddoha Ansary, Intesur Ahmed, Fazle Rabbi Rakib, Shahriar Elahi Dhruvo, Souhardya Saha Dip, Akib Hasan Pavel, Marsia Haque Meghla, Md. Rezwanul Haque, Sayma Sultana Chowdhury

TL;DR
BaDLAD is the first large, multi-domain Bengali document layout analysis dataset, enabling improved deep learning models for Bengali OCR and document transcription, especially for historical and domain-specific documents.
Contribution
This paper introduces BaDLAD, the first large-scale multi-domain Bengali DLA dataset with extensive annotations, facilitating research in Bengali document digitization.
Findings
Existing deep learning models perform well on BaDLAD benchmarks.
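BaDLAD enables effective training of deep learning models for Bengali document layout analysis, a step needed before OCR can be applied to document transcription.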
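The dataset covers six domains, ranging from books and newspapers to property deeds, with 33,695 human-annotated samples and 710K polygon annotations across four unit types (text-box, paragraph, image, and table).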
Abstract
While strides have been made in deep learning based Bengali Optical Character Recognition (OCR) in the past decade, the absence of large Document Layout Analysis (DLA) datasets has hindered the application of OCR in document transcription, e.g., transcribing historical documents and newspapers. Moreover, rule-based DLA systems that are currently being employed in practice are not robust to domain variations and out-of-distribution layouts. To this end, we present the first multidomain large Bengali Document Layout Analysis Dataset: BaDLAD. This dataset contains 33,695 human annotated document samples from six domains - i) books and magazines, ii) public domain govt. documents, iii) liberation war documents, iv) newspapers, v) historical newspapers, and vi) property deeds, with 710K polygon annotations for four unit types: text-box, paragraph, image, and table. Through preliminary…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Digital Media Forensic Detection · Image Processing and 3D Reconstruction
Methods: Deep Layer Aggregation
