Bengali Document Layout Analysis with Detectron2
Md Ataullha, Mahedi Hassan Rabby, Mushfiqur Rahman, Tahsina, Bintay Azam

TL;DR
This paper enhances Bengali document layout analysis by applying advanced Mask R-CNN models in Detectron2, demonstrating improved segmentation accuracy on the BaDLAD dataset, and discussing the impact of pretrained weights and model variants.
Contribution
The study applies Mask R-CNN models to Bengali document layout analysis (DLA) on the BaDLAD dataset and compares each model with and without PubLayNet-pretrained weights to measure the effect on segmentation quality.
Findings
Mask R-CNN models effectively segment Bengali documents.
Pretrained weights significantly improve model accuracy.
Tradeoffs between speed and accuracy are discussed.
Abstract
Document digitization is vital for preserving historical records, efficient document management, and advancing OCR (Optical Character Recognition) research. Document Layout Analysis (DLA) involves segmenting documents into meaningful units like text boxes, paragraphs, images, and tables. Challenges arise when dealing with diverse layouts, historical documents, and unique scripts like Bengali, hindered by the lack of comprehensive Bengali DLA datasets. We improved the accuracy of the DLA model for Bengali documents by utilizing advanced Mask R-CNN models available in the Detectron2 library. Our evaluation involved three variants: Mask R-CNN R-50, R-101, and X-101, both with and without pretrained weights from PubLayNet, on the BaDLAD dataset, which contains human-annotated Bengali documents in four categories: text boxes, paragraphs, images, and tables. Results show the effectiveness of…
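The evaluation described in the abstract (three backbones, each with and without PubLayNet-pretrained weights, on four BaDLAD categories) can be laid out as a small experiment grid. The sketch below is illustrative only and is not the authors' code. The config file names are the Mask R-CNN entries in Detectron2's model zoo, but picking the FPN 3x variants is an assumption, as are the `Experiment` class, the `build_grid` helper, the class label strings, and the weight paths, which are hypothetical placeholders. The abstract does not say what the runs without PubLayNet weights start from, so that case is represented as `None`.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional

# Detectron2 model-zoo config names for the three Mask R-CNN backbones.
# Assumption: the FPN 3x variants; the abstract names only R-50, R-101, X-101.
BACKBONE_CONFIGS = {
    "R-50": "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml",
    "R-101": "COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml",
    "X-101": "COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x.yaml",
}

# The four BaDLAD categories listed in the abstract (label strings are illustrative).
CLASSES = ("text_box", "paragraph", "image", "table")


@dataclass(frozen=True)
class Experiment:
    """One cell of the evaluation grid (hypothetical helper type)."""
    backbone: str
    config_file: str
    weights: Optional[str]  # None = no PubLayNet initialization
    num_classes: int

    @property
    def pretrained(self) -> bool:
        return self.weights is not None


def build_grid() -> List[Experiment]:
    """Enumerate every backbone with and without PubLayNet weights."""
    grid = []
    for (name, config_file), use_publaynet in product(
        BACKBONE_CONFIGS.items(), (True, False)
    ):
        # Hypothetical local path; PubLayNet checkpoints are not part of
        # Detectron2's model zoo and would have to be obtained separately.
        weights = f"weights/publaynet_{name}.pth" if use_publaynet else None
        grid.append(Experiment(name, config_file, weights, len(CLASSES)))
    return grid


if __name__ == "__main__":
    for exp in build_grid():
        print(exp.backbone, "PubLayNet" if exp.pretrained else "no PubLayNet")
```

In a Detectron2 training script, each entry would typically feed the `MODEL.WEIGHTS` and `MODEL.ROI_HEADS.NUM_CLASSES` config keys after loading the corresponding config file; training schedule and other hyperparameters are not given in the abstract and are left out here.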
Taxonomy
Topics: Handwritten Text Recognition Techniques · Digital Imaging for Blood Diseases · Image Processing and 3D Reconstruction
Methods: Softmax · RoIAlign · Convolution · Region Proposal Network · Mask R-CNN · Deep Layer Aggregation
