Framework and Model Analysis on Bengali Document Layout Analysis Dataset: BaDLAD

Kazi Reyazul Hasan (1), Mubasshira Musarrat (1), Sadif Ahmed (1), and Shahriar Raj (1) ((1) Bangladesh University of Engineering and Technology)

arXiv:2309.16700 · cs.CV · October 2, 2023

Open Access

TL;DR

This paper compares the effectiveness of Detectron2, YOLOv8, and SAM in analyzing Bengali document layouts, providing insights into their accuracy and speed for different layout components.

Contribution

It introduces a comprehensive analysis of multiple computer vision models applied to Bengali document layout understanding, highlighting their strengths and limitations.

Findings

1. Detectron2 excels at segmenting document parts

2. YOLOv8 effectively identifies tables and images

3. SAM aids in understanding complex layouts

Abstract

This study examines Bengali document layout analysis using three computer vision frameworks: Detectron2, YOLOv8, and SAM. We evaluated them on a wide variety of Bengali documents. Detectron2 performs well at finding and separating document components such as text boxes and paragraphs. YOLOv8 is good at identifying tables and pictures. We also tried SAM, which helps with tricky layouts. We compared the accuracy and speed of the three to learn which is suited to which type of document. Our research helps make sense of complex layouts in Bengali documents and may be useful for other languages too.
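The abstract does not say which metrics were used to compare accuracy and speed. As a rough illustration only, the sketch below assumes intersection over union (IoU) between a predicted and a ground-truth box as the accuracy measure, and mean wall-clock time per page as the speed measure. The names `box_iou`, `mean_latency`, and the `predict` callable are hypothetical and are not taken from the paper or from the Detectron2, YOLOv8, or SAM APIs.

```python
import time
from typing import Callable, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_latency(predict: Callable[[object], object],
                 pages: Sequence[object]) -> float:
    """Average seconds per page for a hypothetical `predict` callable."""
    if not pages:
        raise ValueError("pages must not be empty")
    start = time.perf_counter()
    for page in pages:
        predict(page)
    return (time.perf_counter() - start) / len(pages)
```

In such a setup, each model's inference function would be wrapped as `predict` and run over the same pages, so that the IoU scores and latencies are measured on identical inputs.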

Taxonomy

Topics: Currency Recognition and Detection · Handwritten Text Recognition Techniques · Vehicle License Plate Recognition

Methods: You Only Look Once · Segment Anything Model