Framework and Model Analysis on Bengali Document Layout Analysis Dataset: BaDLAD
Kazi Reyazul Hasan (1), Mubasshira Musarrat (1), Sadif Ahmed (1), and Shahriar Raj (1) ((1) Bangladesh University of Engineering and Technology)

TL;DR
This paper compares the effectiveness of Detectron2, YOLOv8, and SAM in analyzing Bengali document layouts, providing insights into their accuracy and speed for different layout components.
Contribution
It introduces a comprehensive analysis of multiple computer vision models applied to Bengali document layout understanding, highlighting their strengths and limitations.
Findings
Detectron2 excels at segmenting document parts
YOLOv8 effectively identifies tables and images
SAM aids in understanding complex layouts
Abstract
This study focuses on understanding Bengali Document Layouts using advanced computer programs: Detectron2, YOLOv8, and SAM. We looked at lots of different Bengali documents in our study. Detectron2 is great at finding and separating different parts of documents, like text boxes and paragraphs. YOLOv8 is good at figuring out different tables and pictures. We also tried SAM, which helps us understand tricky layouts. We tested these programs to see how well they work. By comparing their accuracy and speed, we learned which one is good for different types of documents. Our research helps make sense of complex layouts in Bengali documents and can be useful for other languages too.
Taxonomy
Topics: Currency Recognition and Detection · Handwritten Text Recognition Techniques · Vehicle License Plate Recognition
Methods: You Only Look Once · Segment Anything Model
