Unveiling Document Structures with YOLOv5 Layout Detection
Herman Sugiharto, Yorissa Silviana, Yani Siti Nurpazrin

TL;DR
This paper demonstrates that YOLOv5 can identify and extract layout components from document images, with the aim of improving unstructured data processing in sectors such as finance, healthcare, and education.
Contribution
It introduces a novel framework using YOLOv5 for document layout detection, achieving high accuracy and precision in recognizing document elements.
Findings
High precision (0.91) and recall (0.971) in layout detection
F1-score of 0.939 indicating balanced performance
AUC-ROC of 0.975 demonstrating excellent classification capability
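The reported figures can be cross-checked against each other. The F1-score is the harmonic mean of precision and recall, so the sketch below recomputes it, taking 0.91 as the precision value (an assumption; F1 is defined on precision and recall, not on accuracy). The function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumption: the reported 0.91 is the precision value.
reported_f1 = 0.939
computed_f1 = f1_score(precision=0.91, recall=0.971)

print(f"computed F1 = {computed_f1:.4f}, reported F1 = {reported_f1}")
```

Under that reading, the recomputed value agrees with the reported F1-score to within 0.001, which suggests the three figures are mutually consistent.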
Abstract
The current digital environment is characterized by the widespread presence of data, particularly unstructured data, which poses many issues in sectors including finance, healthcare, and education. Conventional techniques for data extraction encounter difficulties in dealing with the inherent variety and complexity of unstructured data, hence requiring the adoption of more efficient methodologies. This research investigates the utilization of YOLOv5, a cutting-edge computer vision model, for the purpose of rapidly identifying document layouts and extracting unstructured data. The present study establishes a conceptual framework for delineating the notion of "objects" as they pertain to documents, incorporating various elements such as paragraphs, tables, photos, and other constituent parts. The main objective is to create an autonomous system that can effectively recognize document…
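To make the idea of documents as collections of "objects" concrete, here is a minimal sketch of how one layout element could be written as a YOLO-style training label (class index, then box center and size normalized by the page dimensions). The class list, the `LayoutBox` type, and the `to_yolo_line` helper are hypothetical, chosen from the elements named in the abstract; the paper's actual classes and annotation pipeline are not given here.

```python
from dataclasses import dataclass

# Hypothetical class list, based only on the elements named in the abstract.
LAYOUT_CLASSES = ["paragraph", "table", "photo"]


@dataclass
class LayoutBox:
    """One layout element with pixel coordinates on the page image."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def to_yolo_line(box: LayoutBox, page_width: float, page_height: float) -> str:
    """Format a box as 'class x_center y_center width height', normalized to 0..1."""
    class_id = LAYOUT_CLASSES.index(box.label)
    x_center = (box.x_min + box.x_max) / 2 / page_width
    y_center = (box.y_min + box.y_max) / 2 / page_height
    width = (box.x_max - box.x_min) / page_width
    height = (box.y_max - box.y_min) / page_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Illustrative values only: a table on a 1000 x 1000 pixel page.
table = LayoutBox("table", 100, 200, 500, 600)
line = to_yolo_line(table, page_width=1000, page_height=1000)
print(line)
```

Treating each paragraph, table, or photo as a labeled box in this way is what lets a general object detector be trained on document pages without changes to the model itself.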
Taxonomy
Topics: Handwritten Text Recognition Techniques · Currency Recognition and Detection · Vehicle License Plate Recognition
