PubLayNet: largest dataset ever for document layout analysis
Xu Zhong, Jianbin Tang, Antonio Jimeno Yepes

TL;DR
PubLayNet is a large-scale dataset of over 360,000 document images with annotated layout elements, enabling more effective training of neural networks for document layout analysis, especially in scientific articles.
Contribution
The paper introduces PubLayNet, the largest publicly available dataset for document layout analysis, facilitating improved deep learning models for scientific document understanding.
Findings
Deep neural networks trained on PubLayNet accurately recognize scientific article layouts.
Models pre-trained on PubLayNet are effective base models for transfer learning to other document domains.
The dataset enables development of more advanced document layout analysis models.
Abstract
Recognizing the layout of unstructured digital documents is an important step when parsing the documents into a structured machine-readable format for downstream applications. Deep neural networks developed for computer vision have proven to be an effective method for analyzing the layout of document images. However, the document layout datasets that are currently publicly available are several orders of magnitude smaller than established computer vision datasets. Models have to be trained by transfer learning from a base model that is pre-trained on a traditional computer vision dataset. In this paper, we develop the PubLayNet dataset for document layout analysis by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central. The size of the dataset is comparable to established computer vision datasets, containing…
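The abstract describes building annotations by matching each article's XML representation against the content of its PDF. The excerpt does not spell out the matching procedure, so the following is only a minimal sketch of the general idea, assuming a simple fuzzy string match between XML node text and PDF text blocks. The sample XML, the `PDF_BLOCKS` list, the `TAG_TO_LABEL` mapping, the `annotate` function, and the 0.8 threshold are all hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher
import xml.etree.ElementTree as ET

# Hypothetical XML representation of an article (not real PubMed Central data).
XML = """<article>
  <title>Deep layouts</title>
  <p>Layout analysis parses documents.</p>
</article>"""

# Hypothetical text blocks extracted from the PDF, each with a bounding box
# (x0, y0, x1, y1). The second block contains a line-break hyphenation artifact.
PDF_BLOCKS = [
    {"bbox": (50, 40, 500, 70), "text": "Deep layouts"},
    {"bbox": (50, 90, 500, 140), "text": "Layout analysis parses docu- ments."},
]

# Assumed mapping from XML tags to layout labels.
TAG_TO_LABEL = {"title": "title", "p": "text"}


def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def annotate(xml_string, blocks, threshold=0.8):
    """Give each labelled XML node the bounding box of its best-matching block.

    Nodes whose best match scores below the threshold are skipped, so noisy
    matches do not end up as annotations.
    """
    root = ET.fromstring(xml_string)
    annotations = []
    for node in root.iter():
        label = TAG_TO_LABEL.get(node.tag)
        if label is None:
            continue  # tag has no layout label in the assumed mapping
        node_text = "".join(node.itertext()).strip()
        best = max(blocks, key=lambda b: similarity(node_text, b["text"]))
        if similarity(node_text, best["text"]) >= threshold:
            annotations.append({"label": label, "bbox": best["bbox"]})
    return annotations


if __name__ == "__main__":
    for annotation in annotate(XML, PDF_BLOCKS):
        print(annotation)
```

Fuzzy rather than exact matching is used here because text extracted from a PDF rarely agrees character for character with the XML source (hyphenation, ligatures, whitespace), and the threshold trades annotation coverage against label noise.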
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
