ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents
Christoph Auer, Ahmed Nassar, Maksym Lysak, Michele Dolfi, Nikolaos Livathinos, Peter Staar

TL;DR
This paper reports on the ICDAR 2023 competition focused on robust layout segmentation in diverse corporate documents, highlighting advances in vision-transformer models and ensemble strategies that improve accuracy and generalization.
Contribution
It introduces a challenging new dataset and benchmark for document layout segmentation, and showcases innovative solutions leveraging recent computer vision techniques.
Findings
Vision-transformer based methods are increasingly adopted.
Ensemble strategies improve segmentation accuracy.
The competition results indicate progress towards robust, generalizable document layout understanding.
Abstract
Transforming documents into machine-processable representations is a challenging task due to their complex structures and variability in formats. Recovering the layout structure and content from PDF files or scanned material has remained a key problem for decades. ICDAR has a long tradition in hosting competitions to benchmark the state-of-the-art and encourage the development of novel solutions to document layout understanding. In this report, we present the results of our ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents, which posed the challenge to accurately segment the page layout in a broad range of document styles and domains, including corporate reports, technical literature and patents. To raise the bar over previous competitions, we engineered a hard competition dataset and proposed the recent DocLayNet dataset for training. We recorded 45…
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques
