ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents

Christoph Auer; Ahmed Nassar; Maksym Lysak; Michele Dolfi; Nikolaos Livathinos; Peter Staar

arXiv:2305.14962·cs.CV·August 22, 2023·1 cites

TL;DR

This paper reports on the ICDAR 2023 competition focused on robust layout segmentation in diverse corporate documents, highlighting advances in vision-transformer models and ensemble strategies that improve accuracy and generalization.

Contribution

It introduces a challenging new dataset and benchmark for document layout segmentation, and showcases innovative solutions leveraging recent computer vision techniques.

Findings

01

Vision-transformer-based methods are increasingly adopted.

02

Ensemble strategies improve segmentation accuracy.

03

Progress towards robust, generalizable document layout understanding.

Abstract

Transforming documents into machine-processable representations is a challenging task due to their complex structures and variability in formats. Recovering the layout structure and content from PDF files or scanned material has remained a key problem for decades. ICDAR has a long tradition in hosting competitions to benchmark the state-of-the-art and encourage the development of novel solutions to document layout understanding. In this report, we present the results of our ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents, which posed the challenge to accurately segment the page layout in a broad range of document styles and domains, including corporate reports, technical literature and patents. To raise the bar over previous competitions, we engineered a hard competition dataset and proposed the recent DocLayNet dataset for training. We recorded 45…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Image Retrieval and Classification Techniques