Document Domain Randomization for Deep Learning Document Layout Extraction
Meng Ling, Jian Chen, Torsten Möller, Petra Isenberg, Tobias Isenberg, Michael Sedlmair, Robert S. Laramee, Han-Wei Shen, Jian Wu, and C. Lee Giles

TL;DR
This paper introduces document domain randomization (DDR), an approach that trains CNNs only on rendered pseudo-document pages and transfers them to real-world document segmentation. It also examines how style mismatch, sample size, and label noise affect accuracy.
Contribution
The paper presents the first successful transfer of CNNs trained solely on rendered pseudo-documents to real-world document segmentation, so that models can be trained without real document pages.
Findings
DDR achieves competitive results on benchmark datasets.
Style mismatch impacts model accuracy more than label noise.
Smaller training samples slightly reduce performance, but high accuracy persists.
Abstract
We present document domain randomization (DDR), the first successful transfer of convolutional neural networks (CNNs) trained only on graphically rendered pseudo-paper pages to real-world document segmentation. DDR renders pseudo-document pages by modeling randomized textual and non-textual contents of interest, with user-defined layout and font styles to support joint learning of fine-grained classes. We demonstrate competitive results using our DDR approach to extract nine document classes from the benchmark CS-150 and from papers published in two domains, namely the annual meetings of the Association for Computational Linguistics (ACL) and IEEE Visualization (VIS). We compare DDR to conditions of style mismatch and of fewer or noisier samples, which are more easily obtained in the real world. We show that high-fidelity semantic information is not necessary to label semantic classes but style mismatch…
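To make the idea of randomized pseudo-pages concrete, here is a minimal sketch of the layout-sampling step only. It is not the authors' implementation: the function name `random_page_layout`, the class names, the font list, and the page dimensions are all assumptions chosen for illustration, and no actual rendering is performed. It samples a page style and stacks labeled bounding boxes into one or two columns, which is the kind of output a renderer could turn into a training image with a segmentation mask.

```python
import random

# Hypothetical class names and font choices, for illustration only;
# they are NOT the nine classes or the styles used in the paper.
CLASSES = ["body_text", "figure", "table", "caption"]
FONTS = ["Times", "Helvetica", "Palatino"]


def random_page_layout(seed, width=612, height=792, margin=54):
    """Sample one pseudo-page: a style dict plus labeled boxes stacked in columns."""
    rng = random.Random(seed)  # seeded so each page is reproducible
    n_cols = rng.choice([1, 2])
    gutter = 18
    col_w = (width - 2 * margin - (n_cols - 1) * gutter) / n_cols
    style = {
        "font": rng.choice(FONTS),
        "font_size": rng.choice([9, 10, 11]),
        "columns": n_cols,
    }
    boxes = []
    for c in range(n_cols):
        x0 = margin + c * (col_w + gutter)
        y = margin
        while True:
            h = rng.uniform(40, 200)  # random block height in points
            if y + h > height - margin:
                break  # column is full
            boxes.append({
                "label": rng.choice(CLASSES),
                "bbox": (x0, y, x0 + col_w, y + h),
            })
            y += h + rng.uniform(6, 18)  # random vertical gap between blocks
    return style, boxes


if __name__ == "__main__":
    style, boxes = random_page_layout(seed=0)
    print(style)
    for box in boxes:
        print(box["label"], [round(v, 1) for v in box["bbox"]])
```

Because the boxes are generated rather than annotated, the labels come for free with each sampled page; varying the seed varies both the style and the content arrangement.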
