Enhancing Document AI Data Generation Through Graph-Based Synthetic Layouts

Amit Agarwal; Hitesh Patel; Priyaranjan Pattnayak; Srikant Panda; Bhargava Kumar; Tejaswini Kumar

arXiv:2412.03590·cs.CL·December 6, 2024

TL;DR

This paper introduces a graph neural network-based method for generating realistic synthetic document layouts, improving data diversity and model performance in Document AI tasks while addressing structural and domain adaptation challenges.

Contribution

The paper presents a novel GNN-based framework for synthetic document layout generation that captures complex structures and enhances Document AI model training.

Findings

01. Graph-based layouts outperform traditional augmentation methods

02. Significant improvements in document classification and NER tasks

03. Proposed solutions mitigate domain adaptation issues

Abstract

The development of robust Document AI models has been constrained by limited access to high-quality, labeled datasets, primarily due to data privacy concerns, scarcity, and the high cost of manual annotation. Traditional methods of synthetic data generation, such as text and image augmentation, have proven effective for increasing data diversity but often fail to capture the complex layout structures present in real-world documents. This paper proposes a novel approach to synthetic document layout generation using Graph Neural Networks (GNNs). By representing document elements (e.g., text blocks, images, tables) as nodes in a graph and their spatial relationships as edges, GNNs are trained to generate realistic and diverse document layouts. This method leverages graph-based learning to ensure structural coherence and semantic consistency, addressing the limitations of traditional…
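As an illustration of the graph representation described in the abstract, the sketch below turns a handful of bounding boxes into nodes and labeled spatial edges. It is not the authors' implementation: the `Element` class, the `spatial_relation` and `build_layout_graph` helpers, the four relation labels, and the sample page coordinates are all hypothetical choices made for this example, and the abstract does not say how the paper actually defines its edges.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Element:
    """Hypothetical document element; the box uses page coordinates with y growing downward."""
    name: str
    kind: str  # e.g. "text", "image", "table"
    x0: float
    y0: float
    x1: float
    y1: float


def spatial_relation(a, b):
    """Coarse label for where b sits relative to a, or None if no simple relation holds.

    Assumed rule (not from the paper): two boxes are related vertically when
    they overlap horizontally but not vertically, and vice versa.
    """
    h_overlap = a.x0 < b.x1 and b.x0 < a.x1
    v_overlap = a.y0 < b.y1 and b.y0 < a.y1
    if h_overlap and not v_overlap:
        return "below" if b.y0 >= a.y1 else "above"
    if v_overlap and not h_overlap:
        return "right_of" if b.x0 >= a.x1 else "left_of"
    return None  # boxes intersect or are only diagonally related


def build_layout_graph(elements):
    """Return (nodes, edges): nodes map name -> kind, edges are (src, dst, relation)."""
    nodes = {e.name: e.kind for e in elements}
    edges = []
    for a, b in combinations(elements, 2):
        rel = spatial_relation(a, b)
        if rel is not None:
            edges.append((a.name, b.name, rel))
    return nodes, edges


# Made-up single-page layout: a title, a two-column band, and a full-width table.
page = [
    Element("title", "text", 50, 40, 550, 80),
    Element("body", "text", 50, 100, 300, 400),
    Element("figure", "image", 320, 100, 550, 400),
    Element("table", "table", 50, 420, 550, 600),
]

nodes, edges = build_layout_graph(page)
for src, dst, rel in edges:
    print(f"{dst} is {rel} {src}")
```

A graph in this form (node types plus typed edges) is the kind of input a GNN library would expect once the node kinds and relation labels are encoded numerically; that encoding step is left out here because the abstract does not describe it.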
