Augraphy: A Data Augmentation Library for Document Images
Alexander Groleau, Kok Wei Chee, Stefan Larson, Samay Maini, Jonathan Boarman

TL;DR
Augraphy is a Python library that creates realistic distortions in document images, enhancing data diversity for training and testing document image processing models.
Contribution
The paper introduces a data augmentation library designed specifically to simulate the distortions found in real-world document images.
Findings
Enables generation of diverse augmented document images
Improves robustness testing for document image models
Supports training for document denoising tasks
Abstract
This paper introduces Augraphy, a Python library for constructing data augmentation pipelines which produce distortions commonly seen in real-world document image datasets. Augraphy stands apart from other data augmentation tools by providing many different strategies to produce augmented versions of clean document images that appear as if they have been altered by standard office operations, such as printing, scanning, and faxing through old or dirty machines, degradation of ink over time, and handwritten markings. This paper discusses the Augraphy tool, and shows how it can be used both as a data augmentation tool for producing diverse training data for tasks such as document denoising, and also for generating challenging test data to evaluate model robustness on document image modeling tasks.
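The pipeline idea in the abstract can be illustrated with a minimal sketch in plain Python. This is not Augraphy's API: the names `Pipeline`, `salt_and_pepper`, `fade_ink`, `dark_streak`, and `make_pairs` are hypothetical and chosen only for illustration, and the image is assumed to be a small grayscale grid stored as nested lists. The sketch shows how a chain of probabilistic distortions could turn clean pages into (noisy, clean) pairs of the kind a denoising model would train on.

```python
import random

# Assumption: a grayscale "document" is a list of rows of ints in [0, 255],
# where 255 is white paper and 0 is black ink.
Image = list[list[int]]


def salt_and_pepper(img: Image, rng: random.Random, amount: float = 0.05) -> Image:
    """Flip a fraction of pixels to pure black or white, like scanner speckle."""
    return [
        [rng.choice((0, 255)) if rng.random() < amount else px for px in row]
        for row in img
    ]


def fade_ink(img: Image, rng: random.Random, strength: int = 60) -> Image:
    """Lighten dark pixels, loosely imitating ink that has degraded over time."""
    return [
        [min(255, px + strength) if px < 128 else px for px in row]
        for row in img
    ]


def dark_streak(img: Image, rng: random.Random) -> Image:
    """Darken one random column, loosely imitating a dirty scanner or drum."""
    col = rng.randrange(len(img[0]))
    return [
        [min(px, 40) if x == col else px for x, px in enumerate(row)]
        for row in img
    ]


class Pipeline:
    """Apply each (distortion, probability) step in order (hypothetical helper)."""

    def __init__(self, steps, seed=None):
        self.steps = steps
        self.rng = random.Random(seed)

    def __call__(self, img: Image) -> Image:
        out = [row[:] for row in img]  # copy so the clean image is untouched
        for distort, probability in self.steps:
            if self.rng.random() < probability:
                out = distort(out, self.rng)
        return out


def make_pairs(clean_images, pipeline):
    """Return (noisy, clean) pairs, e.g. as training data for a denoiser."""
    return [(pipeline(img), img) for img in clean_images]


if __name__ == "__main__":
    page = [[255] * 8 for _ in range(8)]
    page[3] = [0] * 8  # one black "text line" on a white page
    pipe = Pipeline([(salt_and_pepper, 0.8), (fade_ink, 0.5), (dark_streak, 0.5)], seed=0)
    noisy, clean = make_pairs([page], pipe)[0]
    for row in noisy:
        print(" ".join(f"{px:3d}" for px in row))
```

Giving each step its own probability means repeated calls on the same clean page yield different distorted versions. The same pipeline can therefore be used to diversify training data or to generate harder test inputs for robustness checks.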
Taxonomy
Topics: Handwritten Text Recognition Techniques · AI in cancer detection · Image Processing and 3D Reconstruction
Methods: Lib · Test
