3D Rendering Framework for Data Augmentation in Optical Character Recognition
Andreas Spruck, Maximiliane Hawesch, Anatol Maier, Christian Riess, Jürgen Seiler, André Kaup

TL;DR
This paper introduces a modular 3D rendering framework for data augmentation in OCR, enhancing datasets with varied viewing angles and lighting to improve recognition accuracy, especially on small datasets.
Contribution
The proposed framework synthesizes diverse viewing conditions and scales the size of the dataset. It applies to both image and video OCR and outperforms traditional augmentation methods.
Findings
Up to 2.79 percentage points improvement in Character Error Rate (CER)
Up to 7.88 percentage points improvement in Word Error Rate (WER)
Lower error rates with augmented small datasets than with the full original datasets.
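For readers unfamiliar with the two metrics, the following is a minimal sketch of how CER and WER are commonly computed: the edit distance between the recognized text and the reference, divided by the reference length in characters or words. The function names and the sample strings are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost per edit)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference item
                curr[j - 1] + 1,         # insert a hypothesis item
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate: character edits per reference character."""
    return edit_distance(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word Error Rate: word edits per reference word."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


# Hypothetical sample: one wrong character inside one of three words.
reference = "optical character recognition"
hypothesis = "optical charactor recognition"
print(f"CER = {cer(reference, hypothesis):.4f}")
print(f"WER = {wer(reference, hypothesis):.4f}")
```

Both rates are usually reported in percent, so an improvement in percentage points is the plain difference between two such rates, not a relative change.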
Abstract
In this paper, we propose a data augmentation framework for Optical Character Recognition (OCR). The proposed framework is able to synthesize new viewing angles and illumination scenarios, effectively enriching any available OCR dataset. Its modular structure allows it to be modified to match individual user requirements. The framework makes it easy to scale the enlargement factor of the available dataset. Furthermore, the proposed method is not restricted to single-frame OCR but can also be applied to video OCR. We demonstrate the performance of our framework by augmenting a 15% subset of the common Brno Mobile OCR dataset. Our proposed framework is capable of improving the performance of OCR applications, especially for small datasets. Applying the proposed method, improvements of up to 2.79 percentage points in terms of Character Error Rate (CER), and up to 7.88 percentage points…
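To give a feel for what synthesizing a new viewing angle means geometrically, here is a minimal sketch that rotates a flat page about its vertical axis and projects its corners through an ideal pinhole camera. This is only an illustration of the underlying geometry, not the paper's rendering pipeline; the function name and all parameters (`yaw_deg`, `distance`, `focal`) are hypothetical.

```python
import math


def project_page_corners(width, height, yaw_deg, distance, focal):
    """Project the four corners of a rotated flat page onto the image plane.

    The page is centered at the origin in the plane z = 0, rotated about
    the vertical (y) axis by yaw_deg, then placed `distance` units in front
    of a pinhole camera with focal length `focal`. Hypothetical helper.
    """
    yaw = math.radians(yaw_deg)
    corners = [
        (-width / 2, -height / 2),
        (width / 2, -height / 2),
        (width / 2, height / 2),
        (-width / 2, height / 2),
    ]
    projected = []
    for x, y in corners:
        # Rotate the page point (x, y, 0) about the y axis.
        x_rot = x * math.cos(yaw)
        z_rot = -x * math.sin(yaw)
        # Shift the page in front of the camera, then apply the pinhole model.
        z_cam = z_rot + distance
        projected.append((focal * x_rot / z_cam, focal * y / z_cam))
    return projected


# Frontal view versus a 30 degree yaw: the edge that moves closer to the
# camera is projected taller, which produces the perspective distortion.
for angle in (0, 30):
    print(angle, project_page_corners(2.0, 2.0, angle, 4.0, 2.0))
```

Warping a text image so that its corners land on the projected positions yields a new view of the same page. A full 3D renderer additionally models illumination, which this sketch leaves out.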
