Deep Aramaic: Towards a Synthetic Data Paradigm Enabling Machine Learning in Epigraphy
Andrei C. Aioanei, Regine Hunziker-Rodewald, Konstantin Klein, Dominik L. Michels

TL;DR
This paper introduces a synthetic data generation pipeline for Old Aramaic epigraphy, enabling effective machine learning classification of ancient, degraded inscriptions despite limited real training data.
Contribution
It presents a novel method for creating large, realistic synthetic datasets for Old Aramaic, improving ML-based epigraphy analysis without relying on scarce real data.
Findings
ResNet achieved high accuracy on real ancient inscriptions.
Synthetic data enabled robust classification despite degradation.
Model generalized well across different materials and styles.
Abstract
Epigraphy increasingly turns to modern artificial intelligence (AI) technologies such as machine learning (ML) for extracting insights from ancient inscriptions. However, scarce labeled data for training ML algorithms severely limits current techniques, especially for ancient scripts like Old Aramaic. Our research pioneers an innovative methodology for generating synthetic training data tailored to Old Aramaic letters. Our pipeline synthesizes photo-realistic Aramaic letter datasets, incorporating textural features, lighting, damage, and augmentations to mimic real-world inscription diversity. Despite minimal real examples, we engineer a dataset of 250,000 training and 25,000 validation images covering the 22 letter classes in the Aramaic alphabet. This comprehensive corpus provides a robust volume of data for training a residual neural network (ResNet) to classify highly degraded…
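The abstract names texture, lighting, and damage as ingredients of the synthetic images. The sketch below illustrates that idea in plain Python; it is not the authors' pipeline. The cross-shaped `placeholder_glyph`, the `synthesize` function, the image size, and every numeric constant are assumptions made for illustration only. The one value taken from the abstract is the count of 22 letter classes.

```python
import math
import random

NUM_CLASSES = 22  # letter classes in the Aramaic alphabet, per the abstract
SIZE = 32         # hypothetical image side length in pixels


def placeholder_glyph(size=SIZE):
    """Stand-in for a rendered letter: a cross-shaped binary mask (hypothetical)."""
    lo, hi, mid = size // 4, 3 * size // 4, size // 2
    return [
        [
            1.0
            if (lo <= r < hi and abs(c - mid) < 2) or (lo <= c < hi and abs(r - mid) < 2)
            else 0.0
            for c in range(size)
        ]
        for r in range(size)
    ]


def synthesize(glyph, rng, n_damage=3, patch=6):
    """Apply illustrative texture, lighting, and damage steps to one glyph mask."""
    size = len(glyph)
    # Texture: per-pixel Gaussian noise standing in for a stone surface.
    surface = [[0.6 + rng.gauss(0.0, 0.1) for _ in range(size)] for _ in range(size)]
    # Lighting: a linear brightness ramp along a random direction.
    angle = rng.uniform(0.0, 2.0 * math.pi)
    dx, dy = math.cos(angle), math.sin(angle)
    # Engraving: the letter darkens the lit surface.
    image = [
        [
            surface[r][c] - 0.4 * glyph[r][c] + 0.2 * (dx * c + dy * r) / (size - 1)
            for c in range(size)
        ]
        for r in range(size)
    ]
    # Damage: overwrite random square patches with bare-surface values.
    for _ in range(n_damage):
        r0 = rng.randrange(size - patch)
        c0 = rng.randrange(size - patch)
        for r in range(r0, r0 + patch):
            for c in range(c0, c0 + patch):
                image[r][c] = surface[r][c]
    # Clamp intensities to the [0, 1] range.
    return [[min(1.0, max(0.0, v)) for v in row] for row in image]


rng = random.Random(0)
# Labels are random here because the placeholder glyph carries no real class.
dataset = [(synthesize(placeholder_glyph(), rng), rng.randrange(NUM_CLASSES)) for _ in range(4)]
```

In a real pipeline each label would select the letter shape to render, and the resulting image and label pairs would feed a classifier such as the ResNet the abstract mentions.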
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Archaeological Research and Protection
Methods: Batch Normalization · 1x1 Convolution · Average Pooling · Bottleneck Residual Block · Residual Connection · Max Pooling · Global Average Pooling · Residual Block · Convolution
