Auto-ML Deep Learning for Rashi Scripts OCR
Shahar Mahpod, Yosi Keller

TL;DR
This paper presents an AutoML-based deep learning OCR system tailored for Rashi scripts, achieving over 99.8% accuracy on a large dataset by optimizing CNN architecture and employing book-specific training.
Contribution
The paper introduces an AutoML scheme that optimizes the CNN architecture for Rashi script OCR and combines it with book-specific CNN training, evaluated on a large annotated dataset.
Findings
Achieved over 99.8% OCR accuracy on Rashi scripts.
Utilized AutoML to optimize CNN architecture for manuscript recognition.
Demonstrated effectiveness on a dataset of over 3 million annotated letters.
Abstract
In this work we propose an OCR scheme for manuscripts printed in Rashi font, an ancient Hebrew font and corresponding dialect that has been used in religious Jewish literature for more than 600 years. The proposed scheme utilizes a convolutional neural network (CNN) for visual inference and a Long Short-Term Memory (LSTM) network to learn the dialect of Rashi scripts. In particular, we derive an AutoML scheme to optimize the CNN architecture and a book-specific CNN training procedure to improve the OCR accuracy. The proposed scheme achieves an accuracy of more than 99.8% on a dataset of more than 3M annotated letters from the Responsa Project.
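The abstract does not say how the AutoML scheme searches over CNN architectures. As a rough illustration of the general idea only, the sketch below runs a plain random search over a small, hypothetical search space. The parameter names (`num_conv_layers`, `filters`, `kernel_size`), the search space itself, and the `toy_evaluate` scoring function are all assumptions made for this example; in a real system the evaluation step would train a candidate CNN and return its validation accuracy.

```python
import random

# Hypothetical search space; the source does not list the tuned parameters.
SEARCH_SPACE = {
    "num_conv_layers": [2, 3, 4],
    "filters": [16, 32, 64],
    "kernel_size": [3, 5],
}


def sample_architecture(rng):
    """Draw one candidate configuration uniformly from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def random_search(evaluate, num_trials, seed=0):
    """Return the best (config, score) pair seen over num_trials random candidates."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(num_trials):
        config = sample_architecture(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Stand-in for validation accuracy (assumption): a real run would train a CNN here."""
    return (
        -abs(config["num_conv_layers"] - 3)
        - abs(config["filters"] - 32) / 32
        - abs(config["kernel_size"] - 3) / 2
    )


if __name__ == "__main__":
    config, score = random_search(toy_evaluate, num_trials=50)
    print(config, score)
```

Random search is used here only because it is the simplest baseline to show; the paper's own scheme may differ substantially.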
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Processing and 3D Reconstruction · Image and Object Detection Techniques
Methods: Convolution
