An End-to-End Khmer Optical Character Recognition using Sequence-to-Sequence with Attention
Rina Buoy, Sokchea Kor, Nguonly Taing

TL;DR
This paper introduces an end-to-end deep learning model for Khmer OCR, based on a sequence-to-sequence architecture with attention, that achieves a lower character error rate than the Tesseract OCR engine.
Contribution
It presents a novel Seq2Seq architecture with attention for Khmer OCR, trained on synthetic data, outperforming the Tesseract engine.
Findings
Achieved a character error rate (CER) of 1% on the test set
Outperformed Tesseract OCR by 2 percentage points of CER
Demonstrated the effectiveness of the attention mechanism for Khmer OCR
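For readers unfamiliar with the metric, CER is commonly computed as the total character-level edit distance between predictions and references, divided by the total number of reference characters. The sketch below assumes that standard definition; this summary does not state how the paper computes it, and the function names `edit_distance` and `cer` are illustrative. It also counts Unicode code points, whereas a Khmer evaluation might count grapheme clusters instead.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars
```

Under this definition, a CER of 1% means roughly one character edit per 100 reference characters.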
Abstract
This paper presents an end-to-end deep convolutional recurrent neural network solution for the Khmer optical character recognition (OCR) task. The proposed solution uses a sequence-to-sequence (Seq2Seq) architecture with an attention mechanism. The encoder extracts visual features from an input text-line image via layers of residual convolutional blocks and a layer of gated recurrent units (GRU). The features are encoded in a single context vector and a sequence of hidden states, which are fed to the decoder for decoding one character at a time until a special end-of-sentence (EOS) token is reached. The attention mechanism allows the decoder network to adaptively select parts of the input image while predicting a target character. The Seq2Seq Khmer OCR network was trained on a large collection of computer-generated text-line images for seven common Khmer fonts. The proposed model's performance…
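The decoding procedure described in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: dot-product scoring is an assumption (the abstract does not name the scoring function), `step_fn` is a hypothetical stand-in for the GRU decoder cell and output layer, and using `None` as the start token is also an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention (assumed scoring function).

    Returns the weights over encoder positions and the weighted-sum context vector.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context


def greedy_decode(step_fn, init_state, encoder_states, eos_id, max_len=100):
    """Emit one token per step until EOS is produced or max_len is reached.

    step_fn(prev_token, state, context) -> (next_token, next_state) is hypothetical;
    in a real model it would wrap the decoder GRU and the character classifier.
    """
    state, token, output = init_state, None, []
    for _ in range(max_len):
        _, context = attend(state, encoder_states)
        token, state = step_fn(token, state, context)
        if token == eos_id:
            break
        output.append(token)
    return output
```

Here `encoder_states` plays the role of the sequence of hidden states produced by the encoder, one per horizontal position of the text-line image, and the attention weights indicate which positions the decoder relies on for the current character.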
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Image Processing and 3D Reconstruction
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Sequence to Sequence
