An End-to-End Khmer Optical Character Recognition using Sequence-to-Sequence with Attention

Rina Buoy; Sokchea Kor; Nguonly Taing

arXiv:2106.10875·cs.CV·June 22, 2021


Open Access

TL;DR

This paper introduces an end-to-end deep learning model for Khmer OCR based on a sequence-to-sequence architecture with attention, reporting a 1% character error rate (CER) on its test set and outperforming the Tesseract OCR engine.

Contribution

The paper presents a novel Seq2Seq architecture with attention for Khmer OCR, trained on computer-generated text-line images in seven common Khmer fonts, that outperforms the Tesseract engine.

Findings

01 Achieved a 1% character error rate (CER) on the test set

02 Outperformed Tesseract OCR by 2 percentage points of CER

03 Demonstrated the effectiveness of the attention mechanism in Khmer OCR
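
The findings above are reported as character error rate (CER). As a minimal sketch of how such a figure is commonly computed, the snippet below divides the total Levenshtein edit distance by the total number of reference characters. The function names are illustrative, and counting Unicode code points (rather than Khmer grapheme clusters) is an assumption; the page does not state which unit the authors used.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edits / total reference characters.

    Assumption: characters are Python code points, not grapheme clusters.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars
```

Under this definition, a CER of 1% corresponds to roughly one character edit per hundred reference characters, aggregated over the whole test set rather than averaged per line.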

Abstract

This paper presents an end-to-end deep convolutional recurrent neural network solution for the Khmer optical character recognition (OCR) task. The proposed solution uses a sequence-to-sequence (Seq2Seq) architecture with an attention mechanism. The encoder extracts visual features from an input text-line image via layers of residual convolutional blocks and a layer of gated recurrent units (GRU). The features are encoded in a single context vector and a sequence of hidden states, which are fed to the decoder for decoding one character at a time until a special end-of-sentence (EOS) token is reached. The attention mechanism allows the decoder network to adaptively select parts of the input image while predicting a target character. The Seq2Seq Khmer OCR network was trained on a large collection of computer-generated text-line images for seven common Khmer fonts. The proposed model's performance…
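
The decoding loop described in the abstract (attend over the encoder hidden states, emit one character, stop at EOS) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: dot-product scoring is assumed because the page does not say which attention variant is used, and `attend`, `greedy_decode`, and the `step_fn` callback (standing in for the GRU decoder cell and output layer) are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention (an assumed scoring function).

    Returns the attention weights over encoder positions and the
    context vector, i.e. the weighted sum of encoder hidden states.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * h[k] for w, h in zip(weights, encoder_states)) for k in range(dim)
    ]
    return weights, context


def greedy_decode(encoder_states, init_state, step_fn, sos, eos, max_len=50):
    """Emit one token at a time until EOS or max_len is reached.

    step_fn(state, prev_token, context) -> (new_state, next_token) is a
    hypothetical placeholder for the decoder cell plus output layer.
    """
    state, token, output = init_state, sos, []
    for _ in range(max_len):
        _, context = attend(state, encoder_states)
        state, token = step_fn(state, token, context)
        if token == eos:
            break
        output.append(token)
    return output
```

In a real model, `encoder_states` would be the per-column GRU outputs computed from the convolutional features of the text-line image, and `step_fn` would be a trained network; the `max_len` cap guards against a decoder that never emits EOS.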

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Image Processing and 3D Reconstruction

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Sequence to Sequence