DISGO: Automatic End-to-End Evaluation for Scene Text OCR
Mei-Yuh Hwang, Yangyang Shi, Ankit Ramchandani, Guan Pang, Praveen Krishnan, Lucas Kabela, Frank Seide, Samyak Datta, Jun Liu

TL;DR
This paper introduces DISGO WER, an end-to-end evaluation metric for scene-text OCR that counts deletion, insertion, substitution, and grouping/ordering errors, and uses super blocks to compute BLEU scores automatically for end-to-end OCR machine translation.
Contribution
It proposes a unified WER-based evaluation framework for scene-text OCR that covers both end-to-end performance and the performance of individual system components, and introduces super blocks for automatic BLEU score calculation.
Findings
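DISGO WER measures end-to-end OCR performance on natural scenes, including grouping and ordering errors.
Super blocks enable automatic BLEU score computation for end-to-end OCR machine translation.
Evaluation on the small public SCUT test set, using a modularized OCR system, demonstrates the metric's utility.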
Abstract
This paper discusses the challenges of optical character recognition (OCR) on natural scenes, which is harder than OCR on documents due to the wild content and various image backgrounds. We propose to uniformly use word error rates (WER) as a new measurement for evaluating scene-text OCR, covering both end-to-end (e2e) performance and the performance of individual system components. For the e2e metric in particular, we name it DISGO WER, as it considers Deletion, Insertion, Substitution, and Grouping/Ordering errors. Finally, we propose to utilize the concept of super blocks to automatically compute BLEU scores for e2e OCR machine translation. The small SCUT public test set is used to demonstrate WER performance by a modularized OCR system.
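As a rough illustration of the word error rate that the abstract builds on, the sketch below computes the conventional WER from a word-level edit-distance alignment, reporting substitution, deletion, and insertion counts separately. It is a minimal sketch under stated assumptions, not the paper's DISGO implementation: the abstract does not specify how Grouping/Ordering errors are counted, so that term is left out, and the function name `wer_counts` and the whitespace tokenization are hypothetical choices made for this example.

```python
def wer_counts(reference: str, hypothesis: str) -> dict:
    """Conventional WER via word-level Levenshtein alignment.

    Assumption: words are whitespace-separated tokens. Grouping/Ordering
    errors (the "GO" in DISGO) are NOT modeled here.
    """
    ref = reference.split()
    hyp = hypothesis.split()

    # dp[i][j] = (total_cost, substitutions, deletions, insertions)
    # for aligning ref[:i] with hyp[:j].
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost, sub, dele, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (cost, sub, dele, ins)          # exact match
            else:
                diagonal = (cost + 1, sub + 1, dele, ins)  # substitution
            cost, sub, dele, ins = dp[i - 1][j]
            deletion = (cost + 1, sub, dele + 1, ins)
            cost, sub, dele, ins = dp[i][j - 1]
            insertion = (cost + 1, sub, dele, ins + 1)
            # Keep the cheapest of the three moves.
            dp[i][j] = min(diagonal, deletion, insertion, key=lambda t: t[0])

    cost, sub, dele, ins = dp[len(ref)][len(hyp)]
    # Normalize by reference length; guard against an empty reference.
    return {"sub": sub, "del": dele, "ins": ins, "wer": cost / max(len(ref), 1)}
```

For example, `wer_counts("the quick brown fox", "the quick brow fox jumps")` finds one substitution and one insertion against a four-word reference, giving a WER of 0.5.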
Taxonomy
Topics: Handwritten Text Recognition Techniques · Image Retrieval and Classification Techniques · Vehicle License Plate Recognition
