FEDS -- Filtered Edit Distance Surrogate
Yash Patel, Jiri Matas

TL;DR
This paper trains a scene text recognition model with a learned surrogate of edit distance, using a filter borrowed from self-paced learning to drop training examples that the surrogate approximates poorly. It reports an average 11.2% improvement in total edit distance and a 9.5% error reduction on accuracy across several scene text datasets.
Contribution
It presents a novel end-to-end training procedure using a learned surrogate of edit distance with self-paced filtering, enhancing scene text recognition performance.
Findings
Average 11.2% improvement in total edit distance
9.5% error reduction on accuracy
Improvements shown on challenging scene text datasets: IIIT-5K, SVT, ICDAR, SVTP, and CUTE
Abstract
This paper proposes a procedure to train a scene text recognition model using a robust learned surrogate of edit distance. The proposed method borrows from self-paced learning and filters out the training examples that are hard for the surrogate. The filtering is performed by judging the quality of the approximation, using a ramp function, enabling end-to-end training. Following the literature, the experiments are conducted in a post-tuning setup, where a trained scene text recognition model is tuned using the learned surrogate of edit distance. The efficacy is demonstrated by improvements on various challenging scene text datasets such as IIIT-5K, SVT, ICDAR, SVTP, and CUTE. The proposed method provides an average improvement of 11.2% on total edit distance and an error reduction of 9.5% on accuracy.
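The filtering idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the names `ramp_weight` and `filtered_surrogate_loss`, the linear shape of the ramp, and the `tolerance` parameter are all assumptions made for illustration, and the surrogate's outputs are passed in as plain numbers rather than produced by a learned network. The sketch computes the true edit distance, measures how far each surrogate estimate is from it, and down-weights examples whose approximation is poor.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ramp_weight(approx_error: float, tolerance: float) -> float:
    """Assumed ramp: weight 1 at zero error, falling linearly to 0 at `tolerance`."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    return max(0.0, 1.0 - approx_error / tolerance)


def filtered_surrogate_loss(surrogate_values: Sequence[float],
                            true_distances: Sequence[int],
                            tolerance: float = 1.0) -> float:
    """Weighted mean of surrogate values; poorly approximated examples get low weight."""
    weights = [ramp_weight(abs(s - d), tolerance)
               for s, d in zip(surrogate_values, true_distances)]
    kept = sum(weights)
    if kept == 0:
        return 0.0  # every example was filtered out
    return sum(w * s for w, s in zip(weights, surrogate_values)) / kept


# Hypothetical batch: (prediction, ground truth) pairs and made-up surrogate outputs.
pairs = [("hello", "hello"), ("kitten", "sitting")]
true_d = [edit_distance(p, g) for p, g in pairs]
surrogate_out = [0.25, 7.0]  # the second estimate is far from the true distance
loss = filtered_surrogate_loss(surrogate_out, true_d, tolerance=1.0)
```

In this toy batch the second example contributes nothing to the loss because its surrogate estimate is off by more than the tolerance, which is the kind of filtering the abstract describes. In an actual training setup the surrogate values would be differentiable outputs of the learned surrogate, so the weighted loss could be backpropagated into the recognition model.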
